Subtracting mean of every two columns from two dataframes R
Suppose I have two data frames as follows:
df1 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df1) <- c("V1","V2","V3")
df2 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df2) <- c("V1","V2","V3")
Using this dummy data, I want to create a new dataframe with 1 column and 3 rows:
  V1
1 mean(df1$V1) - mean(df2$V1)
2 mean(df1$V2) - mean(df2$V2)
3 mean(df1$V3) - mean(df2$V3)
I also want to create another dataframe as follows:
  V1
1 wilcox.test(df1$V1,df2$V1)$p.value
2 wilcox.test(df1$V2,df2$V2)$p.value
3 wilcox.test(df1$V3,df2$V3)$p.value
My real data has 54 columns, so each of the resulting data frames would have 54 rows.
data.frame(mean = colMeans(df1) - colMeans(df2))
#    mean
# V1  1.4
# V2  2.0
# V3  1.4
data.frame(
  p.value = mapply(function(x, y) wilcox.test(x, y)$p.value, df1, df2)
)
#       p.value
# V1 0.32060365
# V2 0.07784363
# V3 0.21779915
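If both results are wanted in a single table, the two one-liners above can be combined. This is only a minimal sketch: `compare_columns` is a hypothetical helper name (not from any package), the seed is arbitrary, and it assumes both data frames have the same column names in the same order.

```r
# Hypothetical helper: per-column mean difference and Wilcoxon p-value.
# Assumes df1 and df2 have identical column names in identical order.
compare_columns <- function(df1, df2) {
  stopifnot(identical(names(df1), names(df2)))
  data.frame(
    mean_diff = colMeans(df1) - colMeans(df2),
    p.value   = mapply(
      # exact = FALSE asks for the normal approximation up front; with tied
      # values R falls back to it anyway, but emits a warning
      function(x, y) wilcox.test(x, y, exact = FALSE)$p.value,
      df1, df2
    )
  )
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
df1 <- as.data.frame(matrix(ceiling(runif(30, 1, 10)), ncol = 3))  # columns V1..V3
df2 <- as.data.frame(matrix(ceiling(runif(30, 1, 10)), ncol = 3))

res <- compare_columns(df1, df2)
res  # one row per column, row names V1, V2, V3
```

With the real 54-column data the same call should return 54 rows, one per column.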
out <- numeric(ncol(df1))  # one p-value per column
for (i in seq_len(ncol(df1))) {
  out[i] <- wilcox.test(df1[, i], df2[, i])$p.value
}
data.frame(p = out)
You can compute the column means with a matrix product against a vector of ones:
m1 = (t(df1) %*% rep(1, nrow(df1))) / nrow(df1) # Equivalent to a mean
m2 = (t(df2) %*% rep(1, nrow(df2))) / nrow(df2)
m1-m2
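To see why this works: multiplying the transposed data by a vector of ones sums each column, and dividing by the number of rows turns the sum into a mean. A small check against `colMeans()`, using made-up data and an arbitrary seed:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- as.data.frame(matrix(ceiling(runif(30, 1, 10)), ncol = 3))  # columns V1..V3

# t(df) is a 3 x 10 matrix; times a length-10 vector of ones gives the column sums
m <- (t(df) %*% rep(1, nrow(df))) / nrow(df)

# m is a 3 x 1 matrix; drop() turns it into a named vector for comparison
same <- isTRUE(all.equal(drop(m), colMeans(df)))
same
```

The result of the matrix product is a one-column matrix rather than a vector, so wrap it in `drop()` or `as.vector()` before putting it into a data frame.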
A tidyverse approach that creates a table with information about the tests you've performed:
# for reproducibility
set.seed(215)

# example datasets
df1 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df1) <- c("V1","V2","V3")
df2 <- data.frame(ceiling(runif(10,1,10)), ceiling(runif(10,1,10)), ceiling(runif(10,1,10)))
colnames(df2) <- c("V1","V2","V3")

library(tidyverse)

list(df1, df2) %>%                        # put your dataframes in a list
  map_df(data.frame, .id = "df") %>%      # create a dataframe with an id value for each dataset
  as_tibble() %>%                         # for visualisation purposes only (replaces the deprecated tbl_df())
  gather(v, x, -df) %>%                   # reshape data
  nest(data = -v) %>%                     # nest data (current tidyr syntax for nest(-v))
  mutate(w.t = map(data, ~wilcox.test(.x$x ~ .x$df)),   # perform Wilcoxon test
         pval = map_dbl(w.t, "p.value"),                # extract p value
         mean_diff = map_dbl(data, ~mean(.x$x[.x$df==1])-mean(.x$x[.x$df==2])))   # calculate mean difference

# # A tibble: 3 x 5
#   v     data              w.t           pval mean_diff
#   <chr> <list>            <list>       <dbl>     <dbl>
# 1 V1    <tibble [20 x 2]> <S3: htest> 0.730      0.600
# 2 V2    <tibble [20 x 2]> <S3: htest> 0.145     -1.8
# 3 V3    <tibble [20 x 2]> <S3: htest> 0.0295     2.8
v represents your variables (initial columns).
data includes the variables used for the corresponding test.
w.t includes the test output.
pval is the extracted p value from each test.
mean_diff is the mean difference.
If you save the result of the above process as results, you'll be able to use results$w.t to see the full test outputs.
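The same idea of keeping the full test objects around also works without the tidyverse. A base R sketch, assuming both data frames share column names and order; the names `tests` and `pvals` and the seed are only illustrative:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
df1 <- as.data.frame(matrix(ceiling(runif(30, 1, 10)), ncol = 3))  # columns V1..V3
df2 <- as.data.frame(matrix(ceiling(runif(30, 1, 10)), ncol = 3))

# Map() pairs the columns up and returns a list of htest objects,
# named after the columns of df1
tests <- Map(function(x, y) wilcox.test(x, y, exact = FALSE), df1, df2)

# Pull one number out of every test object
pvals <- vapply(tests, function(tt) tt$p.value, numeric(1))

tests$V1            # full printed output for the first column
tests$V1$statistic  # or any single component of it
```

Storing the whole `htest` object costs little and lets you look up the statistic or the alternative hypothesis later without rerunning the tests.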