How to randomly select rows from a data frame for which the row skewness is larger than a given value in R

I am trying to select random rows from a data frame with 1000 rows (and six columns) where the skewness of the row is larger than a given value (say Sk > 0.3).

I've generated the following data frame:

df=data.frame(replicate(6,sample(10:100,1000,rep=TRUE)))

I can get the row skewness with rowSkewness() from the fBasics package.

rowSkewness(df) gives:

   [8] -0.2243295435  0.5306809351  0.0707122386  0.0341447417  0.3339384838 -0.3910593364 -0.6443905090
  [15]  0.5603809206  0.4406091534 -0.3736108832  0.0397860038  0.9970040772 -0.7702547535  0.2065830354 

But now I need to select, say, 10 rows of df which have a row skewness greater than, say, 0.1. Maybe with

for (a in 1:10) {
  sample.data[a,] = sample(x=df[wich(rowSkewness(df[sample(1:nrow(df),1)>0.1),], size = 1, replace = TRUE)
}

or something like this?

Any thoughts on this will be appreciated. Thanks in advance.

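You can use the sample_n() or sample_frac() function from dplyr, which makes your version a little shorter:

library(tidyr)    # re-exports the %>% pipe
library(fBasics)  # provides rowSkewness()
df <- data.frame(replicate(6, sample(10:100, 1000, rep = TRUE)))
# keep the rows with skewness above 0.1, then draw 10 of them at random
x <- df %>% dplyr::filter(rowSkewness(df) > 0.1) %>% dplyr::sample_n(10)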
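
If you would rather not depend on dplyr or fBasics, the same filter-then-sample idea can be sketched in base R. This is only a sketch: row_skew() is a hypothetical helper written for this example, and it assumes the moment-based skewness (third central moment divided by the cubed sample standard deviation), so check that against the definition your package uses.

```r
# Hypothetical helper: moment-based skewness of every row (an assumption,
# not a function from fBasics or dplyr)
row_skew <- function(d) {
  apply(as.matrix(d), 1, function(x) mean((x - mean(x))^3) / sd(x)^3)
}

set.seed(123)  # makes the random draw reproducible
df <- data.frame(replicate(6, sample(10:100, 1000, replace = TRUE)))

# Row numbers whose skewness exceeds the threshold
keep <- which(row_skew(df) > 0.1)

# Never ask for more rows than actually qualify
n_draw <- min(10, length(keep))

# Sample positions within 'keep', then map back to rows of df
picked <- df[keep[sample.int(length(keep), n_draw)], ]
```

Sampling positions with sample.int() rather than calling sample(keep, ...) avoids the surprise that sample() treats a single number n as the range 1:n when only one row qualifies.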

Got it:

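samplesize <- 10  # number of rows to draw (example value)
x <- df %>% dplyr::filter(rowSkewness(df) > 0.1)
# sample row indices; calling sample() on the data frame itself would draw columns
sample.data <- x[sample(nrow(x), samplesize, replace = TRUE), ]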
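
One thing worth checking when sampling from a data frame: sample() treats a data frame as a list of columns, so it draws columns rather than rows. The sketch below uses a small made-up data frame (toy is hypothetical, not the data from the question) to show the difference.

```r
# Toy data frame standing in for the filtered rows (made-up values)
toy <- data.frame(X1 = 1:5, X2 = 6:10, X3 = 11:15)

set.seed(1)

# sample() on the data frame picks one of its columns
col_draw <- sample(toy, 1)
dim(col_draw)   # 5 rows, 1 column

# To draw rows, sample the row indices and subset with them
row_idx  <- sample(nrow(toy), 3, replace = TRUE)
row_draw <- toy[row_idx, ]
dim(row_draw)   # 3 rows, 3 columns
```

With replace = TRUE the same row can appear more than once; leave it at the default FALSE if every selected row should be distinct.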

Just do a subset:

res1 <- DF[fBasics::rowSkewness(DF) > .1, ]

head(res1)
#    X1 X2 X3 X4 X5 X6
# 7  56 28 21 93 74 24
# 8  33 56 23 44 10 12
# 12 29 19 29 38 94 95
# 13 35 51 54 98 66 10
# 14 12 51 24 23 36 68
# 15 50 37 81 22 55 97

Or with e1071::skewness:

res2 <- DF[apply(as.matrix(DF), 1, e1071::skewness) > .1, ]

stopifnot(all.equal(res1, res2))

Data:

set.seed(42); DF <- data.frame(replicate(6, sample(10:100, 1000, rep=TRUE)))
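
The apply() version calls the skewness function once per row. For larger data frames, a vectorised base R computation is possible; the sketch below assumes the moment estimator m3 / s^3 (third central moment over the cubed sample standard deviation) and checks it against a plain per-row calculation. The names skew_vec and skew_apply are made up for this example.

```r
set.seed(42)
DF <- data.frame(replicate(6, sample(10:100, 1000, rep = TRUE)))

m <- as.matrix(DF)
n <- ncol(m)

# Subtract each row's mean; the vector recycles down the columns,
# so element [i, j] loses the mean of row i
centered <- m - rowMeans(m)

m3 <- rowMeans(centered^3)                 # third central moment per row
s  <- sqrt(rowSums(centered^2) / (n - 1))  # sample standard deviation per row
skew_vec <- m3 / s^3

# Reference: the same formula applied row by row
skew_apply <- apply(m, 1, function(x) mean((x - mean(x))^3) / sd(x)^3)

stopifnot(all.equal(unname(skew_vec), unname(skew_apply)))
```

A logical index such as skew_vec > .1 can then be used to subset DF in the same way as above.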

Comments
  • Thank you. What error do you get? Maybe try loading library(tidyverse) instead of library(tidyr)?