How to randomly select elements from subset of dataframe?

I have a DataFrame in the following form:

W1  W2  W3  W4
 0   1   1   0
 1   1   1   1
 1   0   0   0
 0   1   0   1

For every row, I want to randomly select a single element that is 1 and set the other ones to zero. Initial zeros stay zeros. For example:

W1  W2  W3  W4
 0   1   0   0
 0   1   0   0
 1   0   0   0
 0   0   0   1

I have a very convoluted solution that uses iterrows(), but I'm looking for a pandastic one.

IIUC, you want to randomly keep a single 1 in every row and set the rest to 0. Here's one approach: collect the positions of all 1s, sample one position per row, and assign 1 at the sampled positions, i.e.

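import numpy as np
import pandas as pd

# (row, column) positions of every 1; keep one randomly sampled position per row
# (GroupBy.sample requires pandas >= 1.1)
idx = pd.DataFrame(np.stack(np.where(df==1))).T.groupby(0).sample(n=1).values
# array([[0, 2],
#        [1, 1],
#        [2, 0],
#        [3, 3]])

# start from zeros and write 1 at the sampled positions
arr = np.zeros(df.shape, dtype=int)
arr[idx[:,0],idx[:,1]] = 1
ndf = pd.DataFrame(arr, columns=df.columns, index=df.index)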

   W1  W2  W3  W4
0   0   0   1   0
1   1   0   0   0
2   1   0   0   0
3   0   1   0   0
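
As an aside, the same result can be obtained without any grouping. The following is a minimal sketch of a different, fully vectorised technique (not the method used above): every cell gets a random score, cells that are not 1 are given a losing score, and the highest score in each row wins. The names `rng`, `scores` and `winner` and the seed are illustrative choices, and the frame is assumed to contain only 0s and 1s.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"W1": [0, 1, 1, 0],
                   "W2": [1, 1, 0, 1],
                   "W3": [1, 1, 0, 0],
                   "W4": [0, 1, 0, 1]})

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

mask = df.to_numpy() == 1
# random score per cell; cells that are not 1 get -1 so they can never win
scores = np.where(mask, rng.random(mask.shape), -1.0)
winner = scores.argmax(axis=1)

out = np.zeros(mask.shape, dtype=int)
rows = np.arange(mask.shape[0])
# write a 1 only if the winning cell really held a 1 (rows of all zeros stay zero)
out[rows, winner] = mask[rows, winner]

result = pd.DataFrame(out, columns=df.columns, index=df.index)
print(result)
```

Because the scores are independent and uniform, each 1 in a row should be equally likely to be the one that is kept.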

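The idea is to extract the (row, column) positions of the 1s, shuffle them, and then remove duplicates on the first column (the row index), so that only the first shuffled position of each row is kept: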

#get positions of 1
a = np.where(df == 1)

#create nd array
X = np.hstack((a[0][:, None], a[1][:, None]))
#shuffling
np.random.shuffle(X)

#remove duplicates
vals = pd.DataFrame(X).drop_duplicates(0).values

#set 1
arr = np.zeros(df.shape)
arr[vals[:,0],vals[:,1]] = 1

df = pd.DataFrame(arr.astype(int), columns=df.columns, index=df.index)
print (df)
   W1  W2  W3  W4
0   0   0   1   0
1   0   0   0   1
2   1   0   0   0
3   0   1   0   0
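
For reuse, the shuffle-and-deduplicate steps above could be wrapped in a function. This is only a sketch: `keep_one_per_row` and its `seed` parameter are hypothetical names, and it swaps `np.random.shuffle` for a seeded `np.random.default_rng` generator so that a run can be repeated.

```python
import numpy as np
import pandas as pd

def keep_one_per_row(df, seed=None):
    """Hypothetical helper: keep one randomly chosen 1 per row, zero the rest."""
    rng = np.random.default_rng(seed)
    rows, cols = np.where(df.to_numpy() == 1)
    pos = np.column_stack((rows, cols))   # one (row, col) pair per 1
    rng.shuffle(pos)                      # shuffles the pairs in place along axis 0
    # drop_duplicates keeps the first pair seen for each row index (column 0)
    picked = pd.DataFrame(pos).drop_duplicates(0).to_numpy()
    out = np.zeros(df.shape, dtype=int)
    out[picked[:, 0], picked[:, 1]] = 1
    return pd.DataFrame(out, columns=df.columns, index=df.index)

df = pd.DataFrame({"W1": [0, 1, 1, 0],
                   "W2": [1, 1, 0, 1],
                   "W3": [1, 1, 0, 0],
                   "W4": [0, 1, 0, 1]})

result = keep_one_per_row(df, seed=42)
print(result)
```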

Here is a mixture of a functional and a pandastic approach:

df = pd.DataFrame({'w1': [0, 1,1,0],
                   'w2': [1, 1,0,1],
                   'w3': [1, 1,0,0],
                   'w4': [0, 1,0,1]})
df
   w1  w2  w3  w4
0   0   1   1   0
1   1   1   1   1
2   1   0   0   0
3   0   1   0   1


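def choose_one(row):
    """
    Return a list with a 1 at one randomly chosen position where row is non-zero, and 0 elsewhere.
    Raises ValueError if the row contains no 1, because np.random.choice cannot sample from an empty list.
    """
    one = np.random.choice([i for i, v in enumerate(row) if v])
    return [0 if i != one else 1 for i in range(len(row))]

Apply it to each row. Passing result_type='broadcast' keeps the original columns; without it, recent pandas versions return a Series of lists:

df.apply(choose_one, axis=1, result_type='broadcast')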

   w1  w2  w3  w4
0   0   1   0   0
1   0   1   0   0
2   1   0   0   0
3   0   0   0   1
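
If some rows might contain no 1 at all, a row-wise function along these lines could return a Series and leave such rows untouched. This is a sketch under that assumption: `choose_one_safe`, the seeded `rng` and the extra all-zero row in the example frame are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility

def choose_one_safe(row):
    """Hypothetical variant of choose_one that tolerates rows without any 1."""
    out = pd.Series(0, index=row.index)
    candidates = row.index[row == 1]      # column labels that hold a 1
    if len(candidates) > 0:
        out.loc[rng.choice(candidates)] = 1
    return out

df = pd.DataFrame({"w1": [0, 1, 1, 0, 0],
                   "w2": [1, 1, 0, 1, 0],
                   "w3": [1, 1, 0, 0, 0],
                   "w4": [0, 1, 0, 1, 0]})  # the last row has no 1

# returning a Series from the function makes apply build a DataFrame
result = df.apply(choose_one_safe, axis=1)
print(result)
```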

Comments
  • Thanks! How about a solution that doesn't use numpy?
  • @QuantChristo I made it as pandastic as possible. Good luck
  • I wonder if it is possible or simpler with the following trick: I draw a random 1 in every row and change it to 2. Subsequently I change 1 to 0 and 2 back to 1.
  • I don't think this is what the OP wanted
  • @Dark - Why do you think so?
  • If I understand correctly he wants to keep only 1 one selected randomly in every row.
  • @Dark - Not sure I understand; feel free to post an answer.
  • I have posted my answer based on my understanding of the question.
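
The trick suggested in the comments (mark the drawn 1 as 2, then turn 1 into 0 and 2 back into 1) can be sketched with pandas calls only, which also speaks to the request for a solution without an explicit numpy import. This is one possible reading of that comment, not code posted by any of the commenters; the variable names and `random_state=0` are illustrative, and `GroupBy.sample` is assumed to be available (pandas 1.1 or later).

```python
import pandas as pd

df = pd.DataFrame({"W1": [0, 1, 1, 0],
                   "W2": [1, 1, 0, 1],
                   "W3": [1, 1, 0, 0],
                   "W4": [0, 1, 0, 1]})

# long format: one value per (row label, column label) pair
s = df.stack()

# draw one 1 per row; random_state is fixed only for reproducibility
chosen = s[s == 1].groupby(level=0).sample(n=1, random_state=0)

# step 1: mark the drawn cells with 2
marked = s.mask(s.index.isin(chosen.index), 2)
# step 2: the remaining 1s become 0, then the 2s go back to 1
marked = marked.replace(1, 0).replace(2, 1)

# back to the original wide layout
result = marked.unstack().reindex(index=df.index, columns=df.columns)
print(result)
```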