In Python, how to get the rows of a DataFrame where a particular string is present in any of the columns (string values)

My DataFrame contains the columns Name, Age, Task1, Task2, and Task3. I need to get all the rows in which a given string appears in any of the Task1, Task2, or Task3 columns. Say I want to check for the keyword 'Drafting': if 'Drafting' is part of the value in any of these columns, the entire row should be added to the resulting frame.

I tried isin(), but it only returns True or False. I need to extract the 'N' rows that contain a particular keyword. I also tried df.columns[df.Task1.str.contains("Drafting")], but this compares a single column only. Does anyone know how to use str.contains, or any other method, to compare the string values of several columns and get all rows that satisfy the condition?

  Name  Age              Task1    Task2            Task3
0  Ann   43  Drafting a Letter  sending           paking
1  Juh   29            sending   paking  Letter Drafting
2  Jeo   42            Pasting  sending           paking
3  Sam   59            sending  pasting  Letter Drafting

I need to check whether the keyword 'Drafting' is present in any of the columns (each column value contains 3 to 4 words, and I need to check whether 'Drafting' is one of the words in that sentence). The result should be:

  Name  Age              Task1    Task2            Task3
0  Ann   43  Drafting a Letter  sending           paking
1  Juh   29            sending   paking  Letter Drafting
3  Sam   59            sending  pasting  Letter Drafting

Or just (note that this checks the entire df, not specific columns):

df[df.astype(str).apply(lambda x: x.str.contains('Drafting')).any(axis=1)]
# For a case-insensitive match, use this instead:
# df[df.astype(str).apply(lambda x: x.str.contains('Drafting', case=False)).any(axis=1)]

  Name  Age              Task1    Task2            Task3
0  Ann   43  Drafting a Letter  sending           paking
1  Juh   29            sending   paking  Letter Drafting
3  Sam   59            sending  pasting  Letter Drafting
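
If only the Task columns should be searched, the same idea can be restricted with filter(like='Task'). The following is a minimal sketch, assuming the sample frame from the question (rebuilt here by hand). The helper name rows_with_keyword and its parameters are hypothetical, and na=False is an added choice so that missing values count as non-matches:

```python
import re

import pandas as pd

# Sample frame rebuilt by hand from the question.
df = pd.DataFrame({
    'Name': ['Ann', 'Juh', 'Jeo', 'Sam'],
    'Age': [43, 29, 42, 59],
    'Task1': ['Drafting a Letter', 'sending', 'Pasting', 'sending'],
    'Task2': ['sending', 'paking', 'sending', 'pasting'],
    'Task3': ['paking', 'Letter Drafting', 'paking', 'Letter Drafting'],
})


def rows_with_keyword(frame, keyword, prefix='Task', whole_word=False):
    """Return rows where any column whose name contains `prefix` holds `keyword`."""
    # Escape the keyword so it is matched literally, not as a regex.
    pattern = re.escape(keyword)
    if whole_word:
        # \b stops the keyword from matching inside a longer word.
        pattern = r'\b' + pattern + r'\b'
    tasks = frame.filter(like=prefix)
    # na=False treats missing values as "no match" instead of NaN.
    mask = tasks.apply(lambda col: col.str.contains(pattern, na=False)).any(axis=1)
    return frame[mask]


result = rows_with_keyword(df, 'Drafting')
print(result)
```

Passing whole_word=True is one way to address the whole-word concern raised in the comments: with it, a keyword such as 'Draft' no longer matches 'Drafting'.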

A quick comparison of the given answers on 20k rows of data:

@ALollz (in comments)

%timeit df.loc[df.filter(like='Task').applymap(lambda x: 'Drafting' in x).any(1)]
25.2 ms ± 2.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@Sergey Bushmanov

%timeit df[df.Task1.str.contains("Drafting") | df.Task2.str.contains("Drafting") | df.Task3.str.contains("Drafting")]
58.7 ms ± 9.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@anky_91

%timeit df[df.filter(like='Task').apply(lambda x: x.str.contains('Drafting')).any(axis=1)]
88.6 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df[df.astype(str).apply(lambda x: x.str.contains('Drafting')).any(axis=1)]
128 ms ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@ALollz

%timeit  df.loc[df.filter(like='Task').stack().str.split(expand=True).eq('Drafting').any(1).any(level=0)]
290 ms ± 29.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
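
The answer does not say how the 20k-row frame was built. As a rough sketch for reproducing such a comparison outside IPython, one assumption is to repeat the 4-row sample 5000 times and time two of the approaches with the standard timeit module. The function names chained_or and apply_contains are illustrative, and the timings will differ by machine and pandas version:

```python
import timeit

import pandas as pd

sample = pd.DataFrame({
    'Name': ['Ann', 'Juh', 'Jeo', 'Sam'],
    'Age': [43, 29, 42, 59],
    'Task1': ['Drafting a Letter', 'sending', 'Pasting', 'sending'],
    'Task2': ['sending', 'paking', 'sending', 'pasting'],
    'Task3': ['paking', 'Letter Drafting', 'paking', 'Letter Drafting'],
})

# Assumption: repeat the 4-row sample 5000 times to get 20,000 rows.
big = pd.concat([sample] * 5000, ignore_index=True)


def chained_or():
    # One str.contains call per column, combined with |.
    return big[big.Task1.str.contains('Drafting')
               | big.Task2.str.contains('Drafting')
               | big.Task3.str.contains('Drafting')]


def apply_contains():
    # Select the Task columns by name, then apply str.contains column-wise.
    return big[big.filter(like='Task')
               .apply(lambda col: col.str.contains('Drafting'))
               .any(axis=1)]


for fn in (chained_or, apply_contains):
    # Average over 10 calls, similar in spirit to %timeit.
    seconds = timeit.timeit(fn, number=10) / 10
    print(f'{fn.__name__}: {seconds * 1000:.1f} ms per loop')
```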

You may try:

new_df = df[df.Task1.str.contains("Drafting") | df.Task2.str.contains("Drafting") | df.Task3.str.contains("Drafting")]

This will return a new_df with rows containing "Drafting" in any of the "Task1,2,3" columns.
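
Typing out one str.contains call per column gets tedious as the number of columns grows. Here is a sketch of the same OR-combination built from a list of column names with functools.reduce, assuming the column names and sample data from the question; the names task_cols, masks, and combined are illustrative, and na=False is an added choice so that missing values count as non-matches:

```python
import operator
from functools import reduce

import pandas as pd

# Sample frame rebuilt by hand from the question.
df = pd.DataFrame({
    'Name': ['Ann', 'Juh', 'Jeo', 'Sam'],
    'Age': [43, 29, 42, 59],
    'Task1': ['Drafting a Letter', 'sending', 'Pasting', 'sending'],
    'Task2': ['sending', 'paking', 'sending', 'pasting'],
    'Task3': ['paking', 'Letter Drafting', 'paking', 'Letter Drafting'],
})

task_cols = ['Task1', 'Task2', 'Task3']

# One boolean Series per column: True where the value contains the keyword.
masks = [df[col].str.contains('Drafting', na=False) for col in task_cols]

# Combine the per-column masks with element-wise OR.
combined = reduce(operator.or_, masks)

new_df = df[combined]
print(new_df)
```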

This can be achieved using np.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ann', 'Juh', 'Jeo', 'Sam'],
    'Age': [43,29,42,59],
    'Task1': ['Drafting a letter', 'Sending', 'Pasting', 'Sending'],
    'Task2': ['Sending', 'Paking', 'Sending', 'Pasting'],
    'Task3': ['Packing', 'Letter Drafting', 'Paking', 'Letter Drafting']
    })

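# Collect the positions of matching rows in each Task column, then
# de-duplicate and sort them so a row matching in two columns appears once.
positions = np.unique(np.concatenate(
                np.where(df['Task1'].str.contains('Drafting')) +
                np.where(df['Task2'].str.contains('Drafting')) +
                np.where(df['Task3'].str.contains('Drafting'))))

# np.where returns positions, so index with iloc directly.
df_new = df.iloc[positions]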

print(df_new)

  Name  Age              Task1    Task2            Task3
0  Ann   43  Drafting a letter  Sending          Packing
1  Juh   29            Sending   Paking  Letter Drafting
3  Sam   59            Sending  Pasting  Letter Drafting

You can try something like this,

new_df = df[(df['Task1'] == 'Drafting') | (df['Task2'] == 'Drafting') | (df['Task3'] == 'Drafting')]

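This selects the rows in which Task1, Task2, or Task3 is exactly equal to 'Drafting'. Note that == tests for equality, so it will not match a value such as 'Drafting a Letter'; use str.contains for substring matches.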
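
To see how equality and substring matching differ on the sample data from the question, here is a small sketch (the frame is rebuilt by hand, and the names cols, exact, and partial are illustrative):

```python
import pandas as pd

# Sample frame rebuilt by hand from the question.
df = pd.DataFrame({
    'Name': ['Ann', 'Juh', 'Jeo', 'Sam'],
    'Age': [43, 29, 42, 59],
    'Task1': ['Drafting a Letter', 'sending', 'Pasting', 'sending'],
    'Task2': ['sending', 'paking', 'sending', 'pasting'],
    'Task3': ['paking', 'Letter Drafting', 'paking', 'Letter Drafting'],
})

cols = ['Task1', 'Task2', 'Task3']

# Equality: a cell must be exactly 'Drafting' to count as a match.
exact = df[df[cols].eq('Drafting').any(axis=1)]

# Substring: a cell only needs to contain 'Drafting' somewhere.
partial = df[df[cols].apply(lambda col: col.str.contains('Drafting', na=False)).any(axis=1)]

print(len(exact))    # no cell is exactly 'Drafting'
print(len(partial))  # rows for Ann, Juh and Sam
```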

Comments
  • It's advisable to create a small sample DataFrame that demonstrates the issue and post it as text. Please also post the expected output, showing the difference between input and output. This helps users get a clear picture of what is needed and will attract more answers. Thanks
  • Hi, please post a small example of the original data.
  • Is the string-contains logic you want meant for complete word matches? Should 'raft' match 'Drafting', or only the isolated word 'raft' (which may appear in a sentence such as 'I like to use a raft')?
  • Yes, I need to get the columns that contain exactly 'Drafting'. No other combination (a regular expression is not useful).
  • I like this, though it may need to match on something like '(?<![a-zA-Z\-])Drafting' if a word like redrafting is not meant to match.
  • @anky many thanks, as this is working for more than one word. Many thanks for your kind reply.
  • @Shara My pleasure. :)
  • @anky_91 could you please help me with the problem stackoverflow.com/questions/54944193/…
  • @dondapati str.contains('|'.join(list_of_words))? If not, could you please post a fresh question? :) Thanks
  • How about df.filter(like='Task').apply(lambda x: x.str.contains('Drafting')).any(axis=1)?
  • A bit less than the previous one, but not close to @Sergey's.
  • Thanks for that. :) I think calling the series individually wins on time, but calling them individually is manual.
  • I think the fastest is df.loc[df.filter(like='Task').applymap(lambda x: 'Drafting' in x).any(1)], though at the expense of the NaN and error handling that Series.str.contains provides.
  • equals, not contains
  • I have to check for a keyword within the sentence in a column, not for an exact match. I have added a sample of the required result.