Python pandas column to replace string boolean values to actual boolean type

I want to replace string representations of booleans (such as 'True') in a column with actual boolean values.

import numpy as np
import pandas as pd
from datetime import datetime
kdf = pd.DataFrame(data={'col1': [True, 'True', np.nan], 'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'],
                         'bool': [False, True, True], 'bnan': [False, True, np.nan]})

So here I want to convert the string 'True' (index 1 of col1) to the actual boolean True. What I did was:

kdf.loc[kdf['col1'].str.contains('true', na=False, case=False), 'col1'] = True
kdf.loc[kdf['col1'].str.contains('false', na=False, case=False), 'col1'] = False

This converts the values to the actual type, but I need a function that accepts only the column, does the replacement, and returns the modified column (like col.fillna). I'm not allowed to pass the whole DataFrame into that function, so I can't use df.loc.

I'm also a bit worried about performance. Is there any other way?
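A minimal sketch of such a function, assuming an object-dtype column like kdf['col1'] (mixed bools, strings and NaN). The helper name to_bool is hypothetical. It takes only the Series, uses vectorized .str methods instead of a per-row Python call, and returns a modified copy:

```python
import numpy as np
import pandas as pd


def to_bool(col: pd.Series) -> pd.Series:
    """Hypothetical helper: turn 'true'/'false' strings into real bools.

    Assumes an object-dtype column. Entries that do not spell true/false
    (e.g. NaN) are left untouched, and a modified copy is returned.
    """
    # astype(str) makes .str safe for any entry type; NaN becomes 'nan' and never matches
    lowered = col.astype(str).str.strip().str.lower()
    out = col.copy()
    out[lowered == "true"] = True
    out[lowered == "false"] = False
    return out


kdf = pd.DataFrame({"col1": [True, "True", np.nan, " false "]})
kdf["col1"] = to_bool(kdf["col1"])
print(kdf["col1"].tolist())
```

Because the comparison is case-insensitive and strips padding, variants like ' TRUE ' are also converted; missing values stay NaN rather than becoming False.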


# compare case-insensitively so 'True', ' true ' and the bool True all map to True
df['col'] = df['col'].apply(lambda x: str(x).strip().lower() == 'true')

I think the above should work.

Hope this helps!
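The same idea can be written without apply. A vectorized sketch, assuming the column is named 'col' as in the answer and using made-up sample values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [True, "True", np.nan, "false", " TRUE "]})

# astype(str) turns True -> 'True' and NaN -> 'nan', so one string comparison
# covers both real bools and their string spellings
df["col"] = df["col"].astype(str).str.strip().str.lower().eq("true")

print(df["col"].dtype)
print(df["col"].tolist())
```

Like the lambda, this maps NaN (and anything else that is not 'true') to False, and the result has plain bool dtype.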


Why not use replace?

df.replace({'True':True,'False':False})
# df.replace({'True':True,'False':False}).applymap(type)
Out[123]: 
              bnan            bool             col1             dt
0   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
1   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
2  <class 'float'>  <class 'bool'>  <class 'float'>  <class 'str'>
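Since the question asks for a function that receives only the column, the same replace call also works on a single Series. A minimal sketch (the function name is hypothetical); note it only matches the exact strings 'True' and 'False', so padded or lowercase variants are left unchanged:

```python
import numpy as np
import pandas as pd


def replace_bool_strings(col: pd.Series) -> pd.Series:
    """Hypothetical helper: exact-match replace of 'True'/'False' strings."""
    # Keys are compared by equality, so the bool True never matches the string 'True'
    return col.replace({"True": True, "False": False})


col1 = pd.Series([True, "True", np.nan, "False", " true "])
print(replace_bool_strings(col1).map(type).tolist())
```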

Update

df.replace({'True':True,'False':False},regex=True).applymap(type)

Sample data (note the leading and trailing spaces I added):

import numpy as np
import pandas as pd
df = pd.DataFrame(data={'col1': [True, ' True', np.nan], 'dt': [' 2018-12-12', ' 2018-12-12', '2019-12-12'],
                        'bool': [False, True, True], 'bnan': ['False  ', True, np.nan]})
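With regex=True the pattern 'True' matches any cell that merely contains 'True' (a value like 'NotTrue' would also become True), and it is case-sensitive. A stricter variant, sketched on a single column with patterns chosen for this example, anchors the match and ignores case; when the replacement value is not a string, pandas replaces the whole matching cell:

```python
import numpy as np
import pandas as pd

col = pd.Series([True, " true  ", np.nan, "False  ", "NotTrue"])

# (?i) makes each pattern case-insensitive; ^\s* and \s*$ allow only padding around the word
patterns = {r"(?i)^\s*true\s*$": True, r"(?i)^\s*false\s*$": False}
result = col.replace(patterns, regex=True)

print(result.tolist())
```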


Expanding on @89f3a1c's solution and @AvinashRaj's Comment:

To make the data messier, the string 'True' is changed to ' true  ', which introduces a case mismatch as well as leading and trailing spaces.

import numpy as np
import pandas as pd
from datetime import datetime

kdf = pd.DataFrame(data={'col1' : [True, ' true  ', np.nan], 
                         'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'], 
                         'bool': [False, True, True], 
                         'bnan': [False, True, np.nan]})

kdf['col1'] = kdf['col1'].apply(lambda x: True if str(x).strip() in ['true','True'] else False)

Dataframe:

   col1                          dt   bool   bnan
0  True  2019-09-19 03:22:06.734861  False  False
1  true  2018-12-12 00:00:00.000000   True   True
2   NaN  2019-12-12 00:00:00.000000   True    NaN

Output:

    col1                          dt   bool   bnan
0   True  2019-09-19 03:26:47.611914  False  False
1   True  2018-12-12 00:00:00.000000   True   True
2  False  2019-12-12 00:00:00.000000   True    NaN
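The apply above turns the NaN in row 2 into False. If missing values should stay missing, one option, sketched here with a hypothetical helper name, is to map the normalized strings through a dict and cast to pandas' nullable 'boolean' dtype, which holds True, False and <NA>:

```python
import numpy as np
import pandas as pd


def to_nullable_bool(col: pd.Series) -> pd.Series:
    """Hypothetical helper: 'true'/'false' (any case, padded) -> nullable booleans."""
    # Real bools stringify to 'True'/'False', NaN to 'nan'
    lowered = col.astype(str).str.strip().str.lower()
    # Unmapped strings such as 'nan' become NaN, which 'boolean' stores as <NA>
    return lowered.map({"true": True, "false": False}).astype("boolean")


kdf = pd.DataFrame({"col1": [True, " true  ", np.nan, "FALSE"]})
kdf["col1"] = to_nullable_bool(kdf["col1"])
print(kdf["col1"])
```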
