python pandas data comparing

I'm trying to compare two excels, one is the user matrix, the other one is I generated from a host. I want to know if the user settings are correct as of the matrix.

the results I got the from the host, I imported to pandas: the user groups here is as column names!

    Name Users  Domain Admins     Administrators   Schema Admins 
0   xxx   NaN             Yes                Yes             NaN                                  

the problem is :

the excel matrix is like

user:         groups
xxx           administrators
              schema admins
              domain admins

here is what I have tried:

I will replace all the Yes with the columns name:

for i in df.columns:
df[i].replace('Yes',i,inplace=True)

remove the null from it.

group=df.dropna(axis='columns',how='all')

now it's like this:

  Name Users  Domain  Admins     Administrators  Schema Admins 

   0     xxx   Domain admins    Administrators  Schema Admins 

the other one like:

User Account Name    Group
0    xxx             Domain Admins, Local admin,Administrators

I don't know what to do next. please guide me how to compare the index values in a loop for all the indexs.

the original two excel like this:

user:         groups
xxx           administrators
              schema admins
              domain admins

yyy           administrators
              domain admins

zzz           administrators
              schema admins

the other file like:

username   administrators   schema admins  domain admins
xxx               yes            yes            NaN
yyy               yes            NaN            yes

This is how it can be done:

Step 1: Transform host df

cols = ['administrators', 'schema admins', 'domain admins']
df1['merged'] = df1[cols].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1) ##df1 is host df 

Step 2: Transform your matrix df

df.user = df.user.ffill()  ## Fill the empty rows with same user name
grouped_df = df.groupby("user")['groups'].apply(','.join).reset_index() ## merge same user name to 1 row

Step 3: Comparing both df

result_df = pd.merge(df1, grouped_df, how='inner', left_on="merged", right_on="user") ## The left_on/right_on will change according to the column name you have

pandas.DataFrame.equals — pandas 1.1.2 documentation, In a past post, I described dataframes as part Excel spreadsheet and part SQL table but with all the versatility and analytical power of Python. Pandas compare () function outputs the columns and elements that is different between the dataframes. “self” label in the result correspond to the dataframe that we compare and “other” label corresponds to the dataframe that we compare to. By default, if two corresponding values are equal, they will be shown as NaN.

I would let the pandas imported from the host (let us call it df_host) unchanged, and create columns for groups in the pandas imported from the matrix (called df_matrix):

groups = ['Users', 'Domain Admins', 'Administrators', 'Schema Admins']

for g in groups:
    df_matrix[g] = df_matrix.Group.str.contains(g)

Next I would use the user name as index in both dataframes:

df_matrix.set_index('Account Name', inplace=True)
df_host.set_index('Name', inplace=True)

You can now easily join the dataframes:

df_comp = df_matrix.join(df_host, how='outer', lsuffix='_matrix', rsuffix='_host')

You finally should have a single dataframe with one row per user, and one column for groups seen from the host and seen from the excel matrix, which should allow easy comparisons.

Comparing Pandas Dataframes To One Another, The assumption here is that we're comparing the rows in our data. We need two datasets which have matching columns, but different entries. Steps to compare values of two Pandas DataFrames Step 1: Prepare the two Pandas DataFrames As we have discussed above, we will create two DataFrames using dictionaries. Step 2: Import Numpy package If you don’t know how to install numpy, then check out how to install numpy guide on this Step 3:

You can add data to a dictionary to make things easier. If following is data file:

user:         groups
xxx           administrators
              schema admins
              domain admins
user:         groups
yyy           administrators
              domain admins
user:         groups
zzz           administrators
              schema admins

Following code will create a dictionary:

with open('userdata.txt', 'r') as f:
    # read data file and split into lines; also trim lines; 
    datalist = list(map(lambda x: x.strip(), f.readlines())) 
    userdict = {}                               # dictionary to collect data; 
    username=""; grplist = []; newuser = True   # variable to read data from file: 
    for line in datalist: 
        if line.startswith('user:'):
            if not(username=="" and len(grplist)==0):   # omit at first run
                userdict[username] = grplist            # put user data into dictionary
                username=""; grplist=[]; newuser=True       # clear variable for new user; 
        elif newuser:
            username, grpname = list(map(lambda x: x.strip(), line.split()))
            grplist.append(grpname)     # append group name to temporary list
            newuser = False
        else: 
            grplist.append(line)        # append more groups; 

userdict[username] = grplist
print(userdict)

Output:

{'yyy': ['administrators', 'domain admins'], 'zzz': ['administrators', 'schema admins'], 'xxx': ['administrators', 'schema admins', 'domain admins']}

If data in second file is as follows:

  Account Name                               Group
          xxx  administrators , schema admins, domain admins
          yyy  administrators , domain admins
          zzz  administrators , schema admins

Following code will get dictionary from it:

with open('userdata2.txt', 'r') as f:
    # read data file and split into lines; also trim lines; 
    datalines = list(map(lambda x: x.strip(), f.readlines())) 
    userdict2={}
    for line in datalines[1:]:  # omit first line which is only header
        infolist = list(map(lambda x: x.strip(), line.split(" ",1)))
        username = infolist[0].strip()
        grplist = list(map(lambda x: x.strip(), infolist[1].split(",")))
        userdict2[username] = grplist

print(userdict2)

Output:

{'zzz': ['administrators', 'schema admins'], 'xxx': ['administrators', 'schema admins', 'domain admins'], 'yyy': ['administrators', 'domain admins']}

To compare 2 dictionaries, just use ==:

print(userdict == userdict2)

Output:

True

To compare groups of a particular user:

print(userdict['xxx'] == userdict1['xxx'])

Output:

True

Comparing Rows Between Two Pandas DataFrames, The “==” operator works for multiple values in a Pandas Data frame too. Following two examples will show how to compare and select data� Once you imported the CSV files into Python, you’ll be able to assign each file into a DataFrame, where: File_1 will be assigned to df1; File_2 will be assigned to df2; As before, the goal is to compare the prices (i.e., Price1 vs. Price2). So here is the complete Python code to compare the values from the two imported files:

Python, Both have date indexes and the same structure. How can we compare these two dataframes and find which rows are in dataframe 2 that aren't in� Honestly, I am not yet an expert Pandas user, but I aim to be one. That’s why whenever I learn something new and useful, I will try to document it here. Today’s subject is comparing dataframes for equality (or inequality). Often when working with data stored in dataframes, we will need to know whether they are the same.

Quick Tip: Comparing two pandas dataframes and , In this post let us see a simple example of Pandas compare function on two similar data frames and summarize the differences. Let us load� Quick Tip: Comparing two pandas dataframes and getting the differences Posted on January 3, 2019 by Eric D. Brown, D.Sc. There are times when working with different pandas dataframes that you might need to get the data that is ‘different’ between the two dataframes (i.e.,g Comparing two pandas dataframes and getting the differences).

How To Compare Two Dataframes with Pandas compare?, DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index). So with your data (slightly edited to have a NaN in the score column): import sys if� A easy way to get around this and compare the values is to use. final_df1==final_df2. However, this will do a element by element comparison, so it wont work if you are using it to assert a statement for example in pytest. TL;DR. What works well is. all(final_df1 == final_df2).

Comments
  • Your excel matrix , does it have blanks for the user row when it is associated with multiple groups or it is repeated multiple times ?
  • No, After I imported the matrix into the pandas , actually the record has \n in between.Domain Admins\nLocal admin\nAdministrators.
  • Answer given let me know if it works
  • Hi Rahul, it doesn't work. maybe my question is not clear. let me explain again. ``` user: groups xxx administrators schema admins domain admins yyy administrators domain admins zzz administrators schema admins ``` the other file like: ```` username administrators schema admins domain admins xxx yes yes NaN yyy yes NaN yes
  • You need to check the sample df which I created in my answer and what you have..are those look similar or different. If it is different, kindly put a image of how you df look in pandas!1
  • it's different, I've updated the question, I put the original excel at the end.
  • Changed according to your new excel format..you can upvote the answer if you found useful
  • can you run it?
  • yes, it's a very good idea to use the dictionary. I think this will be the right way, I've updated the excel at the end. I want to know if user xxx 's group matches in both excel.
  • To compare groups of a particular user, one can use print(userdict['xxx'] == userdict1['xxx']). I have added this in my answer above. You should also upvote/accept answer that is helpful to you.
  • hi, the second file is different, please see my update.
  • You should accept this answer and post another question with new comparison. Otherwise answers will become very long.