Filtering df values within quotes


I am generating a df from a command-line result with code like the below:

df_output_lines = [s.split() for s in os.popen("my command linecode").read().splitlines()]
df_output_lines  = list(filter(None, df_output_lines))

and then converting it into a DataFrame:

df=pd.DataFrame(df_output_lines)
df

The data is in the below format:

abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"']])
abc

I want to parse it so that the value before the : becomes the column name and the value within the quotes " " becomes the cell value, and the same goes for all columns.

As of now I am doing it the hard way:

abc.rename(columns={0:'time',1:'instance',2:'id'},inplace=True)

and then

abc['time'] = abc['time'].map(lambda x: str(x)[:-1])
abc['time'] = abc['time'].map(lambda x: str(x)[6:])

abc['instance'] = abc['instance'].map(lambda x: str(x)[:-1])
abc['instance'] = abc['instance'].map(lambda x: str(x)[10:])

abc['id'] = abc.id.str.extract(r'(\d+)', expand=True).astype(int)

Any suggestions for a lambda expression or a one-liner to do this?
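For reference, the per-column slicing above can be collapsed into a single regex extraction. This is a sketch that assumes every cell follows the key:"value" pattern from the sample data:

```python
import pandas as pd

abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
                    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
                    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"']])

# pull the quoted value out of every cell in one pass;
# assumes every cell matches the key:"value" shape
res = abc.apply(lambda col: col.str.extract(r':"(.*)"', expand=False))
res.columns = ['time', 'instance', 'id']
```

This avoids hard-coding a slice offset per column, at the cost of running a regex over every cell.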

My output for the raw log is like the below:

    time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

    time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

 time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"
Feed list of dictionaries to pd.DataFrame

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

print(res)

                    id                 instance          time
0  3214039276626790405                   (null)  08:59:38.000
1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000
2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic determines that only 'null' strings are surrounded by parentheses.
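If the parentheses around the instance name are unwanted as well, one follow-up option (an assumption on my part that they are always noise) is to strip them after building the frame:

```python
import pandas as pd

abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
                    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"']])

# split each cell on ':"' into (key, value), drop the trailing quote
res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

# assumption: surrounding parentheses are noise and can always be dropped
res['instance'] = res['instance'].str.strip('()')
```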


Given your example input of:

time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

which comes from your os.popen command, we filter out blank lines and shlex.split each line so that whitespace within quoted items is preserved (but the quotes themselves are removed), e.g.:

import os
import shlex
import pandas as pd

rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]

This will give you, for example, rows[0] of:

['time:11:22:20.000',
 'instance:(null)',
 'id:723927731576482920',
 'channel:sip:confctl.com',
 'type:control',
 'elapsedtime:0.000631',
 'level:info',
 'operation:Init',
 'message:Initialize (version 4.9.0002.30618) ... ']

You then partition those on : to separate the identifier from the value and feed that into a pd.DataFrame, e.g.:

df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)

Giving you a df of:

            channel elapsedtime                  id               instance  level                                            message operation          time     type
0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control
1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control
2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control
3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control
4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
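Since the raw log also contains a differently-shaped "media" stats line, one option (a sketch, not the only approach, using trimmed sample lines rather than the full log) is to build one wide df and then split on the type column, dropping the all-NaN columns from each piece:

```python
import shlex
import pandas as pd

# trimmed sample lines standing in for the real os.popen output
lines = [
    'time:"11:22:20.000" instance:"(null)" id:"723927731576482920" type:"control" operation:"Init"',
    'time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" type:"media" predictedmos:"3"',
]

rows = [shlex.split(line) for line in lines if line.strip()]
df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)

# split on the type column; drop columns that are entirely NaN in each piece
control = df[df['type'] == 'control'].dropna(axis=1, how='all')
media = df[df['type'] == 'media'].dropna(axis=1, how='all')
```

Each resulting frame then only carries the columns that actually occur for that entry type.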


Comments
  • This looks like you didn't call the correct DataFrame constructor. Did you start with a dictionary or json?
  • This log is being generated on the command line and I am capturing it in a DataFrame with code.
  • I'm thinking the same as @ALollz here... what do a few lines of your raw log file look like? Loading it in a different way from the start is likely to be much easier and more reliable...
  • @JonClements I have edited my question.
  • @ALollz I have edited my question.
  • It's giving me an error: "dictionary update sequence element #0 has length 1; 2 is required"
  • @ak333, I'm using your definition of abc and it works fine for me.
  • I am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
  • @ak333, That's the problem, use abc as you defined in your question.
  • I apologize... I got it. @jpp
  • This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has limited lines; there is just one different line in that log which is causing me an issue. Can you help me with that if I edit it into the question?
  • @ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.
  • I have added that... I am getting a kind of list of lines on the command prompt and I am storing that using popen. Just that one long line which gives me the stats is causing the whole trouble, as it's different from all the other lines.
  • I hope I edited it as per your expectation. I don't mind getting two dataframes: one for the line which I added and the rest for the other lines.
  • @ak333 does that line actually end in ',? Because shlex.split won't be happy with that... you can try stripping ', from each line before splitting them and seeing if that works... that might give you one big df, and then you filter out on columns that you know only belong to those entry types, etc.
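As the last comment suggests, a line with a stray trailing ', can be cleaned with rstrip before handing it to shlex.split. A sketch with a made-up sample line (the real lines would come from os.popen):

```python
import shlex

# hypothetical sample line ending in a stray trailing comma, as described in the comment
line = 'time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" callid:"2477580077",'

# strip the trailing ' and , characters that would leave shlex with an unbalanced token
tokens = shlex.split(line.rstrip("',"))
```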