pandas simple pairwise occurrence

Pandas has a corr function that builds a table of pairwise correlation coefficients, even in the presence of sparse data. How can I instead calculate the number of mutual occurrences in the data, i.e. the number of positions where both variables have a value, rather than the correlation coefficient?

For example:

A = [NaN, NaN, 3]
B = [NaN, NaN, 8]
F(A, B) = 1

A = [1, NaN, NaN]
B = [NaN, NaN, 8]
F(A, B) = 0

I need pandas.DataFrame([A, B]).<function>() -> matrix of occurrences

In pandas, you may want to use dropna: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html

You can do something like

co_occur = df.dropna(how="any")
the_count = co_occur.shape[0]  # number of remaining rows

This will drop all rows where there is any NaN (thereby leaving you only with rows that contain values for every variable) and then count the number of remaining rows.
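
If you need the full matrix of pairwise co-occurrence counts (one entry per pair of columns) rather than a single number, one option is to multiply the non-null indicator matrix by its transpose. This is only a sketch, assuming each variable is a column of a DataFrame called df:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, np.nan, 3], "B": [np.nan, np.nan, 8]})

present = df.notna().astype(int)   # 1 where a value is observed, 0 where it is NaN
co_matrix = present.T @ present    # co_matrix.loc[x, y] = rows where both x and y are non-NaN

Here co_matrix has 1 in every cell, which matches F(A, B) = 1 from the question.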

Alternatively, you could do it with lists (as in your code above) assuming the lists are the same length:

from math import isnan, nan

A = [nan, nan, 3]
B = [nan, nan, 8]

# NaN is truthy in Python, so check explicitly with isnan rather than "A[i] and B[i]".
co_occur = len([i for i in range(len(A)) if not isnan(A[i]) and not isnan(B[i])])   # -> 1
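
Applying the same expression to the second pair from the question (A2/B2 are just names for that example) gives 0:

A2 = [1, nan, nan]
B2 = [nan, nan, 8]

len([i for i in range(len(A2)) if not isnan(A2[i]) and not isnan(B2[i])])   # -> 0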

I am using numpy:

import numpy as np

sum(np.sum(~np.isnan(np.array([A, B])), axis=0) == 2)
Out[335]: 1

For your second case:

sum(np.sum(~np.isnan(np.array([A, B])), axis=0) == 2)
Out[337]: 0
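
If you want the whole pairwise matrix instead of a single count, a minimal numpy sketch (assuming each row of the array is one variable) is:

import numpy as np

data = np.array([[np.nan, np.nan, 3],
                 [np.nan, np.nan, 8]])    # one row per variable (A, B)

present = ~np.isnan(data)                 # True where a value is observed
co_matrix = present.astype(int) @ present.astype(int).T
# co_matrix[i, j] = positions where variables i and j are both non-NaN; here [[1, 1], [1, 1]]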

With pandas

(df.A.notnull() & df.B.notnull()).sum()

Or

df.notnull().all(axis=1).sum()
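
For example, with the first pair from the question (building df as a two-column frame is an assumption about the layout):

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, np.nan, 3], "B": [np.nan, np.nan, 8]})

(df.A.notnull() & df.B.notnull()).sum()   # -> 1
df.notnull().all(axis=1).sum()            # -> 1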
