Tabulate multiple levels for each column in dataframe

> head(Gene)
  Key Func.ensGene Func.genericGene Func.refGene
1   1   intergenic       intergenic   intergenic
2   2   intergenic       intergenic   intergenic
3   3   intergenic       intergenic     intronic
4   4       exonic           exonic       exonic
5   5   intergenic       intergenic     intronic
6   6   intergenic       intergenic     intronic

Required Output:

Type          Func.ensGene Func.genericGene Func.refGene
exonic             1              1                1
intergenic         5              5                2
intronic           0              0                3

The solution I tried works on only one column, and it lists the distinct values rather than counting them:

unique(Gene["Func.ensGene"])

Could I get the output table shown above, plus a barplot where the X-axis shows the 'Type' and the bars represent the counts from each column?

Simply use ?xtabs along with ?stack:

xtabs( ~ values + ind , stack(df1[,-1]))

or even shorter as @nicola suggests:

table(stack(df1[,-1]))

Both give:

#            ind
#values       Func.ensGene Func.genericGene Func.refGene
#  exonic                1                1            1
#  intergenic            5                5            2
#  intronic              0                0            3

If you prefer to keep working with a data.frame:

as.data.frame.matrix(
    xtabs( ~ values + ind , stack(df1[,-1]))  # or again only table(stack(df1[,-1]))
)
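The question also asks for a barplot, which the table above can feed directly. A minimal sketch, assuming the sample data from the question is stored as df1 (the name used in this answer) and that base graphics are acceptable; the axis labels are illustrative choices:

```r
# Sample data from the question, named df1 to match the answer above
df1 <- data.frame(
  Key              = 1:6,
  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
  stringsAsFactors = FALSE
)

# Rows are the types, columns are the original data frame columns
counts <- table(stack(df1[, -1]))

# barplot() draws one group of bars per matrix column,
# so transpose the table to put the types on the x-axis
barplot(
  t(counts),
  beside      = TRUE,   # side-by-side bars instead of stacked
  legend.text = TRUE,   # legend entries come from the row names (the three columns)
  xlab        = "Type",
  ylab        = "Count"
)
```

Each type then gets three adjacent bars, one per Func.* column.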

We can collect every unique level across the data frame first, then convert each column to a factor with those shared levels before counting. That way a level that never occurs in a column is still reported, with a count of 0.

unique_names <- unique(unlist(df[-1]))
sapply(df[-1], function(x) table(factor(x, levels = unique_names)))

#           Func.ensGene Func.genericGene Func.refGene
#intergenic            5                5            2
#exonic                1                1            1
#intronic              0                0            3
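To see why the shared levels matter, compare the result with a plain table() per column. A small sketch, assuming the sample df from the Data section; the names ragged and aligned are only for illustration:

```r
df <- data.frame(
  Key              = 1:6,
  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
  stringsAsFactors = FALSE
)

# Without shared levels, each column only reports the types it contains,
# so the per-column tables have different lengths and sapply() cannot
# simplify them into a matrix
ragged <- sapply(df[-1], table)
is.list(ragged)

# With shared levels, every column is counted against the same set of types,
# so the results line up as a matrix and absent types appear as zeros
unique_names <- unique(unlist(df[-1]))
aligned <- sapply(df[-1], function(x) table(factor(x, levels = unique_names)))
is.matrix(aligned)
aligned["intronic", "Func.ensGene"]
```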

Love the base R solutions, but with data.table and some magrittr for readability you can get a data.frame directly (instead of a table):

library(magrittr)
library(data.table)
setDT(df)
df %>%
  melt(id.vars = "Key") %>%
  .[, .N, .(variable, value)] %>% 
  dcast(value ~ variable, value.var = "N", fill = 0)

        value Func.ensGene Func.genericGene Func.refGene
1:     exonic            1                1            1
2: intergenic            5                5            2
3:   intronic            0                0            3

Or much more concisely (as suggested by Henrik), since dcast() falls back to counting with length when several rows map to the same cell:

dcast(melt(df, "Key"), value ~ variable)

If you prefer tidyverse functions:

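library(dplyr)   # group_by(), count()
library(tidyr)   # gather(), spread()
df %>%
  gather(key = "variable", value = "value", -Key) %>%  # keep the Key id column out of the reshape
  group_by(variable, value) %>%
  count() %>%
  spread(variable, n, fill = 0)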

# A tibble: 3 x 4
# Groups:   value [3]
  value      Func.ensGene Func.genericGene Func.refGene
  <chr>             <dbl>            <dbl>        <dbl>
1 exonic                1                1            1
2 intergenic            5                5            2
3 intronic              0                0            3
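gather() and spread() are superseded in current tidyr releases by pivot_longer() and pivot_wider(). A sketch of the same reshaping with the newer functions, assuming tidyr 1.0.0 or later and the sample df from the Data section; the column names variable and value are arbitrary choices:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Key              = 1:6,
  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"),
  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  # long format: one row per (Key, column name, type)
  pivot_longer(-Key, names_to = "variable", values_to = "value") %>%
  # number of rows per column name and type
  count(variable, value) %>%
  # back to wide format, filling combinations that never occur with 0
  pivot_wider(names_from = variable, values_from = n, values_fill = list(n = 0L))

counts
```

The result should be a tibble with one row per type and one count column per Func.* column.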

Data:

df <- data.frame(
  Key              = 1:6, 
  Func.ensGene     = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 
  Func.genericGene = c("intergenic", "intergenic", "intergenic", "exonic", "intergenic", "intergenic"), 
  Func.refGene     = c("intergenic", "intergenic", "intronic", "exonic", "intronic", "intronic"),
  stringsAsFactors = FALSE
)

Comments
  • dput(head(Gene)) would have been better than just head(Gene)
  • For those who prefer poorer readability: dcast(melt(df, "Key"), value ~ variable)
  • Thanks @Henrik, I forgot about the summarising capabilities of dcast().