How to find the top N values per group, in descending order, in dplyr

I have the following data frame in R:

  Service     Codes
   ABS         RT
   ABS         RT
   ABS         TY
   ABS         DR
   ABS         DR
   ABS         DR
   ABS         DR
   DEF         RT
   DEF         RT
   DEF         TY
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         DR
   DEF         TY
   DEF         SE
   DEF         SE

What I want is the count of each code within each service, in descending order, keeping the top three codes per service:

  Service     Codes    Count
   ABS         DR        4
   ABS         RT        2 
   ABS         TY        1
   DEF         DR        4
   DEF         RT        2
   DEF         TY        2  

I am doing the following in R:

df %>%
  group_by(Service, Codes) %>%
  summarise(Count = n()) %>%
  top_n(n = 3, wt = Count) %>%
  arrange(desc(Count)) %>%
  as.data.frame()

But it does not give me the intended result.

We can try count(), arrange(), and slice():

df1 %>% 
   count(Service, Codes) %>%
   arrange(desc(n)) %>% 
   group_by(Service) %>% 
   slice(seq_len(3))
# A tibble: 6 x 3
# Groups:   Service [2]
#  Service Codes     n
#    <chr> <chr> <int>
#1     ABS    DR     4
#2     ABS    RT     2
#3     ABS    TY     1
#4     DEF    DR     4
#5     DEF    RT     2
#6     DEF    SE     2
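
The post never shows how df1 is created, so the following is a minimal, self-contained sketch that rebuilds the question's data and runs the same pipeline. It assumes the column is named Service (as in the code, not the misspelled table header); the name top3 is only illustrative.

```r
library(dplyr)

# Rebuild the question's data (7 rows for ABS, 10 rows for DEF).
df1 <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
            "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE"),
  stringsAsFactors = FALSE
)

top3 <- df1 %>%
  count(Service, Codes) %>%   # one row per Service/Codes pair, count stored in n
  arrange(desc(n)) %>%        # largest counts first
  group_by(Service) %>%
  slice(seq_len(3)) %>%       # at most the first three rows of each group
  ungroup()

print(top3)
```

Because slice() works on row positions rather than values, a tie at the cut-off (DEF has three codes with a count of 2) is broken by whichever rows happen to come first after sorting.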

In the OP's code, we also need to arrange by 'Service'. As @Marius said in the comments, top_n() returns more than n rows when there are ties. One option is to group by 'Service' a second time and slice (as shown above); alternatively, after that grouping, we can filter on the row number:

df1 %>% 
  group_by(Service,Codes) %>%
  summarise(Count = n()) %>%
  top_n(n=3,wt = Count)  %>%
  arrange(Service, desc(Count)) %>%
  group_by(Service) %>%
  filter(row_number() <=3)
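
The tie behaviour can be checked directly on the question's data. The sketch below is not part of the original answer: it rebuilds the data as df1, and it assumes dplyr 1.0.0 or later for slice_max(), whose with_ties = FALSE argument caps each group at n rows. The names counts, with_ties and no_ties are illustrative.

```r
library(dplyr)

# Rebuild the question's data (7 rows for ABS, 10 rows for DEF).
df1 <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
            "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE"),
  stringsAsFactors = FALSE
)

counts <- df1 %>% count(Service, Codes, name = "Count")

# top_n() keeps every row tied at the cut-off, so DEF returns four rows.
with_ties <- counts %>%
  group_by(Service) %>%
  top_n(n = 3, wt = Count) %>%
  ungroup()

# slice_max() with with_ties = FALSE returns exactly three rows per group
# (assumes dplyr >= 1.0.0).
no_ties <- counts %>%
  group_by(Service) %>%
  slice_max(Count, n = 3, with_ties = FALSE) %>%
  ungroup()

print(with_ties)
print(no_ties)
```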

From the dplyr documentation for top_n(), "Select top (or bottom) n rows (by value)" (source: R/top-n.R): top_n() is a convenient wrapper that uses filter() and min_rank() to select the top or bottom entries in each group, ordered by wt. The documentation also notes that the authors "could not see an easy way to fix the existing top_n() function without breaking existing code". Its arguments are:

x: a data frame.
n: number of rows to return for top_n(), fraction of rows to return for top_frac(). If n is positive, selects the top rows; if negative, selects the bottom rows. If x is grouped, this is the number (or fraction) of rows per group. Will include more rows if there are ties.
wt: (optional) the variable to use for ordering.

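Another option is to count by Service and Codes (count() stores the count in a column named n), rank the counts within each service with dense_rank(), and filter on the rank. Note that count() returns an ungrouped result, so group_by(Service) is needed for the ranks to restart in each service. Like top_n(), this keeps ties, so a group can return more rows than requested.

df %>%
  count(Service, Codes) %>%
  group_by(Service) %>%
  mutate(rank = dense_rank(desc(n))) %>%
  filter(rank <= 3)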
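
To see how the ranking helpers treat ties, here is a small sketch on a hypothetical vector of counts (the values and the names counts and ranks are made up for illustration). row_number() breaks ties by position, min_rank() leaves a gap after a tie, and dense_rank() leaves no gap.

```r
library(dplyr)

counts <- c(4, 2, 2, 1)   # hypothetical counts with one tie

ranks <- data.frame(
  count      = counts,
  row_number = row_number(desc(counts)),  # 1 2 3 4: ties broken by position
  min_rank   = min_rank(desc(counts)),    # 1 2 2 4: gap after the tie
  dense_rank = dense_rank(desc(counts))   # 1 2 2 3: no gap after the tie
)

print(ranks)
```

Filtering on row_number() therefore returns exactly n rows per group, while filtering on min_rank() (which is what top_n() uses) or dense_rank() can return more.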


In base R, you can do this in two lines.

# get data.frame of counts by service-code pairs
mydf <- data.frame(table(dat))

# get top 3 by service
do.call(rbind, lapply(split(mydf, mydf$Service), function(x) x[order(-x$Freq)[1:3],]))

This returns

      Service Codes Freq
ABS.1     ABS    DR    4
ABS.3     ABS    RT    2
ABS.7     ABS    TY    1
DEF.2     DEF    DR    4
DEF.4     DEF    RT    2
DEF.6     DEF    SE    2

In the first line, use table() to get the counts, then convert the result to a data.frame. In the second line, split by service, order each piece by the negative values of Freq (that is, by descending count), and pull out the first three rows. Combine the results with do.call() and rbind().
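
One caveat with this approach: table() also creates rows for Service/Codes pairs that never occur (with Freq 0), and indexing with [1:3] yields NA rows when a service has fewer than three codes. The following is a hedged variant, not the original answer's code, that drops zero counts and uses head() instead. It assumes the data frame is called dat and the column is named Service.

```r
# Rebuild the question's data (7 rows for ABS, 10 rows for DEF).
dat <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
            "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE"),
  stringsAsFactors = FALSE
)

# Counts for every Service/Codes pair, as columns Service, Codes, Freq.
mydf <- data.frame(table(dat))

# Drop pairs that never occur in the data (for example ABS/SE).
mydf <- mydf[mydf$Freq > 0, ]

# Per service: sort by descending Freq and keep at most three rows.
top3 <- do.call(rbind, lapply(split(mydf, mydf$Service), function(x) {
  head(x[order(-x$Freq), ], 3)
}))

print(top3)
```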



Comments
  • Works perfectly. I am just wondering why my approach is not working.
  • @Neil Updated the post
  • @Neil You also get more than 3 rows per group in your original code because top_n includes more than n rows if there are ties.
  • @Neil As Marius said, the ties are the problem there. So you can add a second grouping and slice: df1 %>% group_by(Service, Codes) %>% summarise(Count = n()) %>% top_n(n = 3, wt = Count) %>% group_by(Service) %>% slice(seq_len(3))