## How to find the top N values per group, in descending order, in dplyr

I have the following data frame in R:

    Service  Codes
    ABS      RT
    ABS      RT
    ABS      TY
    ABS      DR
    ABS      DR
    ABS      DR
    ABS      DR
    DEF      RT
    DEF      RT
    DEF      TY
    DEF      DR
    DEF      DR
    DEF      DR
    DEF      DR
    DEF      TY
    DEF      SE
    DEF      SE

What I want is the count of each code per service, in descending order:

    Service  Codes  Count
    ABS      DR     4
    ABS      RT     2
    ABS      TY     1
    DEF      DR     4
    DEF      RT     2
    DEF      TY     2

I am doing the following in R:

    df %>%
      group_by(Service, Codes) %>%
      summarise(Count = n()) %>%
      top_n(n = 3, wt = Count) %>%
      arrange(desc(Count)) %>%
      as.data.frame()

But it does not give me the intended result.

We can try with `count`/`arrange`/`slice`:

    df1 %>%
      count(Service, Codes) %>%
      arrange(desc(n)) %>%
      group_by(Service) %>%
      slice(seq_len(3))
    # A tibble: 6 x 3
    # Groups:   Service [2]
    #  Service Codes     n
    #  <chr>   <chr> <int>
    #1 ABS     DR        4
    #2 ABS     RT        2
    #3 ABS     TY        1
    #4 DEF     DR        4
    #5 DEF     RT        2
    #6 DEF     SE        2

In the OP's code, we also need to `arrange` by 'Service'. As @Marius said in the comments, `top_n` will include more than `n` rows if there are ties. One option is to do a second grouping by 'Service' and `slice` (as shown above); alternatively, after the grouping, we can `filter`:

    df1 %>%
      group_by(Service, Codes) %>%
      summarise(Count = n()) %>%
      top_n(n = 3, wt = Count) %>%
      arrange(Service, desc(Count)) %>%
      group_by(Service) %>%
      filter(row_number() <= 3)
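The tie behaviour discussed above can be checked with a small, self-contained sketch. The data frame `df1` is rebuilt by hand from the table in the question, which is an assumption about how the data is stored. The `slice_max()` call with `with_ties = FALSE` is a newer alternative that is not part of the original answer and assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Data rebuilt by hand from the question's table (assumed layout)
df1 <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes   = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
              "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE"),
  stringsAsFactors = FALSE
)

counts <- df1 %>% count(Service, Codes, name = "Count")

# top_n() keeps ties: DEF has three codes with Count == 2,
# so it returns 4 rows for DEF instead of 3
with_ties <- counts %>%
  group_by(Service) %>%
  top_n(n = 3, wt = Count) %>%
  ungroup()

# slice_max() with with_ties = FALSE returns at most 3 rows per group
top3 <- counts %>%
  group_by(Service) %>%
  slice_max(Count, n = 3, with_ties = FALSE) %>%
  ungroup()

print(top3)
```

Which of the tied DEF codes survive depends on row order, so if a specific tie-break is needed it is safer to sort explicitly before slicing.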

    df %>%
      count(Service, Codes) %>%
      mutate(rank = dense_rank(desc(n))) %>%
      filter(rank < 5)
    # rank: number of rows to return, as with top_n(), just like row_number()
    # n: the count per Service, Codes pair produced by count()
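To see how the choice of ranking helper affects a filter like the one above, the following sketch compares `row_number()`, `min_rank()` and `dense_rank()` on the seven counts from the question. The vector `cnt` is typed in by hand (an assumption, not read from the real data), and the ranking is done over all rows at once, without grouping by service.

```r
library(dplyr)

# Counts per Service/Codes pair, copied by hand from the question:
# ABS: DR = 4, RT = 2, TY = 1; DEF: DR = 4, RT = 2, SE = 2, TY = 2
cnt <- c(4, 2, 1, 4, 2, 2, 2)

ranks <- data.frame(
  Service    = rep(c("ABS", "DEF"), times = c(3, 4)),
  Codes      = c("DR", "RT", "TY", "DR", "RT", "SE", "TY"),
  Count      = cnt,
  row_number = row_number(desc(cnt)),  # ties broken by position, all distinct
  min_rank   = min_rank(desc(cnt)),    # ties share the lowest rank, gaps follow
  dense_rank = dense_rank(desc(cnt))   # ties share a rank, no gaps
)

print(ranks)
```

Because `dense_rank()` never exceeds the number of distinct counts (three here), a condition such as `rank < 5` on ungrouped data keeps every row. Grouping by service first, or using `row_number()`, gives a hard cap per group.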

In base R, you can do this in two lines.

    # get data.frame of counts by service-code pairs
    mydf <- data.frame(table(dat))
    # get top 3 by service
    do.call(rbind, lapply(split(mydf, mydf$Service),
                          function(x) x[order(-x$Freq)[1:3], ]))

This returns

          Service Codes Freq
    ABS.1     ABS    DR    4
    ABS.3     ABS    RT    2
    ABS.7     ABS    TY    1
    DEF.2     DEF    DR    4
    DEF.4     DEF    RT    2
    DEF.6     DEF    SE    2

In the first line, use `table` to get the counts, then convert the result to a data.frame. In the second line, split by service, sort each piece by the negative of `Freq` using `order`, and pull out the first three rows. Combine the results with `do.call` and `rbind`.
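One caveat with indexing by `[1:3]` is that a group with fewer than three rows would be padded with `NA` rows, and `table` also creates zero-count rows for pairs that never occur. A slightly more defensive variant, sketched here with `dat` rebuilt by hand from the question (an assumed layout), drops the zero counts and uses `head()` instead:

```r
# Data rebuilt by hand from the question's table (assumed layout)
dat <- data.frame(
  Service = rep(c("ABS", "DEF"), times = c(7, 10)),
  Codes   = c("RT", "RT", "TY", "DR", "DR", "DR", "DR",
              "RT", "RT", "TY", "DR", "DR", "DR", "DR", "TY", "SE", "SE")
)

# Counts for every Service/Codes combination, including unobserved ones
mydf <- as.data.frame(table(dat))

# Drop pairs that never occur (for example ABS/SE, which has Freq == 0)
mydf <- mydf[mydf$Freq > 0, ]

# Sort each service by descending count; head() returns at most 3 rows
# and never pads with NA rows when a group is smaller than 3
top3 <- do.call(rbind, lapply(split(mydf, mydf$Service), function(x) {
  head(x[order(-x$Freq), ], 3)
}))

print(top3)
```

Using `head()` keeps the two-step shape of the original answer while making the result safe for services that have only one or two distinct codes.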

##### Comments

- Works perfectly. I am just wondering why my approach is not working?
- @Neil Updated the post
- @Neil You also get more than 3 rows per group in your original code because `top_n` includes more than `n` rows if there are ties.
- @Neil As Marius said, the ties are the problem there. So you can add a second grouping and `slice`: `df1 %>% group_by(Service, Codes) %>% summarise(Count = n()) %>% top_n(n = 3, wt = Count) %>% group_by(Service) %>% slice(seq_len(3))`