How to add incremental rank based on column value?

I have a data frame in the following format:

sample_df <- structure(list(conversationid = c("C1",  "C2", "C2",  "C2", 
"C2",  "C2", "C3",  "C3", "C3",  "C3"), 
sentby = c("Consumer","Consumer", "Agent", "Agent", "Agent", "Consumer", 
"Agent", "Consumer","Agent", "Agent"), 
time = c("2018-04-25 03:54:04.550+0000", "2018-05-11 19:18:05.094+0000", 
     "2018-05-11 19:18:09.218+0000", "2018-05-11 19:18:09.467+0000", 
     "2018-05-11 19:18:13.527+0000", "2018-05-14 22:57:10.004+0000", 
     "2018-05-14 22:57:14.330+0000", "2018-05-14 22:57:20.795+0000", 
     "2018-05-14 22:57:22.168+0000", "2018-05-14 22:57:24.203+0000"),
diff = c(NA, NA, 0.0687333333333333, 0.00415, 0.0676666666666667, NA, 0.0721, 
0.10775, 0.0228833333333333,0.0339166666666667)), 
.Names = c("conversationid", "sentby","time","diff"), row.names = c(NA, 10L), 
class = "data.frame")

Here, conversationid identifies a conversation, which can contain messages sent by either an agent or a consumer. I would like to keep a running count of the messages sent by "Agent" within each conversation, with 0 for all other messages, like this:

Target Output:

conversationid  sentby  diff    agent_counter_flag
        C1     Consumer NA          0
        C2     Consumer NA          0
        C2     Agent    0.06873333  1
        C2     Agent    0.00415     2
        C2     Agent    0.06766667  3
        C2     Consumer NA          0
        C3     Agent    0.0721      1
        C3     Consumer 0.10775     0
        C3     Agent    0.02288333  2
        C3     Agent    0.03391667  3

Currently, I can partition the data frame and rank all records within each conversationid using the following code:

library(data.table)
setDT(sample_df)
sample_df[,Order := rank(time, ties.method = "first"), by = "conversationid"]
sample_df <- as.data.frame(sample_df)

But this only ranks records within each partition, regardless of whether they were sent by "Agent" or "Consumer".

Current Output:

   conversationid   sentby  diff    Order
        C1     Consumer NA          1
        C2     Consumer NA          1
        C2     Agent    0.06873333  2
        C2     Agent    0.00415     3
        C2     Agent    0.06766667  4
        C2     Consumer NA          5
        C3     Agent    0.0721      1
        C3     Consumer 0.10775     2
        C3     Agent    0.02288333  3
        C3     Agent    0.03391667  4

How do I proceed to get the target output shown above? Thanks!
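A minimal sketch of how the rank() attempt above could be adapted, assuming rows only need to be ordered by time within each conversation: add sentby to the grouping so agent and consumer messages are ranked separately, then overwrite the non-agent rows with 0. The block rebuilds a trimmed copy of sample_df (without diff) so it runs on its own.

```r
library(data.table)

# Trimmed copy of the question's data (the diff column is not needed here)
sample_df <- data.frame(
  conversationid = c("C1", "C2", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3"),
  sentby = c("Consumer", "Consumer", "Agent", "Agent", "Agent", "Consumer",
             "Agent", "Consumer", "Agent", "Agent"),
  time = c("2018-04-25 03:54:04.550+0000", "2018-05-11 19:18:05.094+0000",
           "2018-05-11 19:18:09.218+0000", "2018-05-11 19:18:09.467+0000",
           "2018-05-11 19:18:13.527+0000", "2018-05-14 22:57:10.004+0000",
           "2018-05-14 22:57:14.330+0000", "2018-05-14 22:57:20.795+0000",
           "2018-05-14 22:57:22.168+0000", "2018-05-14 22:57:24.203+0000"),
  stringsAsFactors = FALSE
)
setDT(sample_df)

# Rank by time within each (conversationid, sentby) pair rather than within
# conversationid alone, so agent and consumer messages are numbered separately
sample_df[, Order := rank(time, ties.method = "first"), by = .(conversationid, sentby)]

# The target output shows 0 for messages not sent by an agent
sample_df[sentby != "Agent", Order := 0L]
sample_df
```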


library(data.table)
setDT(sample_df)

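# Within each conversation, sba flags the agent rows; cumsum(sba) is the running
# agent count, and multiplying by sba zeroes out the non-agent rows
sample_df[, agent_counter_flag := {sba = (sentby == 'Agent'); sba*cumsum(sba)}
          , by = conversationid]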
sample_df

#     conversationid   sentby                         time       diff agent_counter_flag
#  1:             C1 Consumer 2018-04-25 03:54:04.550+0000         NA                  0
#  2:             C2 Consumer 2018-05-11 19:18:05.094+0000         NA                  0
#  3:             C2    Agent 2018-05-11 19:18:09.218+0000 0.06873333                  1
#  4:             C2    Agent 2018-05-11 19:18:09.467+0000 0.00415000                  2
#  5:             C2    Agent 2018-05-11 19:18:13.527+0000 0.06766667                  3
#  6:             C2 Consumer 2018-05-14 22:57:10.004+0000         NA                  0
#  7:             C3    Agent 2018-05-14 22:57:14.330+0000 0.07210000                  1
#  8:             C3 Consumer 2018-05-14 22:57:20.795+0000 0.10775000                  0
#  9:             C3    Agent 2018-05-14 22:57:22.168+0000 0.02288333                  2
# 10:             C3    Agent 2018-05-14 22:57:24.203+0000 0.03391667                  3

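As @Frank points out, this also works, since rowid(conversationid, sentby) numbers each sender's messages within a conversation and multiplying by sentby == "Agent" zeroes out the non-agent rows: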

sample_df[, agent_counter_flag := rowid(conversationid, sentby)*(sentby == "Agent")]
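To see why the rowid() version works, this sketch shows the intermediate step on the two relevant columns of the sample data (the helper column rid is only for illustration). Like the cumsum approach, it assumes rows are already in time order within each conversation.

```r
library(data.table)

dt <- data.table(
  conversationid = c("C1", "C2", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3"),
  sentby = c("Consumer", "Consumer", "Agent", "Agent", "Agent", "Consumer",
             "Agent", "Consumer", "Agent", "Agent")
)

# rowid() numbers the occurrences of each (conversationid, sentby) pair
# in row order: 1, 1, 1, 2, 3, 2, 1, 1, 2, 3
dt[, rid := rowid(conversationid, sentby)]

# TRUE/FALSE act as 1/0, so only the agent rows keep their number
dt[, agent_counter_flag := rid * (sentby == "Agent")]
dt
```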

Benchmark

library(microbenchmark)

# Stack 1000 copies of the sample data to get a larger table
sample_df <- rbindlist(replicate(1000, sample_df, simplify = FALSE))
microbenchmark(
  rowidFrank = sample_df[, agent_counter_flag := 
                           rowid(conversationid, sentby)*(sentby == "Agent")]
, rowidUwe = sample_df[sentby == "Agent", agent_counter_flag := rowid(conversationid)]
, cumsum   = sample_df[, agent_counter_flag := {sba = (sentby == 'Agent'); sba*cumsum(sba)}
                       , by = conversationid]
, unit = 'relative')

# Unit: relative
# expr            min       lq     mean   median       uq       max neval
# rowidFrank 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000   100
# rowidUwe   1.448858 1.438742 1.410849 1.414428 1.535292 0.5549433   100
# cumsum     1.322493 1.306228 1.316188 1.261325 1.308371 1.6431036   100
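If data.table isn't available, the same cumsum-per-conversation idea can be sketched in base R with ave(). This is a swapped-in base R equivalent, not one of the benchmarked methods, and it also assumes rows are already sorted by time within each conversation.

```r
# Minimal copy of the two columns the counter needs
sample_df <- data.frame(
  conversationid = c("C1", "C2", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3"),
  sentby = c("Consumer", "Consumer", "Agent", "Agent", "Agent", "Consumer",
             "Agent", "Consumer", "Agent", "Agent"),
  stringsAsFactors = FALSE
)

# 1 for agent messages, 0 otherwise
is_agent <- as.integer(sample_df$sentby == "Agent")

# ave() applies cumsum() separately within each conversationid and returns
# the results aligned with the original rows
running <- ave(is_agent, sample_df$conversationid, FUN = cumsum)

# Keep the running count on agent rows only; everything else becomes 0
sample_df$agent_counter_flag <- running * is_agent
sample_df
```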


This is my data.table solution, which applies the rowid() function to the agent rows only and creates the new column agent_counter_flag by reference. Non-agent rows are left as NA:

library(data.table)
setDT(sample_df)
sample_df[sentby == "Agent", agent_counter_flag := rowid(conversationid)][]
#     conversationid   sentby                         time       diff agent_counter_flag
#  1:             C1 Consumer 2018-04-25 03:54:04.550+0000         NA                 NA
#  2:             C2 Consumer 2018-05-11 19:18:05.094+0000         NA                 NA
#  3:             C2    Agent 2018-05-11 19:18:09.218+0000 0.06873333                  1
#  4:             C2    Agent 2018-05-11 19:18:09.467+0000 0.00415000                  2
#  5:             C2    Agent 2018-05-11 19:18:13.527+0000 0.06766667                  3
#  6:             C2 Consumer 2018-05-14 22:57:10.004+0000         NA                 NA
#  7:             C3    Agent 2018-05-14 22:57:14.330+0000 0.07210000                  1
#  8:             C3 Consumer 2018-05-14 22:57:20.795+0000 0.10775000                 NA
#  9:             C3    Agent 2018-05-14 22:57:22.168+0000 0.02288333                  2
# 10:             C3    Agent 2018-05-14 22:57:24.203+0000 0.03391667                  3
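The target output uses 0 rather than NA for non-agent rows. One way to get there, sketched on the two relevant columns only, is a second by-reference assignment on the NA rows.

```r
library(data.table)

dt <- data.table(
  conversationid = c("C1", "C2", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3"),
  sentby = c("Consumer", "Consumer", "Agent", "Agent", "Agent", "Consumer",
             "Agent", "Consumer", "Agent", "Agent")
)

# Number the agent rows within each conversation (as in the answer above)
dt[sentby == "Agent", agent_counter_flag := rowid(conversationid)]

# Rows excluded by the i-expression were filled with NA; set them to 0
dt[is.na(agent_counter_flag), agent_counter_flag := 0L]
dt
```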


Here you are, using a simplified example with columns cid and Sent_by (rows from the other sender get NA):

library(dplyr)

df <- data.frame(cid = c(rep("c1", 6), rep("C2", 4)),
                 Sent_by = c("Consumer", "Agent", "Consumer", "Consumer", "Agent", "Agent",
                             "Consumer", "Agent", "Agent", "Consumer"))
df %>% group_by(cid, Sent_by) %>%
  mutate(agent_flag = ifelse(Sent_by == "Agent", 1:n(), NA_integer_),
         consumer_flag = ifelse(Sent_by == "Consumer", 1:n(), NA_integer_))
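Applied to the question's sample_df, the same per-(group, sender) numbering could look like this sketch, which swaps 1:n() for dplyr's row_number() and uses 0L instead of NA to match the target output. It assumes rows are already in time order within each conversation.

```r
library(dplyr)

sample_df <- data.frame(
  conversationid = c("C1", "C2", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3"),
  sentby = c("Consumer", "Consumer", "Agent", "Agent", "Agent", "Consumer",
             "Agent", "Consumer", "Agent", "Agent"),
  stringsAsFactors = FALSE
)

out <- sample_df %>%
  group_by(conversationid, sentby) %>%
  # row_number() is the position of each row within its (conversationid, sentby) group
  mutate(agent_counter_flag = if_else(sentby == "Agent", row_number(), 0L)) %>%
  ungroup()
out
```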


I came across this post while trying to solve a similar issue with dplyr. After grouping, you can take a cumulative sum of the logical test sentby == "Agent", since TRUE counts as 1 and FALSE as 0.

The long way, just to spell out what the logical column will look like:

library(dplyr)

sample_df %>%
  mutate(is_agent = sentby == "Agent") %>%
  group_by(conversationid) %>%
  mutate(agent_counter_flag = ifelse(is_agent, cumsum(is_agent), 0)) %>%
  ungroup()
#> # A tibble: 10 x 6
#>    conversationid sentby  time               diff is_agent agent_counter_f…
#>    <chr>          <chr>   <chr>             <dbl> <lgl>               <dbl>
#>  1 C1             Consum… 2018-04-25 03… NA       FALSE                   0
#>  2 C2             Consum… 2018-05-11 19… NA       FALSE                   0
#>  3 C2             Agent   2018-05-11 19…  0.0687  TRUE                    1
#>  4 C2             Agent   2018-05-11 19…  0.00415 TRUE                    2
#>  5 C2             Agent   2018-05-11 19…  0.0677  TRUE                    3
#>  6 C2             Consum… 2018-05-14 22… NA       FALSE                   0
#>  7 C3             Agent   2018-05-14 22…  0.0721  TRUE                    1
#>  8 C3             Consum… 2018-05-14 22…  0.108   FALSE                   0
#>  9 C3             Agent   2018-05-14 22…  0.0229  TRUE                    2
#> 10 C3             Agent   2018-05-14 22…  0.0339  TRUE                    3

You'd probably want to follow that up with select(-is_agent) to drop the logical column.

In practice, you can abbreviate this by calling cumsum() directly inside mutate():

sample_df %>%
  group_by(conversationid) %>%
  mutate(agent_counter_flag = ifelse(sentby == "Agent", cumsum(sentby == "Agent"), 0)) %>%
  ungroup()

Either way, within each conversationid, an agent message gets the number of agent messages so far (including itself), and any other message gets 0.
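As a small worked illustration of that rule, here are the messages of conversation C3 on their own, as a plain vector:

```r
# Messages of conversation C3 from the question, in time order
sentby <- c("Agent", "Consumer", "Agent", "Agent")

is_agent <- sentby == "Agent"   # TRUE FALSE TRUE TRUE
running <- cumsum(is_agent)     # 1 1 2 3: TRUE counts as 1, FALSE as 0
counter <- ifelse(is_agent, running, 0)
counter                         # 1 0 2 3
```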
