RANK() OVER PARTITION with RANK resetting

rank() over (partition by
rank and dense_rank in sql server with examples
rank function in sql - stack overflow
where rank = 1 sql
mysql rank() over partition example
rank in partition
rank vs row_number
rank() over sum sql

How can I get a RANK that restarts at partition change? I have this table:

ID    Date        Value  
1     2015-01-01  1  
2     2015-01-02  1 <redundant  
3     2015-01-03  2  
4     2015-01-05  2 <redundant  
5     2015-01-06  1  
6     2015-01-08  1 <redundant  
7     2015-01-09  1 <redundant  
8     2015-01-10  2  
9     2015-01-11  3  
10    2015-01-12  3 <redundant  

and I'm trying to delete all the rows where the Value is not changed from the previous entry (marked with < redundant). I've tried using cursors but it takes too long, as the table has ~50 million rows.

I've also tried using RANK:

SELECT ID, Date, Value,
RANK() over(partition by Value order by Date ASC) Rank,
FROM DataLogging 
ORDER BY Date ASC 

but I get:

ID    Date        Value  Rank   (Rank)
1     2015-01-01  1      1      (1)
2     2015-01-02  1      2      (2)
3     2015-01-03  2      1      (1)
4     2015-01-05  2      2      (2)
5     2015-01-06  1      3      (1)
6     2015-01-08  1      4      (2)
7     2015-01-09  1      5      (3)
8     2015-01-10  2      3      (1)
9     2015-01-11  3      1      (1)
10    2015-01-12  3      2      (2)

in parantheses is the Rank I would want, so that I can filter out rows with Rank = 1 and delete the rest of the rows.

EDIT: I've accepted the answer that seemed the easiest to write, but unfortunately none of the answers runs fast enough for deleting the rows. In the end I've decided to use the CURSOR afterall. I've split the data in chuncks of about 250k rows and the cursor runs through and deletes the rows in ~11 mins per batch of 250k rows, and the answers below, with DELETE, take ~35 mins per batch of 250k rows.

select * 
from  ( select ID, Date, Value, lag(Value, 1, 0) over (order by ID) as ValueLag 
        from table ) tt
where ValueLag is null or ValueLag <> Value  

if the order is Date then over (order by Date)

this should show you good and bad - it is based on ID - it you need date then revise it may look like a long way around but it should be pretty efficient

declare @tt table  (id tinyint, val tinyint);
insert into @tt values 
( 1, 1),
( 2, 1),
( 3, 2),
( 4, 2),
( 5, 1),
( 6, 1),
( 7, 1),
( 8, 2),
( 9, 3),
(10, 3);

select id, val, LAG(val) over (order by id) as lagVal
from @tt;

-- find the good
select id, val 
from ( select id, val, LAG(val) over (order by id) as lagVal
       from @tt 
     ) tt
where  lagVal is null or lagVal <> val 

-- select the bad 
select tt.id, tt.val 
  from @tt tt
  left join ( select id, val 
                from ( select id, val, LAG(val) over (order by id) as lagVal
                         from @tt 
                     ) ttt
               where   ttt.lagVal is null or ttt.lagVal <> ttt.val 
            ) tttt 
    on tttt.id = tt.id 
 where tttt.id is null

Methods to Rank Rows in SQL Server: ROW_NUMBER(), RANK , Four ranking window functions use the OVER() clause that defines a is specified, the ranking row number will be reset for each partition. The RANK() ranking window function returns a unique rank number for each distinct row within the partition according to a specified column value, starting at 1 for the first row in each partition, with the same rank for duplicate values and leaving gaps between the ranks; this gap appears in the sequence after the duplicate values.

Here is a somewhat convoluted way to do it:

WITH CTE AS
(
    SELECT  *, 
            ROW_NUMBER() OVER(ORDER BY [Date]) RN1,
            ROW_NUMBER() OVER(PARTITION BY Value ORDER BY [Date]) RN2
    FROM dbo.YourTable
), CTE2 AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Value, RN1 - RN2 ORDER BY [Date]) N
    FROM CTE
)
SELECT *
FROM CTE2
ORDER BY ID;

The results are:

╔════╦════════════╦═══════╦═════╦═════╦═══╗
║ ID ║    Date    ║ Value ║ RN1 ║ RN2 ║ N ║
╠════╬════════════╬═══════╬═════╬═════╬═══╣
║  1 ║ 2015-01-01 ║     1 ║   1 ║   1 ║ 1 ║
║  2 ║ 2015-01-02 ║     1 ║   2 ║   2 ║ 2 ║
║  3 ║ 2015-01-03 ║     2 ║   3 ║   1 ║ 1 ║
║  4 ║ 2015-01-05 ║     2 ║   4 ║   2 ║ 2 ║
║  5 ║ 2015-01-06 ║     1 ║   5 ║   3 ║ 1 ║
║  6 ║ 2015-01-08 ║     1 ║   6 ║   4 ║ 2 ║
║  7 ║ 2015-01-09 ║     1 ║   7 ║   5 ║ 3 ║
║  8 ║ 2015-01-10 ║     2 ║   8 ║   3 ║ 1 ║
║  9 ║ 2015-01-11 ║     3 ║   9 ║   1 ║ 1 ║
║ 10 ║ 2015-01-12 ║     3 ║  10 ║   2 ║ 2 ║
╚════╩════════════╩═══════╩═════╩═════╩═══╝

To delete the rows you don't want, you just need to do:

DELETE FROM CTE2
WHERE N > 1;

how to group and rank (with reset) – SQLServerCentral, This is unlike row_number() over (partition by drv, acd order by p, led), which. does not reset the ranks at each P=1. It continues increasing the  The RANK() function is useful for top-N and bottom-N queries. The following illustrates the syntax of the RANK() function: RANK() OVER ([ query_partition_clause ] order_by_clause) The order_by_clause is required. It species the order of rows in each partition to which the RANK() function applies.

If you want to delete the rows, I would suggest you use lag():

with todelete as (
      select t.*, lag(value) over (order by date) as prev_value
      from t
     )
delete from todelete
    where value = prev_value;

I'm not quite sure what rank() has to do with the problem.

EDIT:

To see the rows not deleted with the same logic:

with todelete as (
      select t.*, lag(value) over (order by date) as prev_value
      from t
     )
select *
from todelete
where value <> prev_value or prev_value is null;

The where clause is just the inverse of the where clause in the first query, taking NULL values into account.

RANK (ANSI), I have the following data set. when I try to simpley rank by partitioning UID & T as follows in order to rank the id's consecutively: RANK() OVER  The RANK() function is a window function that assigns a rank to each row within a partition of a result set. The rows within a partition that have the same values will receive the same rank. The rank of the first row within a partition is one.

This is interesting so I'd thought I'd jump in. Unfortunately, creating a solution with RANK() (or rather, ROW_NUMBER()) without first transforming the data looks to be unobtainable. In an attempt to transform the data, I came up with this solution that uses 1 ROW_NUMBER():

;WITH Ordered AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY [Date]) AS [Row], *
    FROM DataLogging
),
Final AS
(
    SELECT
        o1.*, NULLIF(o1.Value - ISNULL(o2.Value, o1.Value - 1), 0) [Change]
    FROM
        Ordered o1
        LEFT JOIN Ordered o2 ON
            o1.[Row] = o2.[Row] + 1
)
SELECT * FROM Final

In the last Change column, the value will be NULL if there is no change in value (but will have the difference if there is a change).

So to do the delete, change the select to

DELETE FROM DataLogging where Change IS NULL

Edit: Lag would work here too but I was visualizing the solution as I went along and completely forgot about that.

RANK() with partition by is not reset - Transact-SQL, The DENSE_RANK() is a window function that assigns a rank to each row within a partition of a result set. Unlike the It resets the rank when the partition boundary is crossed. Using SQL Server DENSE_RANK() over a result set example. The DENSE_RANK function uses a different rule for ranking rows that include non-unique values. The RANK function takes no argument, but the empty parentheses must be specified. If the OVER clause specifies the optional window PARTITION clause, the rankings are calculated within the subset of rows that each partition defines.

Worked for my case! thanks I had to fetch the report_to change for an employee with respect to the previous reports_to valueand effdt. In other words, fetcth min effective date row for each reports_to change for an employee.

with tocheck as ( select T.emplid,T.reports_to,T.effdt, lag(reports_to) over (order by effdt) as prev_value from PS_JOB t ) select * from tocheck where reports_to <> prev_value or prev_value is null;

added changes further as p

SQL Server DENSE_RANK Function By Practical Examples, RANK() OVER (Partition by column name Order by column name) Partition by reset the row number based on the column name. Lets check  in parantheses is the Rank I would want, so that I can filter out rows with Rank = 1 and delete the rest of the rows. EDIT: I've accepted the answer that seemed the easiest to write, but unfortunately none of the answers runs fast enough for deleting the rows. In the end I've decided to use the CURSOR afterall.

ROW_NUMBER(),RANK() AND DENSE_RANK() – SQL Server, It resets the rank when the partition boundary is crossed. The following statements Using SQL DENSE_RANK() over the result set example. The following  The RANK() is a window function that calculates the rank of a row in a set of rows. The RANK() returns the same rank for the rows with the same value. Because RANK() adds the number of tied rows to the tied rank to calculate the next rank, the ranks may not be sequential. In other words, there may have gaps in the sequential rank numbering.

SQL DENSE_RANK() Function, These functions will calculate rankings, percentiles, and n-tiles of the values. The following is a syntax template of the function: Rank() over ( [ partition by expression, The rank value will be reset each time the partition group changes. The DENSE_RANK () function is applied to the rows of each partition defined by the PARTITION BY clause, in a specified order, defined by ORDER BY clause. It resets the rank when the partition boundary is crossed. The PARITION BY clause is optional. If you omit it, the function will treat the whole result set as a single partition.

Oracle SQL and PL/SQL Handbook: A Guide for Data Administrators, , The RANK function assigns each row a unique number except for duplicate rows, order_by_clause) RANK () OVER ( [query partition_clause] order_by_clause) partitioning clause lets you create independent rankings and resets the rank 

Comments
  • In your example, how do you tell the first set of value = 1 (ids 1 and 2) from the second set (ids 5, 6, and 7)?
  • I don't really understand the question. taken chronologicaly there are other values in between.
  • @AlinI: Does it matter to you which specific rows are considered redundant in each group? E.g., does it matter if the earliest row is kept vs. the latest row in each group?
  • yes, it matters, the earliest row should remain in table.
  • The problem with your attempts based on the window function version of RANK() is that the partitions you want are not determined strictly by the value column, nor indeed by any combination of columns. They are functions of pairs of rows, relative to a particular row order. The lag() and lead() window functions are just the thing for handling this, as Gordon suggested in his answer.
  • Hmm, I wasn't familiar with the LEAD function. Seems pretty useful. I didn't downvote but it's not clear how to take this to the next step (removing the redundant rows).
  • Not my downvote, but it may be the first revision of your answer had a syntax error. Hopefully, they can take it back when they notice that you fixed it
  • You did show a select. Lead / lag is very efficient when you only want adjacent.
  • Something is not quite right with the logic here,, it returns rows 2, 4, 7, 8, 10. If the intention is to reveal redundant rows, it should return 2, 4, **6**, 7, 10.
  • @mellamokb My intention is that it would show the good rows. I will look at it.
  • this seems to work, but if I use SELECT ID FROM CTE2 WHERE N > 1 ORDER BY ID; i get too many rows: ~237k vs ~175k which are redundant.
  • @AlinI I just replicated the expected result posted in your question
  • If I get the whole table it's true, the rankings are as expected. Now, how do I go about deleting the unnecesarry rows?