Snowflake Window Analytical Function to set grouping set

I have the following data set in a data lake, which acts as the source for a dimension. I want to migrate the history data into the dimension.

For example:

Primarykey       Checksum     DateFrom     Dateto      ActiveFlag 
  1                  11         01:00       03:00         False
  1                  22         03:00       05:00         False 
  1                  22         05:00       07:00         False
  1                  11         07:00       09:00         False
  1                  11         09:00    12/31/999         TRUE

Please note that the data lake table has multiple columns that are not part of the dimension. So when we recalculate the checksum, consecutive rows can show the same checksum value but different datefrom and dateto values.

with base as (
select
   Primary_key,
   checksum,
   first_value(datefrom) over (partition by Primary_key, checksum order by datefrom) as Datefrom,
   last_value(dateto) over (partition by Primary_key, checksum order by datefrom) as Dateto,
   row_number() over (partition by Primary_key, checksum order by datefrom) as latest_record
from Datalake.user)
select * from base where latest_record = 1

The output is:

Primarykey       Checksum     DateFrom     Dateto 
   1              11           01:00         12/31/999 
   1              22           03:00         07:00

But the expected output is:

Primarykey       Checksum     DateFrom     Dateto 
   1              11           01:00         03:00 
   1              22           03:00         07:00
   1              11           07:00         12/31/999 

I have tried several approaches in a single query without success. Any good suggestions?

The reason you are getting only two rows is that you have two columns in your partitions, Primarykey and checksum, and in this data they form only two combinations. The third row you want in the expected output has the same Primarykey and checksum (1, 11) as the first row of your expected output, so both land in the same partition.
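To see the collapse concretely, here is a minimal sketch that runs the question's query against the sample rows in an in-memory SQLite database through Python's `sqlite3` module. Assumptions: the table name `datalake_user` is a hypothetical stand-in for `Datalake.user`, times are stored as `'HH:MM'` text so they sort correctly, SQLite 3.25 or later is available (needed for window functions), and the `LAST_VALUE` frame is spelled out as the whole partition to mirror Snowflake's documented default for that function, which is what yields `12/31/999` in the question's output.

```python
import sqlite3

# Sample rows from the question: (Primarykey, checksum, datefrom, dateto, ActiveFlag).
ROWS = [
    (1, 11, "01:00", "03:00", "False"),
    (1, 22, "03:00", "05:00", "False"),
    (1, 22, "05:00", "07:00", "False"),
    (1, 11, "07:00", "09:00", "False"),
    (1, 11, "09:00", "12/31/999", "TRUE"),
]

con = sqlite3.connect(":memory:")
# datalake_user is a hypothetical stand-in for Datalake.user.
con.execute(
    "CREATE TABLE datalake_user "
    "(Primarykey INT, checksum INT, datefrom TEXT, dateto TEXT, ActiveFlag TEXT)"
)
con.executemany("INSERT INTO datalake_user VALUES (?, ?, ?, ?, ?)", ROWS)

# The question's query: one partition per (Primarykey, checksum) pair.
QUERY = """
WITH base AS (
  SELECT Primarykey, checksum,
         FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum ORDER BY datefrom
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Datefrom,
         LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum ORDER BY datefrom
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Dateto,
         ROW_NUMBER() OVER (PARTITION BY Primarykey, checksum ORDER BY datefrom) AS latest_record
  FROM datalake_user
)
SELECT Primarykey, checksum, Datefrom, Dateto
FROM base
WHERE latest_record = 1
ORDER BY Datefrom
"""

result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Both `checksum = 11` runs fall into the same partition, so they merge into a single row spanning 01:00 to 12/31/999.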

The only thing I see in your data that would get you that result is including ActiveFlag in your partitions.

WITH base AS (
    SELECT
       primary_key,
       checksum,
       FIRST_VALUE(datefrom) OVER (PARTITION BY primary_key, checksum, active_flag ORDER BY datefrom) AS datefrom,
       LAST_VALUE(dateto) OVER (PARTITION BY primary_key, checksum, active_flag ORDER BY datefrom) AS dateto,
       ROW_NUMBER() OVER (PARTITION BY primary_key, checksum, active_flag ORDER BY datefrom) AS latest_record
    FROM Datalake.user
)
SELECT * FROM base WHERE latest_record = 1
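A sketch for checking the ActiveFlag partitioning against the sample rows, using Python's `sqlite3` (SQLite 3.25 or later assumed). The table `datalake_user` and its `ActiveFlag` column are hypothetical stand-ins for `Datalake.user` and `active_flag`; the `LAST_VALUE` frame is written out as the whole partition to match Snowflake's documented default for that function.

```python
import sqlite3

# Sample rows from the question: (Primarykey, checksum, datefrom, dateto, ActiveFlag).
ROWS = [
    (1, 11, "01:00", "03:00", "False"),
    (1, 22, "03:00", "05:00", "False"),
    (1, 22, "05:00", "07:00", "False"),
    (1, 11, "07:00", "09:00", "False"),
    (1, 11, "09:00", "12/31/999", "TRUE"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE datalake_user "
    "(Primarykey INT, checksum INT, datefrom TEXT, dateto TEXT, ActiveFlag TEXT)"
)
con.executemany("INSERT INTO datalake_user VALUES (?, ?, ?, ?, ?)", ROWS)

# Same query as before, with ActiveFlag added to every partition.
QUERY = """
WITH base AS (
  SELECT Primarykey, checksum,
         FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum, ActiveFlag ORDER BY datefrom
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Datefrom,
         LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum, ActiveFlag ORDER BY datefrom
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Dateto,
         ROW_NUMBER() OVER (PARTITION BY Primarykey, checksum, ActiveFlag ORDER BY datefrom) AS latest_record
  FROM datalake_user
)
SELECT Primarykey, checksum, Datefrom, Dateto
FROM base
WHERE latest_record = 1
ORDER BY Datefrom
"""

result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On these rows it returns three rows, but the first spans 01:00 to 09:00 rather than 01:00 to 03:00, because the two `False` runs with checksum 11 still share one partition.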

Try this code; it should work in both Snowflake and Oracle. The idea is to create a separate group whenever the checksum changes, ordering by datefrom.
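The "new group whenever the checksum changes" idea (often called gaps and islands) can be sketched in plain Python before looking at the SQL. This is an illustration of the technique, not the answer's query: `itertools.groupby` merges only adjacent items with equal keys, which is exactly the rule of starting a new group when the checksum differs from the previous row. The tuple layout `(Primarykey, checksum, datefrom, dateto)` is an assumption based on the sample table.

```python
from itertools import groupby

# Sample rows from the question: (Primarykey, checksum, datefrom, dateto).
ROWS = [
    (1, 11, "01:00", "03:00"),
    (1, 22, "03:00", "05:00"),
    (1, 22, "05:00", "07:00"),
    (1, 11, "07:00", "09:00"),
    (1, 11, "09:00", "12/31/999"),
]


def collapse_islands(rows):
    """Merge runs of consecutive rows that share (Primarykey, checksum)."""
    # Order per key by datefrom, like ORDER BY datefrom inside PARTITION BY Primarykey.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    # groupby starts a new group whenever the key or the checksum changes.
    for (pk, checksum), island in groupby(ordered, key=lambda r: (r[0], r[1])):
        island = list(island)
        # First datefrom of the run and the dateto of its last row.
        result.append((pk, checksum, island[0][2], island[-1][3]))
    return result


result = collapse_islands(ROWS)
for row in result:
    print(row)
```

The two checksum-11 runs stay separate because a checksum-22 run sits between them, which gives the three rows of the expected output.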

**SNOWFLAKE**:
WITH base AS (
SELECT
   Primarykey,
   checksum,
   FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS Datefrom,
   LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom
                            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Dateto,
   ROW_NUMBER() OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS latest_record
FROM (
  SELECT
     Primarykey,
     checksum,
     checksum_prev,
     datefrom,
     dateto,
     -- carry the row number of the most recent checksum change forward within each key
     LAST_VALUE(CASE WHEN checksum <> checksum_prev THEN group1 END) IGNORE NULLS OVER (
       PARTITION BY Primarykey
       ORDER BY group1
       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     ) AS checksum_group
  FROM (
    SELECT
       Primarykey,
       checksum,
       datefrom,
       dateto,
       -- previous checksum for the same key; 0 on the first row (assumes 0 is never a real checksum)
       LAG(checksum, 1, 0) OVER (PARTITION BY Primarykey ORDER BY datefrom) AS checksum_prev,
       -- per-key sequence number in datefrom order
       ROW_NUMBER() OVER (PARTITION BY Primarykey ORDER BY datefrom) AS group1
    FROM Datalake.user
  )
)
)
SELECT * FROM base WHERE latest_record = 1
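A minimal sketch that runs the same three-step grouping (flag checksum changes, carry the group id forward, collapse each group) on the sample rows with Python's `sqlite3`, assuming SQLite 3.25 or later. SQLite does not support `LAST_VALUE(...) IGNORE NULLS`, so this sketch swaps in a running `MAX(...)`: because `group1` only increases in `datefrom` order, the latest non-null group start is also the largest one seen so far. The table name `datalake_user` is a hypothetical stand-in for `Datalake.user`.

```python
import sqlite3

# Sample rows from the question: (Primarykey, checksum, datefrom, dateto).
ROWS = [
    (1, 11, "01:00", "03:00"),
    (1, 22, "03:00", "05:00"),
    (1, 22, "05:00", "07:00"),
    (1, 11, "07:00", "09:00"),
    (1, 11, "09:00", "12/31/999"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datalake_user (Primarykey INT, checksum INT, datefrom TEXT, dateto TEXT)")
con.executemany("INSERT INTO datalake_user VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH numbered AS (
  SELECT Primarykey, checksum, datefrom, dateto,
         -- previous checksum for the same key, 0 on the first row
         LAG(checksum, 1, 0) OVER (PARTITION BY Primarykey ORDER BY datefrom) AS checksum_prev,
         -- per-key sequence number in datefrom order
         ROW_NUMBER() OVER (PARTITION BY Primarykey ORDER BY datefrom) AS group1
  FROM datalake_user
),
grouped AS (
  SELECT Primarykey, checksum, datefrom, dateto,
         -- running MAX stands in for LAST_VALUE IGNORE NULLS
         MAX(CASE WHEN checksum <> checksum_prev THEN group1 END) OVER (
           PARTITION BY Primarykey ORDER BY group1
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS checksum_group
  FROM numbered
),
base AS (
  SELECT Primarykey, checksum,
         FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Datefrom,
         LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Dateto,
         ROW_NUMBER() OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS latest_record
  FROM grouped
)
SELECT Primarykey, checksum, Datefrom, Dateto
FROM base
WHERE latest_record = 1
ORDER BY Primarykey, Datefrom
"""

result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On the sample rows the change flags mark rows 1, 2 and 4 as group starts, so the groups are {01:00}, {03:00, 05:00} and {07:00, 09:00}, which collapse to the three expected rows.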

**Oracle**:
WITH base AS (
SELECT
   Primarykey,
   checksum,
   FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS Datefrom,
   -- without a frame, Oracle uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so widen it explicitly
   LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom
                            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Dateto,
   ROW_NUMBER() OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS latest_record
FROM (
  SELECT
     Primarykey,
     checksum,
     checksum_prev,
     datefrom,
     dateto,
     -- carry the row number of the most recent checksum change forward within each key
     LAST_VALUE(CASE WHEN checksum <> checksum_prev THEN group1 END) IGNORE NULLS
       OVER (PARTITION BY Primarykey ORDER BY group1
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS checksum_group
  FROM (
    SELECT
       Primarykey,
       checksum,
       datefrom,
       dateto,
       LAG(checksum, 1, 0) OVER (PARTITION BY Primarykey ORDER BY datefrom) AS checksum_prev,
       -- ROWNUM follows retrieval order, not datefrom, so number rows by datefrom instead
       ROW_NUMBER() OVER (PARTITION BY Primarykey ORDER BY datefrom) AS group1
    FROM Datalake.user)))
SELECT * FROM base WHERE latest_record = 1
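One portability detail worth checking when moving these queries between engines is the default window frame of `LAST_VALUE`. Oracle and SQLite both follow the ANSI default, `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, when an `ORDER BY` is present and no frame is given, whereas Snowflake documents the whole partition as the default for `FIRST_VALUE`/`LAST_VALUE`. This sketch uses Python's `sqlite3` (SQLite 3.25 or later assumed) as a stand-in for an ANSI-default engine; the table `datalake_user` is hypothetical and holds only the two checksum-22 rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datalake_user (Primarykey INT, checksum INT, datefrom TEXT, dateto TEXT)")
# The checksum-22 run from the question: 03:00-05:00 and 05:00-07:00.
con.executemany(
    "INSERT INTO datalake_user VALUES (?, ?, ?, ?)",
    [(1, 22, "03:00", "05:00"), (1, 22, "05:00", "07:00")],
)

QUERY = """
SELECT datefrom,
       -- no frame: ANSI default, so the window ends at the current row
       LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum ORDER BY datefrom) AS default_frame,
       -- explicit frame covering the whole partition
       LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum ORDER BY datefrom
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame
FROM datalake_user
ORDER BY datefrom
"""

result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With the default frame, the first row of the run (the one kept by `latest_record = 1`) reports its own dateto, 05:00, instead of the run's end, 07:00.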

I tweaked the query so it works on the entire data set. It was failing on the full data because the primary key was missing. Modified working query:
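One way a missing primary key can break the grouping on a full data set, sketched as an assumption rather than a reconstruction of the posted fix: if `LAG` and the row numbering run over the whole table instead of per `Primarykey`, rows from another key interleave in `datefrom` order and can trigger false checksum changes. The sketch uses Python's `sqlite3` (SQLite 3.25 or later assumed), a hypothetical table `datalake_user`, and a hypothetical second key; the running `MAX(...)` stands in for `LAST_VALUE(...) IGNORE NULLS`, which SQLite does not support.

```python
import sqlite3

# Question's rows for key 1, plus one hypothetical row for key 2 that falls
# between key 1's two checksum-22 rows in datefrom order.
ROWS = [
    (1, 11, "01:00", "03:00"),
    (1, 22, "03:00", "05:00"),
    (1, 22, "05:00", "07:00"),
    (1, 11, "07:00", "09:00"),
    (1, 11, "09:00", "12/31/999"),
    (2, 11, "04:00", "06:00"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datalake_user (Primarykey INT, checksum INT, datefrom TEXT, dateto TEXT)")
con.executemany("INSERT INTO datalake_user VALUES (?, ?, ?, ?)", ROWS)


def collapse(by_key):
    """Run the grouping query with or without PARTITION BY Primarykey in the change-detection steps."""
    part = "PARTITION BY Primarykey" if by_key else ""
    query = f"""
    WITH numbered AS (
      SELECT Primarykey, checksum, datefrom, dateto,
             LAG(checksum, 1, 0) OVER ({part} ORDER BY datefrom) AS checksum_prev,
             ROW_NUMBER() OVER ({part} ORDER BY datefrom) AS group1
      FROM datalake_user
    ),
    grouped AS (
      SELECT Primarykey, checksum, datefrom, dateto,
             MAX(CASE WHEN checksum <> checksum_prev THEN group1 END) OVER (
               {part} ORDER BY group1
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS checksum_group
      FROM numbered
    ),
    base AS (
      SELECT Primarykey, checksum,
             FIRST_VALUE(datefrom) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Datefrom,
             LAST_VALUE(dateto) OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Dateto,
             ROW_NUMBER() OVER (PARTITION BY Primarykey, checksum, checksum_group ORDER BY datefrom) AS latest_record
      FROM grouped
    )
    SELECT Primarykey, checksum, Datefrom, Dateto
    FROM base
    WHERE latest_record = 1
    ORDER BY Primarykey, Datefrom
    """
    return con.execute(query).fetchall()


with_key = collapse(by_key=True)
without_key = collapse(by_key=False)
print("with PARTITION BY Primarykey:", with_key)
print("without it:", without_key)
```

Without the key partition, key 2's 04:00 row sits between key 1's two checksum-22 rows, so LAG sees 11 before the 05:00 row and key 1's 03:00 to 07:00 run is split into two rows.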

[Image: modified working query]

Comments
  • This will not work, because the active flag can be False and the next record, with True, can be inserted with a different PK.
  • @snowflakeuser you cannot get the answer you want with the limited information. I didn't say it would work for all the things you didn't mention, but for the data shown it would have that effect. At the end of the day this is your problem to solve; we can only help teach you how the tools work, or show ideas that seem to fit the task as you have described it.
  • When I run it on the entire data set, the results do not add up. This is due to how the group value is calculated.
  • Can you please provide an example for this scenario?