SQL Window Functions - Part 1 (Basic functions overview)

Windows Function :

A window function is a calculation across a set of rows in a table that are somehow related to the current row. 

A window function is similar to aggregate functions combined with group by clauses but have one key difference: Window functions retain the total number of rows between the input table and the output table (or result). Behind the scenes, the window function is able to access more than just the current row of the query result.



Use cases :

1. To measure trends or changes over rows or records in your data.

2. To rank a column for outreach or prioritization.


  • Partition by: A subclause of the OVER clause. Similar to GROUP BY.
  • Over: Typically precedes the partition by that signals what to “GROUP BY”.
  • Aggregates: Aggregate functions that are used in window functions, too (e.g., sum, count, avg).
  • Row_number(): Ranking function where each row gets a different number.
  • Rank(): Ranking function where a row could get the same rank if they have the same value.
  • Dense_rank(): Ranking function similar to rank() but ranks are not skipped with ties.
  • Aliases: Shorthand that can be used if there are several window functions in one query.
  • Percentiles: Defines what percentile a value falls into over the entire table.
  • Lag/Lead: Calculating differences between rows’ values.

The sequence of Code for Window Functions

  1. An aggregation function (e.g., sum, count, or average) + the column you’d like to track
  2. OVER
  3. PARTITION BY + the column you’d like to “group by”
  4. ORDER BY (optional and is often a date column)
  5. AS + the new column name
Syntax :

AGGREGATE_FUNCTION (column_1) OVER
     (PARTITION BY column_2 ORDER BY column_3)
      AS new_column_name;

Example 1:
Create a running total of standard_amt_usd (in the orders table) over order time with no date truncation. Your final table should have four columns: one with the amount being added for each new row, and a second with the running total, account_id and occurred_at


Example 2:
Similar to example 1, create a running total of standard_amt_usd (in the orders table) over order time, but this time, date truncate occurred_at by year and partition by that same year-truncated occurred_at variable.





SELECT order_id,
       order_total,
       order_price,
       SUM(order_total) OVER
           (PARTITION BY month(order_date) ORDER BY order_date) AS running_monthly_sales,
       COUNT(order_id) OVER
           (PARTITION BY month(order_date) ORDER BY order_date) AS running_monthly orders,
       AVG(order_price) OVER
           (PARTITION BY month(order_date) ORDER BY order_date) AS average_monthly_price
FROM  amazon_sales_db
WHERE order_date < '2017-01-01';

Aggregates in windows function








Source : Udacity and https://www.windowfunctions.com/



Comments