这篇博客给大家介绍 Hive 中最常用的四个窗口函数,分别是:LEAD, LAG, FIRST_VALUE 以及 LAST_VALUE。
- FIRST_VALUE:取分组内排序后,截止到当前行,第一个值。
- LAST_VALUE:取分组内排序后,截止到当前行,最后一个值。
- LEAD (col, n, default):用于统计窗口内往下滴 n 行值。第一个参数为列名,第二个参数为往下滴 n 行(默认为 1 ),第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL)
- LAG (col, n, default):与 lead 函数相反,用于统计窗口内网上第 n 行值。
下面我们举例来看看:
测试数据集如下所示:
FIRST_VALUE 与 LAST_VALUE
select
user_id,
user_type,
row_number() over(partition by user_type order by sales) as row_num,
first_value(user_id) over(partition by user_type order by sales desc) as max_sales_user,
first_value(user_id) over(partition by user_type order by sales asc) as min_sales_user,
last_value(user_id) over(partition by user_type order by sales desc) as curr_last_min_user,
last_value(user_id) over(partition by user_type) order by sales asc) as curr_last_max_user
from
order_detail;
查询结果为:
+----------+------------+----------+-----------------+-----------------+---------------------+---------------------+--+
| user_id | user_type | row_num | max_sales_user | min_sales_user | curr_last_min_user | curr_last_max_user |
+----------+------------+----------+-----------------+-----------------+---------------------+---------------------+--+
| wutong | new | 7 | wutong | qibaqiu | wutong | wutong |
| lilisi | new | 6 | wutong | qibaqiu | qishili | lilisi |
| qishili | new | 5 | wutong | qibaqiu | qishili | lilisi |
| wanger | new | 4 | wutong | qibaqiu | wanger | wanger |
| zhangsa | new | 3 | wutong | qibaqiu | zhangsa | zhangsa |
| liiu | new | 2 | wutong | qibaqiu | qibaqiu | liiu |
| qibaqiu | new | 1 | wutong | qibaqiu | qibaqiu | liiu |
| liwei | old | 3 | liwei | lisi | liwei | liwei |
| wangshi | old | 2 | liwei | lisi | wangshi | wangshi |
| lisi | old | 1 | liwei | lisi | lisi | lisi |
+----------+------------+----------+-----------------+-----------------+---------------------+---------------------+--+
LEAD 与 LAG
select
user_id, device_id,
lead(device_id) over(order by sales) as default_after_one_line,
lag(device_id) over(order by sales) as default_before_one_line,
lead(device_id,2) over (order by sales) as after_two_line,
lag(device_id,2,'abc') over (order by sales) as before_two_line
from
order_detail;
查询结果如下:
+----------+-------------+-------------------------+--------------------------+-----------------+------------------+--+
| user_id | device_id | default_after_one_line | default_before_one_line | after_two_line | before_two_line |
+----------+-------------+-------------------------+--------------------------+-----------------+------------------+--+
| qibaqiu | fds | fdsfagwe | NULL | 543gfd | abc |
| liiu | fdsfagwe | 543gfd | fds | f332 | abc |
| lisi | 543gfd | f332 | fdsfagwe | dfsadsa323 | fds |
| wangshi | f332 | dfsadsa323 | 543gfd | hfd | fdsfagwe |
| zhangsa | dfsadsa323 | hfd | f332 | 65ghf | 543gfd |
| liwei | hfd | 65ghf | dfsadsa323 | fds | f332 |
| wanger | 65ghf | fds | hfd | dsfgg | dfsadsa323 |
| qishili | fds | dsfgg | 65ghf | 543gdfsd | hfd |
| lilisi | dsfgg | 543gdfsd | fds | NULL | 65ghf |
| wutong | 543gdfsd | NULL | dsfgg | NULL | fds |
+----------+-------------+-------------------------+--------------------------+-----------------+------------------+--+
版权声明:本文为weixin_38746310原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。