问题导读: 1.NTILE作用是什么? 2.按照pv降序排列,生成分组内每天的pv名次可使用哪个窗口函数? 3.RANK 和 DENSE_RANK作用是什么? 接上篇:Hive分析窗口函数(一)SUM,AVG,MIN,MAX 本文中介绍前几个序列函数,NTILE,ROW_NUMBER,RANK,DENSE_RANK,下面会一
问题导读:
1.NTILE作用是什么?
2.按照pv降序排列,生成分组内每天的pv名次可使用哪个窗口函数?
3.RANK 和 DENSE_RANK作用是什么?
接上篇:Hive分析窗口函数(一)SUM,AVG,MIN,MAX
数据准备:
COOKIE1,2015-04-10,1 COOKIE1,2015-04-11,5 COOKIE1,2015-04-12,7 COOKIE1,2015-04-13,3 COOKIE1,2015-04-14,2 COOKIE1,2015-04-15,4 COOKIE1,2015-04-16,4 COOKIE2,2015-04-10,2 COOKIE2,2015-04-11,3 COOKIE2,2015-04-12,5 COOKIE2,2015-04-13,6 COOKIE2,2015-04-14,3 COOKIE2,2015-04-15,9 COOKIE2,2015-04-16,7 CREATE EXTERNAL TABLE lxw1234 ( COOKIEid string, createtime string, --day pv INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile location '/tmp/lxw11/'; DESC lxw1234; COOKIEid STRING createtime STRING pv INT hive> select * from lxw1234; OK COOKIE1 2015-04-10 1 COOKIE1 2015-04-11 5 COOKIE1 2015-04-12 7 COOKIE1 2015-04-13 3 COOKIE1 2015-04-14 2 COOKIE1 2015-04-15 4 COOKIE1 2015-04-16 4 COOKIE2 2015-04-10 2 COOKIE2 2015-04-11 3 COOKIE2 2015-04-12 5 COOKIE2 2015-04-13 6 COOKIE2 2015-04-14 3 COOKIE2 2015-04-15 9 COOKIE2 2015-04-16 7
SELECT COOKIEid, createtime, pv, NTILE(2) OVER(PARTITION BY COOKIEid ORDER BY createtime) AS rn1, --分组内将数据分成2片 NTILE(3) OVER(PARTITION BY COOKIEid ORDER BY createtime) AS rn2, --分组内将数据分成3片 NTILE(4) OVER(ORDER BY createtime) AS rn3 --将所有数据分成4片 FROM lxw1234 ORDER BY COOKIEid,createtime; COOKIEid day pv rn1 rn2 rn3 ------------------------------------------------- COOKIE1 2015-04-10 1 1 1 1 COOKIE1 2015-04-11 5 1 1 1 COOKIE1 2015-04-12 7 1 1 2 COOKIE1 2015-04-13 3 1 2 2 COOKIE1 2015-04-14 2 2 2 3 COOKIE1 2015-04-15 4 2 3 3 COOKIE1 2015-04-16 4 2 3 4 COOKIE2 2015-04-10 2 1 1 1 COOKIE2 2015-04-11 3 1 1 1 COOKIE2 2015-04-12 5 1 1 2 COOKIE2 2015-04-13 6 1 2 2 COOKIE2 2015-04-14 3 2 2 3 COOKIE2 2015-04-15 9 2 3 4 COOKIE2 2015-04-16 7 2 3 4
SELECT COOKIEid, createtime, pv, NTILE(3) OVER(PARTITION BY COOKIEid ORDER BY pv DESC) AS rn FROM lxw1234; --rn = 1 的记录,就是我们想要的结果 COOKIEid day pv rn ---------------------------------- COOKIE1 2015-04-12 7 1 COOKIE1 2015-04-11 5 1 COOKIE1 2015-04-15 4 1 COOKIE1 2015-04-16 4 2 COOKIE1 2015-04-13 3 2 COOKIE1 2015-04-14 2 3 COOKIE1 2015-04-10 1 3 COOKIE2 2015-04-15 9 1 COOKIE2 2015-04-16 7 1 COOKIE2 2015-04-13 6 1 COOKIE2 2015-04-12 5 2 COOKIE2 2015-04-14 3 2 COOKIE2 2015-04-11 3 3 COOKIE2 2015-04-10 2 3
SELECT COOKIEid, createtime, pv, ROW_NUMBER() OVER(PARTITION BY COOKIEid ORDER BY pv desc) AS rn FROM lxw1234; COOKIEid day pv rn ------------------------------------------- COOKIE1 2015-04-12 7 1 COOKIE1 2015-04-11 5 2 COOKIE1 2015-04-15 4 3 COOKIE1 2015-04-16 4 4 COOKIE1 2015-04-13 3 5 COOKIE1 2015-04-14 2 6 COOKIE1 2015-04-10 1 7 COOKIE2 2015-04-15 9 1 COOKIE2 2015-04-16 7 2 COOKIE2 2015-04-13 6 3 COOKIE2 2015-04-12 5 4 COOKIE2 2015-04-14 3 5 COOKIE2 2015-04-11 3 6 COOKIE2 2015-04-10 2 7
SELECT COOKIEid, createtime, pv, RANK() OVER(PARTITION BY COOKIEid ORDER BY pv desc) AS rn1, DENSE_RANK() OVER(PARTITION BY COOKIEid ORDER BY pv desc) AS rn2, ROW_NUMBER() OVER(PARTITION BY COOKIEid ORDER BY pv DESC) AS rn3 FROM lxw1234 WHERE COOKIEid = 'COOKIE1'; COOKIEid day pv rn1 rn2 rn3 -------------------------------------------------- COOKIE1 2015-04-12 7 1 1 1 COOKIE1 2015-04-11 5 2 2 2 COOKIE1 2015-04-15 4 3 3 3 COOKIE1 2015-04-16 4 3 3 4 COOKIE1 2015-04-13 3 5 4 5 COOKIE1 2015-04-14 2 6 5 6 COOKIE1 2015-04-10 1 7 6 7 rn1: 15号和16号并列第3, 13号排第5 rn2: 15号和16号并列第3, 13号排第4 rn3: 如果相等,则按记录值排序,生成唯一的次序,如果所有记录值都相等,或许会随机排吧。