在2009年的最后一天,我迎来了新的一颗星,也借此机会祝大家新年快乐!为了庆祝这一时刻,我们来探讨一个技术问题:如何从数据库中按分组获取最大N条记录。由于技术版面无法散分,特此征集解决方案。
假设我们有一个表t2,其结构如下:
CREATE TABLE t2 (
id INT PRIMARY KEY,
gid CHAR(1),
col1 INT,
col2 INT
) ENGINE=MyISAM;
插入一些示例数据:
INSERT INTO t2 VALUES
(1,'A',31,6), (2,'B',25,83), (3,'C',76,21), (4,'D',63,56), (5,'E',3,17),
(6,'A',29,97), (7,'B',88,63), (8,'C',16,22), (9,'D',25,43), (10,'E',45,28),
(11,'A',2,78), (12,'B',30,79), (13,'C',96,73), (14,'D',37,40), (15,'E',14,86),
(16,'A',32,67), (17,'B',84,38), (18,'C',27,9), (19,'D',31,21), (20,'E',80,63),
(21,'A',89,9), (22,'B',15,22), (23,'C',46,84), (24,'D',54,79), (25,'E',85,64),
(26,'A',87,13), (27,'B',40,45), (28,'C',34,90), (29,'D',63,8), (30,'E',66,40),
(31,'A',83,49), (32,'B',4,90), (33,'C',81,7), (34,'D',11,12), (35,'E',85,10),
(36,'A',39,75), (37,'B',22,39), (38,'C',76,67), (39,'D',20,11), (40,'E',81,36);
我们希望实现以下结果:
N=1: 每组取col2最大的一条记录
id |
gid |
col1 |
col2 |
---|
6 |
A |
29 |
97 |
15 |
E |
14 |
86 |
24 |
D |
54 |
79 |
28 |
C |
34 |
90 |
32 |
B |
4 |
90 |
N=3: 每组取col2最大的三条记录
id |
gid |
col1 |
col2 |
---|
6 |
A |
29 |
97 |
11 |
A |
2 |
78 |
36 |
A |
39 |
75 |
32 |
B |
4 |
90 |
2 |
B |
25 |
83 |
12 |
B |
30 |
79 |
28 |
C |
34 |
90 |
23 |
C |
46 |
84 |
13 |
C |
96 |
73 |
24 |
D |
54 |
79 |
4 |
D |
63 |
56 |
9 |
D |
25 |
43 |
15 |
E |
14 |
86 |
25 |
E |
85 |
64 |
20 |
E |
80 |
63 |
注:
1)不限数据库类型,但请说明所使用的版本,例如 Oracle Database 10g 10.2, MySQL 5.1.33。
2)不限方法,可以使用SQL语句或存储过程。
解决方案
方法一:使用LEFT JOIN和GROUP BY
SELECT a.id, a.gid, a.col1, a.col2 FROM t2 a
LEFT JOIN t2 b ON a.gid = b.gid AND a.col2 <= b.col2
GROUP BY a.id, a.gid, a.col1, a.col2
HAVING COUNT(b.id) <= 3
ORDER BY a.gid, a.col2 DESC;
方法二:使用子查询
SELECT a.id, a.gid, a.col1, a.col2 FROM t2 a
WHERE 3 >= (
SELECT COUNT(*) FROM t2 b
WHERE a.gid = b.gid AND a.col2 <= b.col2
)
ORDER BY a.gid, a.col2 DESC;
方法三:使用窗口函数(适用于SQL Server 2005及以上版本)
WITH RankedRecords AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY gid ORDER BY col2 DESC) AS rn
FROM t2
)
SELECT id, gid, col1, col2
FROM RankedRecords
WHERE rn <= 3
ORDER BY gid, col2 DESC;
方法四:使用APPLY
SELECT DISTINCT b.*
FROM t2 a
CROSS APPLY (
SELECT TOP(3) *
FROM t2
WHERE a.gid = gid
ORDER BY col2 DESC
) AS b;
恭喜各位技术人员,希望这些解决方案对您有所帮助!