mysql 查询历史用户的疑问?

2023-04-23 20:13:47 +08:00
 Grocker

表 t

uid status
1 1
2 2
1 2
1 3
2 3

查询只存在 status=1 的用户,这两种写法有什么区别吗

SELECT * FROM t where `status` = 1 and uid not IN (

SELECT uid FROM  t as t1 WHERE `status` in (2,3) and t.uid = t1.uid
)
SELECT * FROM t where `status` = 1 and uid not IN (

SELECT uid FROM  t as t1 WHERE `status` in (2,3) 
)

场景是查历史用户,即 status != 2,3 的,以上是简化的模型,两个字段都有索引

上面的多了一个子查询和外表的条件,在表数据量大的情况下性能谁更好,leader 说第一种查询效率会更高,因为它只在满足外表的条件下进行了过滤。或者有没有其他性能更好的写法?

1409 次点击
所在节点    程序员
8 条回复
kkwa56188
2023-04-23 21:16:53 +08:00
不是做 Mysql 的, 感觉是 1 比较快.
其次 用 in (value1, value2) 通常是很慢的, 索引不一定管用.

我会换成:
SELECT * FROM t where t.status = 1
and not exists ( SELECT uid FROM t as t1 WHERE status = 2 and t1.uid = t.uid )
and not exists ( SELECT uid FROM t as t2 WHERE t2.status = 3 and t2.uid = t.uid )
joetse
2023-04-23 23:03:51 +08:00
为什么不用 bitmap?
akira
2023-04-23 23:20:47 +08:00
status 区分度这么低,再来个 not in ,性能不好上来吧。试试这么考虑呢,虽然是有点取巧了

select uid, min(status) `mins`, max(status) `maxs`
from t
group by uid
having `mins` = 1 and `maxs` = 1
512357301
2023-04-23 23:29:12 +08:00
用 group by 呢?
select
uid,
sum(if(status = 1,1,0)) one_cnt,
sum(if(status = 2 or status = 3,1,0)) other_cnt
from t
group by uid
having one_cnt > 0 and other_cnt = 0

没在机器上测试,反正我下意识的写法是这么写,楼主可以测试下。

PS:对楼主和楼上各位的子查询写法感觉很别扭,我一直用不惯子查询嵌套,更习惯 left join ,不过 mysql 出身的貌似都这么写,存在即是合理,我不争辩。
xuanbg
2023-04-24 09:49:51 +08:00
我会写成这样:
with have23 as (SELECT uid FROM t as t1 WHERE `status` in (2,3))
select * from t left join have23 h on h.uid = t.uid where h.uid is null;
Grocker
2023-04-24 09:52:54 +08:00
@akira 实际场景是有 4 种状态,ABCD ,存在 AB 且没有 CD 状态的用户
Grocker
2023-04-24 09:53:54 +08:00
@xuanbg 忘了说了是 mysql5.7
akira
2023-04-24 14:20:15 +08:00
@Grocker 那就分别对 4 个状态做统计,思路是一样的

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/934875

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX