如何按天统计前 7 天周活?

2023-03-02 15:02:55 +08:00
 killva4624

比如下表,记录了日期和对应用户:

datetime,username
2023-01-01,UserA
2023-01-01,UserB
2023-01-02,UserA
2023-01-03,UserC
...

需求是按天统计出该天起前七天的周活,当天没有对应用户的话补 0 ,如:

2023-01-08(2023-01-01~2023-01-07), 7
2023-01-09(2023-01-02~2023-01-08), 10

....

1951 次点击
所在节点    MySQL
14 条回复
pota
2023-03-02 15:15:55 +08:00
这种我感觉还是将每天的统计数据存起来直接拿好点
oRbceGJ
2023-03-02 15:31:16 +08:00
用偏移类开窗函数?
Ericcccccccc
2023-03-02 15:34:17 +08:00
没必要强行用纯 sql 做这个事情吧.
aw2350
2023-03-02 15:35:36 +08:00
parttition by ,按照日期汇总
yuhu96
2023-03-02 15:41:12 +08:00
能不能按照当前[date ,date - 7 ]作为一个分组字段来进行聚合啊。窗口函数感觉不适用在这里。
byty18768
2023-03-02 15:42:45 +08:00
搞一张存储自然日的日期表,excel 生成很快的
`
datetime,start_date,end_date
2023-01-08,2023-01-01,2023-01-07
2023-01-09,2023-01-02,2023-01-08
2023-01-10,2023-01-03,2023-01-09
`
笛卡尔积
`
select a. datetime ,count(distinct username)
from 日期表 a
join 数据表 b on b. datetime between a. start_date and a.end_date
group by 1
`
Chad0000
2023-03-02 15:43:23 +08:00
记录日活,主键:
日期_用户 Id 。没活的不记。然后周活就是最近七天的,Group By 用户 Id 即可。
JKeita
2023-03-02 15:50:02 +08:00
按天取数据存到一份文本文件中,然后直接用 linux 命令进行去重计数。
killva4624
2023-03-02 17:31:43 +08:00
@byty18768 直接用子查询生成这个这个自然日表也可以吧?
YYYYMMDDHHSS
2023-03-02 17:40:33 +08:00
with data as (

select '2023-01-01' as dt ,'UserA' as user_name union all
select '2023-01-01' as dt ,'UserB' as user_name union all
select '2023-01-02' as dt ,'UserA' as user_name union all
select '2023-01-03' as dt ,'UserC' as user_name union all
select '2023-01-04' as dt ,'UserA' as user_name union all
select '2023-01-05' as dt ,'UserB' as user_name union all
select '2023-01-06' as dt ,'UserA' as user_name union all
select '2023-01-07' as dt ,'UserC' as user_name union all
select '2023-01-09' as dt ,'UserA' as user_name union all
select '2023-01-10' as dt ,'UserB' as user_name union all
select '2023-01-11' as dt ,'UserA' as user_name union all
select '2023-01-12' as dt ,'UserC' as user_name union all
select '2023-01-13' as dt ,'UserA' as user_name union all
select '2023-01-14' as dt ,'UserB' as user_name union all
select '2023-01-15' as dt ,'UserA' as user_name union all
select '2023-01-16' as dt ,'UserC' as user_name union all
select '2023-01-17' as dt ,'UserA' as user_name union all
select '2023-01-18' as dt ,'UserB' as user_name
)
select
t.dt,
date_add(t.dt, 7) dt_to,
count(distinct t.user_name) as cnt
from
(
select
d1.dt,
d2.user_name
from
data d1
join data d2
where
1 = 1
--and d1.dt > d2.dt
and datediff(d1.dt, d2.dt) <= 7
) t
group by
dt
----

大数据量不建议这样做
sunny1688
2023-03-02 18:30:58 +08:00
@Chad0000 老哥的方案是最轻松还解耦的,一条简单的 sql 就行了
vitoliu
2023-03-02 18:35:13 +08:00
SQL 挺难想的,我反正比较难硬加出来。通过代码来构建逻辑简单清晰,灵活性更强。
统计某一个时段的周活,可以按天、小时统计,定时任务去跑,还可以无限加条件。比如用户注册时间小于 3 个月,判断新用户留存情况。
28Sv0ngQfIE7Yloe
2023-03-02 18:41:42 +08:00
这种不都是定时调度 ETL 做成指标放到数仓里吗?
byty18768
2023-03-06 10:27:40 +08:00
@killva4624 主要是你不是说可能会存在整天都没有用户的嘛,所以考虑导入表来补 0 。如果底层每天都是全的肯定没问题

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/920522

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX