Originally the data was split into 12 monthly tables, each holding roughly 20 million rows.
...
CREATE TABLE `parcel_with_no_partition_m05` (
...
)
CREATE TABLE `parcel_with_no_partition_m06` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'surrogate primary key, no business meaning',
`equip_id` int(11) DEFAULT NULL,
`scan_id` int(11) DEFAULT NULL,
`create_datetime` datetime DEFAULT NULL,
`minute_of_hour` int(11) DEFAULT NULL,
`hour_of_day` int(11) DEFAULT NULL,
`day_of_month` int(11) DEFAULT NULL,
`week_of_year` int(11) DEFAULT NULL,
`month_of_year` int(11) DEFAULT NULL,
`parcel_length` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `parcel_with_no_partition_m07` (
...
)
...
Now I want to try MySQL 8's native partitioning instead: the full 12 months, more than 200 million rows in total, are stored in a single table with roughly 20 million rows per partition, the same data as in the original tables.
CREATE TABLE `parcel_with_partition` (
`equip_id` int(11) DEFAULT NULL,
`scan_id` int(11) DEFAULT NULL,
`create_datetime` datetime DEFAULT NULL,
`minute_of_hour` int(11) DEFAULT NULL,
`hour_of_day` int(11) DEFAULT NULL,
`day_of_month` int(11) DEFAULT NULL,
`week_of_year` int(11) DEFAULT NULL,
`month_of_year` int(11) DEFAULT NULL,
`parcel_length` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 PARTITION BY RANGE (`month_of_year`) (
PARTITION `JAN` VALUES LESS THAN (2),
PARTITION `FEB` VALUES LESS THAN (3),
PARTITION `MAR` VALUES LESS THAN (4),
PARTITION `APR` VALUES LESS THAN (5),
PARTITION `MAY` VALUES LESS THAN (6),
PARTITION `JUN` VALUES LESS THAN (7),
PARTITION `JUL` VALUES LESS THAN (8),
PARTITION `AUG` VALUES LESS THAN (9),
PARTITION `SEP` VALUES LESS THAN (10),
PARTITION `OCT` VALUES LESS THAN (11),
PARTITION `NOV` VALUES LESS THAN (12),
PARTITION `DEC` VALUES LESS THAN (13)
);
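The post doesn't show how the 200-million-plus rows were loaded into the new table. A minimal backfill sketch, assuming a month-by-month INSERT ... SELECT from the old monthly tables (illustrative only, repeated for m01 through m12; the surrogate `id` column is dropped because the partitioned table doesn't have it):
# hypothetical backfill for one month; not from the original post
insert into `parcel_with_partition`
  (`equip_id`,`scan_id`,`create_datetime`,`minute_of_hour`,`hour_of_day`,`day_of_month`,`week_of_year`,`month_of_year`,`parcel_length`)
select `equip_id`,`scan_id`,`create_datetime`,`minute_of_hour`,`hour_of_day`,`day_of_month`,`week_of_year`,`month_of_year`,`parcel_length`
from `parcel_with_no_partition_m06`;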
I ran one of our business SQL statements against it and found it took roughly twice as long.
# query against MySQL's native partition
select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_partition` partition(`JUN`) where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `create_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
2018 rows in set (59.56 sec)
# query against the original monthly table
select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_no_partition_m06` where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `sort_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
2018 rows in set, 1 warning (29.13 sec)
I then looked at the two statements further with EXPLAIN and the Performance Schema.
Details of the query against the native partition
explain:
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
| id | select_type | table                 | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
|  1 | SIMPLE      | parcel_with_partition | JUN        | ALL  | NULL          | NULL | NULL    | NULL | 20481855 |     1.11 | Using where; Using temporary |
+----+-------------+-----------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
table i/o:
select * from performance_schema.table_io_waits_summary_by_table where count_star > 0\G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: parcel_with_partition
COUNT_STAR: 22123040
SUM_TIMER_WAIT: 55709022384855
MIN_TIMER_WAIT: 55709022384855
AVG_TIMER_WAIT: 2517879
MAX_TIMER_WAIT: 55709022384855
COUNT_READ: 22123040
SUM_TIMER_READ: 55709022384855
MIN_TIMER_READ: 55709022384855
AVG_TIMER_READ: 2517879
MAX_TIMER_READ: 55709022384855
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
COUNT_FETCH: 22123040
SUM_TIMER_FETCH: 55709022384855
MIN_TIMER_FETCH: 55709022384855
AVG_TIMER_FETCH: 2517879
MAX_TIMER_FETCH: 55709022384855
COUNT_INSERT: 0
SUM_TIMER_INSERT: 0
MIN_TIMER_INSERT: 0
AVG_TIMER_INSERT: 0
MAX_TIMER_INSERT: 0
COUNT_UPDATE: 0
SUM_TIMER_UPDATE: 0
MIN_TIMER_UPDATE: 0
AVG_TIMER_UPDATE: 0
MAX_TIMER_UPDATE: 0
COUNT_DELETE: 0
SUM_TIMER_DELETE: 0
MIN_TIMER_DELETE: 0
AVG_TIMER_DELETE: 0
MAX_TIMER_DELETE: 0
file i/o:
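(The statement used to collect these file I/O figures isn't shown in the post; judging by the fields below it was presumably something along the lines of the following query against performance_schema.file_summary_by_instance, reconstructed here rather than copied from the author:)
select * from performance_schema.file_summary_by_instance where file_name like '%parcel%' and count_star > 0\G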
FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_partition#p#jun.ibd
EVENT_NAME: wait/io/file/innodb/innodb_data_file
OBJECT_INSTANCE_BEGIN: 140081009323904
COUNT_STAR: 422002
SUM_TIMER_WAIT: 3965120084812
MIN_TIMER_WAIT: 5739112
AVG_TIMER_WAIT: 9395831
MAX_TIMER_WAIT: 498159493
COUNT_READ: 422002
SUM_TIMER_READ: 3965120084812
MIN_TIMER_READ: 5739112
AVG_TIMER_READ: 9395831
MAX_TIMER_READ: 498159493
SUM_NUMBER_OF_BYTES_READ: 6914080768
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
SUM_NUMBER_OF_BYTES_WRITE: 0
COUNT_MISC: 0
SUM_TIMER_MISC: 0
MIN_TIMER_MISC: 0
AVG_TIMER_MISC: 0
MAX_TIMER_MISC: 0
Details of the query against the original monthly table
explain:
+----+-------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
| id | select_type | table                        | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                        |
+----+-------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
|  1 | SIMPLE      | parcel_with_no_partition_m06 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 22123040 |     1.11 | Using where; Using temporary |
+----+-------------+------------------------------+------------+------+---------------+------+---------+------+----------+----------+------------------------------+
table i/o:
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: parcel_with_no_partition_m06
COUNT_STAR: 22123040
SUM_TIMER_WAIT: 23182194259854
MIN_TIMER_WAIT: 23182194259854
AVG_TIMER_WAIT: 1047813
MAX_TIMER_WAIT: 23182194259854
COUNT_READ: 22123040
SUM_TIMER_READ: 23182194259854
MIN_TIMER_READ: 23182194259854
AVG_TIMER_READ: 1047813
MAX_TIMER_READ: 23182194259854
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
COUNT_FETCH: 22123040
SUM_TIMER_FETCH: 23182194259854
MIN_TIMER_FETCH: 23182194259854
AVG_TIMER_FETCH: 1047813
MAX_TIMER_FETCH: 23182194259854
COUNT_INSERT: 0
SUM_TIMER_INSERT: 0
MIN_TIMER_INSERT: 0
AVG_TIMER_INSERT: 0
MAX_TIMER_INSERT: 0
COUNT_UPDATE: 0
SUM_TIMER_UPDATE: 0
MIN_TIMER_UPDATE: 0
AVG_TIMER_UPDATE: 0
MAX_TIMER_UPDATE: 0
COUNT_DELETE: 0
SUM_TIMER_DELETE: 0
MIN_TIMER_DELETE: 0
AVG_TIMER_DELETE: 0
MAX_TIMER_DELETE: 0
file i/o:
FILE_NAME: /home/wedo/mysql/data/mysql/test/parcel_with_no_partition_m06.ibd
EVENT_NAME: wait/io/file/innodb/innodb_data_file
OBJECT_INSTANCE_BEGIN: 140081009372480
COUNT_STAR: 424687
SUM_TIMER_WAIT: 4045883911263
MIN_TIMER_WAIT: 5301220
AVG_TIMER_WAIT: 9526557
MAX_TIMER_WAIT: 861143891
COUNT_READ: 424687
SUM_TIMER_READ: 4045883911263
MIN_TIMER_READ: 5301220
AVG_TIMER_READ: 9526557
MAX_TIMER_READ: 861143891
SUM_NUMBER_OF_BYTES_READ: 6958071808
COUNT_WRITE: 0
SUM_TIMER_WRITE: 0
MIN_TIMER_WRITE: 0
AVG_TIMER_WRITE: 0
MAX_TIMER_WRITE: 0
SUM_NUMBER_OF_BYTES_WRITE: 0
COUNT_MISC: 0
SUM_TIMER_MISC: 0
MIN_TIMER_MISC: 0
AVG_TIMER_MISC: 0
MAX_TIMER_MISC: 0
Neither EXPLAIN nor the file I/O stats show anything unusual, but in the table I/O stats the native partition spends far more time than the original monthly table does.
Does that mean the culprit for the extra time is the logical I/O? And why would partitioning the table produce this result? Any advice from the experts would be appreciated.
Some details on the environment: OS: CentOS 7; database: MySQL 8 (InnoDB); server: 12 cores at 2.5 GHz, 64 GB RAM; disk: spinning HDD; peak memory usage while running the query was about 3 GB.
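One way to test the logical-I/O suspicion above (my suggestion; none of this was run in the post) is to compare the handler and temporary-table counters for the two statements in the same session. If the partitioned query reports far more Handler_read_rnd_next calls or more Created_tmp_disk_tables, the extra time is being spent handling rows above the storage layer rather than reading from disk:
# repeat once for each of the two business queries
flush status;
# ... run one of the two queries from above here ...
show session status like 'Handler_read%';
show session status like 'Created_tmp%';
On MySQL 8.0.18 or later, prefixing either query with explain analyze would also break the elapsed time down by plan step.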
1
xuanbg 2020-06-17 23:36:50 +08:00
How can 20 million rows and 240 million rows behave the same? A table with PARTITION BY RANGE is really still one table; the data is just stored in separate chunks.
|
2
ajsonx 2020-06-18 00:02:23 +08:00
I've tried a few of MySQL 8's window functions; they're mainly there for convenience when writing queries. With 10 million rows of data, none of them performed as well as writing the joins by hand.
|
3
Pendragon OP @xuanbg That was my first thought too, but the analysis shows the disk I/O time is the same for both queries, and the peak memory used matches the size of a single monthly table / single partition, so the partitioned approach shouldn't be pulling in all 240 million rows (one way to double-check this is sketched after this reply)
|
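A rough way to back up the memory observation in the reply above (my sketch, not something from the thread; note that scanning information_schema.innodb_buffer_page is itself expensive on a busy server) is to look at which table's pages are sitting in the buffer pool right after each query:
select table_name, count(*) as pages, round(sum(data_size)/1024/1024) as data_mb from information_schema.innodb_buffer_page where table_name like '%parcel%' group by table_name;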
4
mahone3297 2020-06-18 00:18:12 +08:00
# query against MySQL's native partition
select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_partition` partition(`JUN`) where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `create_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
2018 rows in set (59.56 sec)
# query against the original monthly table
select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_no_partition_m06` where `equip_id` = 3 and `create_datetime` >'2020-06-01' and `sort_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`
2018 rows in set, 1 warning (29.13 sec)
* The two WHERE conditions aren't the same? The partitioned query uses `create_datetime` > '2020-06-01' and `create_datetime` <= '2020-06-29', while the monthly table uses `create_datetime` > '2020-06-01' and `sort_datetime` <= '2020-06-29'.
* Also, you picked out a single partition with from `parcel_with_partition` partition(`JUN`)? Could you compare against something like from parcel_with_partition_JUN instead?
|
5
mahone3297 2020-06-18 00:19:28 +08:00
The point of a partitioned table should be that when the partitioning condition appears in your WHERE clause, the SQL only has to read a few of the partitions instead of all of them (see the sketch after this reply).
|
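A sketch of the usage described in the reply above (my example, not from the thread): put the partitioning column in the WHERE clause and let the optimizer prune, then check the partitions column of the EXPLAIN output, which should list only JUN:
explain select sql_no_cache count(*) as num,`day_of_month` as day,`hour_of_day` as hour,`scan_id` as scanId from `parcel_with_partition` where `month_of_year` = 6 and `equip_id` = 3 and `create_datetime` >'2020-06-01' and `create_datetime` <= '2020-06-29' group by `scan_id`,`day_of_month`,`hour_of_day`;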
6
flashrick 2020-06-18 09:40:30 +08:00
The way I see it, you're querying the partitioned table the wrong way. You partitioned by month_of_year, so a plain SELECT is all you need; just make sure the WHERE clause includes month_of_year. Go easy on me if I've got it wrong.
|
7
Pendragon OP @flashrick
@mahone3297 You're right. Specifying the month_of_year value in the WHERE clause also hits the matching partition. I used partition() here to pin the partition explicitly so that other conditions wouldn't affect the comparison; the end result is the same either way. |
8
Still4 2020-06-18 10:00:05 +08:00
I looked at how partitioned tables are implemented: each partition gets its own file, so the two tables being compared should be about the same size on disk. With no index, both queries are full scans, so the execution times shouldn't differ this much.
Could you make another copy of the parcel_with_no_partition_m06 table to rule out the effect of the spinning disk's inner vs. outer tracks (see the sketch after this reply)? |
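One way to make the copy suggested above (my sketch; the copy's table name is made up):
create table `parcel_with_no_partition_m06_copy` like `parcel_with_no_partition_m06`;
insert into `parcel_with_no_partition_m06_copy` select * from `parcel_with_no_partition_m06`;
Re-running the same business query against the copy would show whether the file's physical placement on the spinning disk accounts for part of the gap.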
9
someonedeng 2020-06-24 17:13:16 +08:00 via Android
By the way, the data was already split into separate tables; why merge it back into one big table and then partition that, instead of partitioning the smaller tables directly?
|
10
encro 2020-07-01 17:20:42 +08:00
|