后续, Win11 下大小核心调度问题。

2022-11-01 13:51:29 +08:00
 afirefish

之前发过一个帖子。 传送门: https://www.v2ex.com/t/874009#reply30

Intel 12 代大小核,Win10 系统,某些场景下,出现 4 核有难,8 和围观的问题! 之前说过,等 22H2 发布后,安装 Win11 22H2 ,然后重新测试。

结论: 在 Win11 下,相对 Win10 而言,intel 大小核调度确实好一些,但是似乎还是不够好? 可以看到大量负载还是落到了 E 核上面。这个测试虽然不是很严瑾,但也说明了一些问题吧。

和之前同样的场景,qemu 运行 arm64 虚拟机,gcc 编译。参数配置均相同。结果如下:

虚拟机配置:

root@debian-arm64:~# lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           3
Model name:                      Cortex-A72
Stepping:                        r0p3
BogoMIPS:                        125.00
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

root@debian-arm64:~# cat /proc/cpuinfo
processor	: 0
BogoMIPS	: 125.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 1
BogoMIPS	: 125.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 2
BogoMIPS	: 125.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 3
BogoMIPS	: 125.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

root@debian-arm64:~# free -m
               total        used        free      shared  buff/cache   available
Mem:            3923          65        2871           0         987        3709
Swap:            975           0         975

4295 次点击
所在节点    程序员
23 条回复
msg7086
2022-11-02 01:00:10 +08:00
@cubecube 可以坐等降价。主板价格太高了,性价比不如 13 香。东西不差,就是贵。
secondwtq
2022-11-04 04:51:50 +08:00
@mrzx
> 因为调度真正是由 cpu 硬件里的 ITD 来调度的

错误。调度依然是由 OS 控制。
根据 Intel SDM 14.6 的描述,ITD 的功能是 "Hardware provides guidance to the Operating System (OS) scheduler to perform optimal workload scheduling through a memory resident table and software thread specific index (Class ID) that points into that table and selects which data to use for that software thread."

实际上 ITD 是对 Hardware Feedback Interface (HFI) 的扩展(这东西之前就有个名字叫 EHFI ),后者是 LKF 开始加入的一个比较初步的版本(当时貌似叫 Hardware *Guided* Scheduling ,鉴于 LKF 是 2020 年的东西,理论上 Win10 应该是有 HFI 支持的 ...)。Intel SDM 对 HFI 的描述是 "Hardware provides guidance to the Operating System (OS) scheduler to perform optimal workload scheduling through a hardware feedback interface structure in memory." 具体说来,就是 CPU 会评估每个 logical thread 的能力( capability ),目前包括在性能方面的和能效方面的,然后会给 OS 传一个表,表中用一个数值表示每个 logical processor 的不同 capability 。 就这么一个东西。

ITD 的扩展是加入了"Class"的概念,根据优化手册和白皮书,目前有四个 Class ,分别是标量,向量,较新的指令,和自旋等待。ITD 表和 HFI 表类似,但是每个 logical processor 的 capability 不再是一个值,而是会给出执行不同 Class 软件时的 capability (也就是说 HFI 表可以被看做只有一个 class 的 ITD 表)。另外,ITD 加入了一个“Run Time Characteristics”的功能,CPU 会试图将一段时间内运行的代码归类到某一个 Class 中。OS 会读取这些信息,ITD 本身并不决定如何调度。
在描述这些功能时,SDM 使用了如下措辞:
> the lowest performance level of 0 indicates a *recommendation* to the OS to not schedule any software threads on it for performance reasons.
> When the OS Scheduler needs to decide which one of multiple free logical processors to assign to a software
thread that is ready to execute, it *can* choose ...
> When the two software threads in question belong to the same Class ID, the OS Scheduler *can* schedule to higher performance ...
> Zeroing a performance or energy efficiency cell *hints* to the OS that it is beneficial not to schedule ...

类似地,优化手册中对 ITD 的描述如下:
> Intel® Thread Director continually monitors software in real-time giving *hints* to the operating system's
scheduler *allowing* it to make more intelligent and data-driven decisions on thread scheduling. With Intel
Thread Director, hardware provides runtime *feedback* to the OS per thread based on various IPC perfor-
mance characteristics, in the form of ...

白皮书中描述如下:
> ... we wanted to develop a hardware solution that would *assist* the OS achieve optimal runtime scheduling by the Intel Thread Director giving the OS more *information* by monitoring instruction mix, the current state of each core, and the relevant microarchitecture telemetry at much more granular level than would be available via instrumentation
> The Intel Thread Director provides *hints* to the OS scheduler with thread-related information. Using these hints, the OS *can* choose between energy efficiency (EE) and performance (PERF) depending on system parameters like power policy, battery slider, etc.

www.computerbase.de/2021-08/hot-chips-33-intel-alder-lake-steht-und-faellt-mit-dem-thread-director Hot Chips 33: Intel Alder Lake steht und fällt mit dem Thread Director - ComputerBase 这里有 ADL 在 HC33 上的胶片,其中一大块是有关 ITD 的
lore.kernel.org/lkml/20220909231205.14009-1-ricardo.neri-calderon@linux.intel.com [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance - Ricardo Neri 这是 ITD 在 Linux 上的 patch ,邮件正文也包含了对 ITD 工作模型的描述,当然由于 Linux Kernel 是跨平台的,这堆 patch 是先在 Linux 调度器上做了一个根据 Class 调度的框架,然后再把 ITD 作为 x86 的实现塞进去,所以描述本身并不涉及太多细节。

主要由硬件控制的东西倒是有,就是用于调整频率的 Speed Shift ( SDM 中叫 Hardware P-State (HWP)),效果见 chipsandcheese.com/2022/09/15/how-quickly-do-cpus-change-clock-speeds How Quickly do CPUs Change Clock Speeds? – Chips and Cheese ,不过这东西 SKL 就有了。而且 HWP ,HFI ,ITD 都是可以禁用的 ...
mrzx
2022-11-04 09:40:39 +08:00
@secondwtq 那 13 代有没有改进?改进的地方由哪些?

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/891752

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX