riazjack218
32 days ago
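I've run into this intermittent crash problem before too. The machine would go down roughly once a week, and the symptoms were much the same as what the OP describes, except that my system runs ESXi.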
```
2024-09-21T09:16:15.508Z cpu2:2099372)VMware ESXi 6.7.0 [Releasebuild-15160138 x86_64]
Machine Check Exception: Fatal MCE on PCPU2 in world 2099372:vmm2:linux-2?System has encountered a Hardware Error - Please contact the hardware vendor
2024-09-21T09:16:15.508Z cpu2:2099372)cr0=0x80050033 cr2=0x7f3384751518 cr3=0x12e6ea000 cr4=0x152660
2024-09-21T09:16:15.508Z cpu2:2099372)frame=0x451a0261bec0 ip=0x41801354745b err=18 rflags=0xffffffffffffffff
2024-09-21T09:16:15.509Z cpu2:2099372)rax=0xffffffffffffffff rbx=0xffffffffffffffff rcx=0xffffffffffffffff
2024-09-21T09:16:15.509Z cpu2:2099372)rdx=0xffffffffffffffff rbp=0x1 rsi=0xffffffffffffffff
2024-09-21T09:16:15.509Z cpu2:2099372)rdi=0xffffffffffffffff r8=0xffffffffffffffff r9=0xffffffffffffffff
2024-09-21T09:16:15.509Z cpu2:2099372)r10=0xffffffffffffffff r11=0xffffffffffffffff r12=0xffffffffffffffff
2024-09-21T09:16:15.509Z cpu2:2099372)r13=0xffffffffffffffff r14=0xffffffffffffffff r15=0xffffffffffffffff
2024-09-21T09:16:15.509Z cpu2:2099372)pcpu:0 world:2099368 name:"vmm0:ikuai (V)
2024-09-21T09:16:15.509Z cpu2:2099372)pcpu:1 world:2099590 name:"vmm3:linux-1" (V)
2024-09-21T09:16:15.509Z cpu2:2099372)pcpu:2 world:2099372 name:"vmm2:linux-2 (V)
2024-09-21T09:16:15.509Z cpu2:2099372)pcpu:3 world:2099371 name:"vmm1:linux-3 (V)
2024-09-21T09:16:15.509Z cpu2:2099372)@BlueScreen: Machine Check Exception: Fatal MCE on PCPU2 in world 2099372:vmm2:linux-2?System has encountered a Hardware Error - Please contact the hardware vendor
2024-09-21T09:16:15.509Z cpu2:2099372)Code start: 0x418013400000 VMK uptime: 6:06:35:27.868
```
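After some digging, I found the cause was excessive CPU usage on the host: the VMs couldn't get the resources they needed, disk I/O latency climbed too high, and that in turn brought ESXi down. If the OP has time, try installing ESXi and watching it for a few days, then look at the detailed debug output once the purple screen shows up.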
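
If you end up with a purple-screen dump like the one above saved as text, a small script can pull out which PCPU and world the MCE was reported on. This is only a rough sketch: the function name `summarize_psod` and both regexes are my own assumptions, based purely on the line format in the log I pasted, not on any official VMware tooling or documented log format.

```python
import re

# Patterns modelled on the lines in the dump above (assumed format, not an official spec).
MCE_RE = re.compile(r"Fatal MCE on PCPU(\d+) in world (\d+)")
PCPU_RE = re.compile(r'pcpu:(\d+) world:(\d+) name:"([^"\s]+)')


def summarize_psod(log_text):
    """Return the PCPU/world named in the MCE line plus the per-PCPU world table.

    Returns None when no "Fatal MCE" line is found in the text.
    """
    mce = MCE_RE.search(log_text)
    if mce is None:
        return None

    # Map each PCPU number to (world id, world name) from the "pcpu:N world:..." lines.
    worlds = {
        int(m.group(1)): (int(m.group(2)), m.group(3))
        for m in PCPU_RE.finditer(log_text)
    }

    pcpu = int(mce.group(1))
    world = int(mce.group(2))
    name = worlds.get(pcpu, (None, None))[1]
    return {"pcpu": pcpu, "world": world, "name": name, "worlds": worlds}
```

Calling `summarize_psod(open("psod.txt").read())` (the file name is hypothetical) returns a dict you can compare across crashes, for example to see whether the MCE always lands on the same PCPU or the same VM.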