pve 7.4.3
Linux pve 5.15.102-1-pve #1 SMP PVE 5.15.102-1 (2023-03-14T13:48Z) x86_64 GNU/Linux
IOMMU is enabled and the ACS override patch is applied:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init video=simplefb:off pcie_acs_override=downstream,multifunction split_lock_detect=off"
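To make it easier to see which IOMMU-related options are set, the command line above can be split into key/value pairs. This is only a minimal sketch: the helper name `parse_cmdline` is made up for illustration, and the string is copied from the GRUB setting above rather than read from the running system.

```python
import shlex

def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {option: value}.

    Bare flags such as "quiet" map to True.
    (Hypothetical helper, not part of any PVE tooling.)
    """
    opts = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        opts[key] = value if sep else True
    return opts

# Copied verbatim from GRUB_CMDLINE_LINUX_DEFAULT in this post.
CMDLINE = (
    "quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init "
    "video=simplefb:off pcie_acs_override=downstream,multifunction "
    "split_lock_detect=off"
)

opts = parse_cmdline(CMDLINE)
for name in ("intel_iommu", "iommu", "pcie_acs_override"):
    print(name, "=", opts.get(name, "<not set>"))
```

On the host itself, the same function could be fed the contents of `/proc/cmdline` to confirm what the running kernel was actually booted with.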
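The Windows desktop VM has a 1660s GPU and a USB keyboard and mouse passed through, with the GPU driving the monitor directly over DP.
The Linux desktop VM has a Tesla P4 GPU passed through, used for Emby decoding.
Symptom: the Windows desktop VM freezes, the Linux desktop VM freezes, the keyboard and mouse stop responding, and the PVE host cannot be reached over SSH.
Every crash produces a similar trace. The traces from the two most recent lockups are below: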
August 2:
Aug 2 14:37:09 pve kernel: [1260278.267429] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
Aug 2 14:37:09 pve kernel: [1260278.267433] Modules linked in: tcp_diag inet_diag cmac nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs veth ebtable_filter ebtables ip_set ip6table_raw iptabl
e_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal
intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi kvm snd_hda_intel ast crct10dif_pclmul snd_intel_dspcfg ghash_clmulni_intel
snd_usb_audio drm_vram_helper snd_intel_sdw_acpi aesni_intel snd_hda_codec drm_ttm_helper snd_usbmidi_lib ttm crypto_simd snd_rawmidi snd_hda_core cryptd snd_seq_device drm_kms_helper snd_hw
dep mc cec snd_pcm rc_core rapl rndis_host snd_timer fb_sys_fops syscopyarea cdc_ether mei_me snd sysfillrect usbnet isst_if_mbox_pci isst_if_mmio sysimgblt intel_cstate mii soundcore pcspkr
joydev efi_pstore input_leds acpi_ipmi isst_if_common intel_pch_thermal mei ioatdma
Aug 2 14:37:09 pve kernel: [1260278.267474] ipmi_si ipmi_devintf zfs(PO) ipmi_msghandler acpi_power_meter acpi_pad zunicode(PO) zzstd(O) mac_hid zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpai
r(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_
type1 vfio drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio usbkbd libcrc32c usbmouse hi
d_generic usbhid hid crc32_pclmul nvme xhci_pci igb i2c_i801 xhci_pci_renesas nvme_core i2c_algo_bit i2c_smbus ahci dca libahci xhci_hcd intel_pmt wmi
Aug 2 14:37:09 pve kernel: [1260278.267507] CPU: 3 PID: 4137035 Comm: CPU 11/KVM Tainted: P W O 5.15.102-1-pve #1
Aug 2 14:37:09 pve kernel: [1260278.267510] Hardware name: Supermicro X12DAi-N6/X12DAi-N6, BIOS 1.1b 09/10/2021
Aug 2 14:37:09 pve kernel: [1260278.267511] RIP: 0010:_raw_spin_lock+0x0/0x30
Aug 2 14:37:09 pve kernel: [1260278.267516] Code: 00 f0 0f b1 17 75 05 c3 cc cc cc cc 55 89 c6 48 89 e5 e8 43 5b 39 ff 66 90 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00
00 31 c0 ba 01 00 00 00 f0 0f b1 17 75 05 c3 cc cc cc
Aug 2 14:37:09 pve kernel: [1260278.267518] RSP: 0018:ff724a9600540d18 EFLAGS: 00000046
Aug 2 14:37:09 pve kernel: [1260278.267520] RAX: ff724a96000c5000 RBX: 0000000000000004 RCX: ff22181d4004b400
Aug 2 14:37:09 pve kernel: [1260278.267521] RDX: ff22181d4004b400 RSI: 0000000000000000 RDI: ff22181d4020dcc0
Aug 2 14:37:09 pve kernel: [1260278.267522] RBP: ff724a9600540dc8 R08: 00000000000003ac R09: ff22181d4020dcc0
Aug 2 14:37:09 pve kernel: [1260278.267523] R10: 0000000000000010 R11: 0000000000000004 R12: 00000000000003ac
Aug 2 14:37:09 pve kernel: [1260278.267524] R13: 0000000000000000 R14: ff22181d401d4e00 R15: ff22181d4020dcc0
Aug 2 14:37:09 pve kernel: [1260278.267525] FS: 00007fa3dbdff700(0000) GS:ff22185bbf2c0000(0000) knlGS:ffffd4815cb15000
Aug 2 14:37:09 pve kernel: [1260278.267527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 2 14:37:09 pve kernel: [1260278.267528] CR2: 00001bbd00881000 CR3: 00000039b81ec004 CR4: 0000000000773ee0
Aug 2 14:37:09 pve kernel: [1260278.267529] PKRU: 55555554
Aug 2 14:37:09 pve kernel: [1260278.267530] Call Trace:
Aug 2 14:37:09 pve kernel: [1260278.267532] <IRQ>
Aug 2 14:37:09 pve kernel: [1260278.267532] ? qi_submit_sync+0x328/0x5c0
Aug 2 14:37:09 pve kernel: [1260278.267537] qi_flush_iotlb+0x84/0xa0
Aug 2 14:37:09 pve kernel: [1260278.267539] intel_flush_iotlb_all+0x59/0x160
Aug 2 14:37:09 pve kernel: [1260278.267541] iommu_dma_flush_iotlb_all+0x1a/0x30
Aug 2 14:37:09 pve kernel: [1260278.267544] iova_domain_flush+0x1b/0x30
Aug 2 14:37:09 pve kernel: [1260278.267546] fq_flush_timeout+0x39/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267547] ? fq_ring_free+0x170/0x170
Aug 2 14:37:09 pve kernel: [1260278.267549] call_timer_fn+0x29/0x120
Aug 2 14:37:09 pve kernel: [1260278.267554] __run_timers.part.0+0x1e1/0x270
Aug 2 14:37:09 pve kernel: [1260278.267555] ? ktime_get+0x43/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267557] ? lapic_next_deadline+0x2c/0x40
Aug 2 14:37:09 pve kernel: [1260278.267561] ? clockevents_program_event+0xa8/0x130
Aug 2 14:37:09 pve kernel: [1260278.267564] run_timer_softirq+0x2a/0x60
Aug 2 14:37:09 pve kernel: [1260278.267565] __do_softirq+0xd6/0x2ea
Aug 2 14:37:09 pve kernel: [1260278.267568] irq_exit_rcu+0x94/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267570] sysvec_apic_timer_interrupt+0x80/0x90
Aug 2 14:37:09 pve kernel: [1260278.267574] </IRQ>
Aug 2 14:37:09 pve kernel: [1260278.267575] <TASK>
Aug 2 14:37:09 pve kernel: [1260278.267575] asm_sysvec_apic_timer_interrupt+0x1b/0x20
Aug 2 14:37:09 pve kernel: [1260278.267577] RIP: 0010:vmx_do_interrupt_nmi_irqoff+0x10/0x20 [kvm_intel]
Aug 2 14:37:09 pve kernel: [1260278.267590] Code: 41 5b 41 5a 41 59 41 58 5e 5f 5a 59 58 5d e9 47 da c7 dc 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 e4 f0 6a 18 55 9c 6a 10 ff d7 <0f> 1f 00 48
89 ec 5d e9 24 da c7 dc 0f 1f 40 00 0f 1f 44 00 00 55
Aug 2 14:37:09 pve kernel: [1260278.267591] RSP: 0018:ff724a9606cefcd8 EFLAGS: 00000082
Aug 2 14:37:09 pve kernel: [1260278.267593] RAX: 0000000000000e30 RBX: ff22181ef2ce8000 RCX: 0000000000000000
Aug 2 14:37:09 pve kernel: [1260278.267594] RDX: ffffffff00000000 RSI: 0001000000000000 RDI: ffffffff9e000e30
Aug 2 14:37:09 pve kernel: [1260278.267595] RBP: ff724a9606cefcd8 R08: 000006901fa2c781 R09: 0000000000000000
Aug 2 14:37:09 pve kernel: [1260278.267595] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000800000ec
Aug 2 14:37:09 pve kernel: [1260278.267596] R13: 0000000000000000 R14: ff724a960333fb48 R15: ff22181ef2ce8038
Aug 2 14:37:09 pve kernel: [1260278.267598] ? asm_sysvec_spurious_apic_interrupt+0x20/0x20
Aug 2 14:37:09 pve kernel: [1260278.267601] vmx_handle_exit_irqoff+0x175/0x2e0 [kvm_intel]
Aug 2 14:37:09 pve kernel: [1260278.267608] kvm_arch_vcpu_ioctl_run+0xd19/0x1730 [kvm]
Aug 2 14:37:09 pve kernel: [1260278.267658] ? kvm_arch_vcpu_ioctl_run+0x712/0x1730 [kvm]
Aug 2 14:37:09 pve kernel: [1260278.267695] ? __wake_up_locked_key+0x1b/0x30
Aug 2 14:37:09 pve kernel: [1260278.267698] kvm_vcpu_ioctl+0x252/0x6b0 [kvm]
Aug 2 14:37:09 pve kernel: [1260278.267725] ? kvm_vcpu_ioctl+0x2bb/0x6b0 [kvm]
Aug 2 14:37:09 pve kernel: [1260278.267752] ? vfs_write+0xc8/0x270
Aug 2 14:37:09 pve kernel: [1260278.267755] ? __fget_files+0x86/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267758] __x64_sys_ioctl+0x92/0xd0
Aug 2 14:37:09 pve kernel: [1260278.267761] do_syscall_64+0x59/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267763] ? do_syscall_64+0x69/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267765] ? do_syscall_64+0x69/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267766] ? syscall_exit_to_user_mode+0x27/0x50
Aug 2 14:37:09 pve kernel: [1260278.267768] ? do_syscall_64+0x69/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267770] ? do_syscall_64+0x69/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267771] ? do_syscall_64+0x69/0xc0
Aug 2 14:37:09 pve kernel: [1260278.267773] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Aug 2 14:37:09 pve kernel: [1260278.267775] RIP: 0033:0x7fb41c025237
Aug 2 14:37:09 pve kernel: [1260278.267777] Code: 00 00 00 48 8b 05 59 cc 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 29 cc 0d 00 f7 d8 64 89 01 48
Aug 2 14:37:09 pve kernel: [1260278.267778] RSP: 002b:00007fa3dbdfa288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 2 14:37:09 pve kernel: [1260278.267780] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fb41c025237
Aug 2 14:37:09 pve kernel: [1260278.267781] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000024
Aug 2 14:37:09 pve kernel: [1260278.267782] RBP: 0000561590b8d2d0 R08: 000056158f065240 R09: 00000000ffffffff
Aug 2 14:37:09 pve kernel: [1260278.267783] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Aug 2 14:37:09 pve kernel: [1260278.267783] R13: 000056158f770020 R14: 0000000000000000 R15: 0000000000000000
Aug 2 14:37:09 pve kernel: [1260278.267785] </TASK>
Aug 2 14:37:09 pve kernel: [1260295.454974] watchdog: BUG: soft lockup - CPU#11 stuck for 26s! [kworker/11:2:4149114]
Aug 2 14:37:09 pve kernel: [1260295.455414] Modules linked in: tcp_diag inet_diag cmac nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs veth ebtable_filter ebtables ip_set ip6table_raw iptabl
e_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal
intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi kvm snd_hda_intel ast crct10dif_pclmul snd_intel_dspcfg ghash_clmulni_intel
snd_usb_audio drm_vram_helper snd_intel_sdw_acpi aesni_intel snd_hda_codec drm_ttm_helper snd_usbmidi_lib ttm crypto_simd snd_rawmidi snd_hda_core cryptd snd_seq_device drm_kms_helper snd_hw
dep mc cec snd_pcm rc_core rapl rndis_host snd_timer fb_sys_fops syscopyarea cdc_ether mei_me snd sysfillrect usbnet isst_if_mbox_pci isst_if_mmio sysimgblt intel_cstate mii soundcore pcspkr
joydev efi_pstore input_leds acpi_ipmi isst_if_common intel_pch_thermal mei ioatdma
Aug 2 14:37:09 pve kernel: [1260295.455479] ipmi_si ipmi_devintf zfs(PO) ipmi_msghandler acpi_power_meter acpi_pad zunicode(PO) zzstd(O) mac_hid zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpai
r(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_
type1 vfio drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio usbkbd libcrc32c usbmouse hi
d_generic usbhid hid crc32_pclmul nvme xhci_pci igb i2c_i801 xhci_pci_renesas nvme_core i2c_algo_bit i2c_smbus ahci dca libahci xhci_hcd intel_pmt wmi
Aug 2 14:37:09 pve kernel: [1260295.455532] CPU: 11 PID: 4149114 Comm: kworker/11:2 Tainted: P W O 5.15.102-1-pve #1
Aug 2 14:37:09 pve kernel: [1260295.455536] Hardware name: Supermicro X12DAi-N6/X12DAi-N6, BIOS 1.1b 09/10/2021
Aug 2 14:37:09 pve kernel: [1260295.455537] Workqueue: rcu_par_gp sync_rcu_exp_select_node_cpus
Aug 2 14:37:09 pve kernel: [1260295.455543] RIP: 0010:smp_call_function_single+0x94/0x130
Aug 2 14:37:09 pve kernel: [1260295.455547] Code: 32 c9 62 a9 00 01 ff 00 0f 85 9e 00 00 00 85 c9 75 4c 48 c7 c6 80 1b 03 00 65 48 03 35 f5 cb c8 62 8b 46 08 a8 01 74 09 f3 90 <8b> 46 08 a8
The most recent one, August 19:
Aug 19 09:28:29 pve kernel: [777177.076338] NMI watchdog: Watchdog detected hard LOCKUP on cpu 32
Aug 19 09:28:29 pve kernel: [777177.076340] Modules linked in: tcp_diag inet_diag cmac nls_utf8 cifs cifs_arc4 cifs_md4 fscache net
fs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bondin
g tls softdog nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powercl
amp coretemp kvm_intel kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd snd_hda_codec_realtek snd_hda_codec_
generic ledtrig_audio snd_hda_codec_hdmi ast drm_vram_helper drm_ttm_helper snd_hda_intel snd_usb_audio ttm snd_intel_dspcfg snd_us
bmidi_lib snd_intel_sdw_acpi drm_kms_helper snd_rawmidi snd_hda_codec snd_seq_device snd_hda_core cec snd_hwdep mc rc_core zfs(PO)
snd_pcm rndis_host fb_sys_fops snd_timer rapl cdc_ether syscopyarea mei_me zunicode(PO) snd sysfillrect usbnet isst_if_mbox_pci iss
t_if_mmio sysimgblt intel_cstate isst_if_common mii soundcore efi_pstore pcspkr joydev ioatdma intel_pch_thermal mei
Aug 19 09:28:29 pve kernel: [777177.076381] input_leds zzstd(O) zlua(O) acpi_ipmi zavl(PO) ipmi_si icp(PO) ipmi_devintf ipmi_msgha
ndler acpi_power_meter acpi_pad zcommon(PO) mac_hid znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm
ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_type1 vfio drm
sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb hid_generic usbmouse usbkbd dm_thin_po
ol dm_persistent_data dm_bio_prison dm_bufio libcrc32c usbhid hid crc32_pclmul nvme xhci_pci i2c_i801 xhci_pci_renesas igb nvme_cor
e i2c_smbus i2c_algo_bit ahci dca xhci_hcd libahci intel_pmt wmi
Aug 19 09:28:29 pve kernel: [777177.076415] CPU: 32 PID: 0 Comm: swapper/32 Tainted: P W O 5.15.102-1-pve #1
Aug 19 09:28:29 pve kernel: [777177.076417] Hardware name: Supermicro X12DAi-N6/X12DAi-N6, BIOS 1.1b 09/10/2021
Aug 19 09:28:29 pve kernel: [777177.076418] RIP: 0010:qi_submit_sync+0x2db/0x5c0
Aug 19 09:28:29 pve kernel: [777177.076424] Code: 4d 8b 8e 10 01 00 00 31 db 41 f6 46 25 08 0f 95 c3 49 8b 41 10 83 c3 04 42 83 3c
20 03 0f 84 a3 01 00 00 49 8b 06 44 8b 68 34 <41> f6 c5 70 0f 85 5c 01 00 00 41 f6 c5 10 74 18 49 8b 06 8b 80 80
Aug 19 09:28:29 pve kernel: [777177.076425] RSP: 0018:ff594f7a00b24d20 EFLAGS: 00000093
Aug 19 09:28:29 pve kernel: [777177.076427] RAX: ff594f7a000c5000 RBX: 0000000000000004 RCX: ff3e37f40004b400
Aug 19 09:28:29 pve kernel: [777177.076428] RDX: ff3e37f40004b400 RSI: 0000000000000000 RDI: ff3e37f40020d1c0
Aug 19 09:28:29 pve kernel: [777177.076429] RBP: ff594f7a00b24dc8 R08: 000000000000014c R09: ff3e37f40020d1c0
Aug 19 09:28:29 pve kernel: [777177.076430] R10: 0000000000000010 R11: 0000000000000004 R12: 000000000000014c
Aug 19 09:28:29 pve kernel: [777177.076431] R13: 0000000000000000 R14: ff3e37f4001d4000 R15: ff3e37f40020d1c0
Aug 19 09:28:29 pve kernel: [777177.076432] FS: 0000000000000000(0000) GS:ff3e38327fa00000(0000) knlGS:0000000000000000
Aug 19 09:28:29 pve kernel: [777177.076433] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 19 09:28:29 pve kernel: [777177.076434] CR2: 000000001f1a5080 CR3: 0000000128eb8002 CR4: 0000000000773ee0
Aug 19 09:28:29 pve kernel: [777177.076436] PKRU: 55555554
Aug 19 09:28:29 pve kernel: [777177.076436] Call Trace:
Aug 19 09:28:29 pve kernel: [777177.076437] <IRQ>
Aug 19 09:28:29 pve kernel: [777177.076438] ? enqueue_entity+0x17d/0x760
Aug 19 09:28:29 pve kernel: [777177.076446] qi_flush_iotlb+0x84/0xa0
Aug 19 09:28:29 pve kernel: [777177.076447] intel_flush_iotlb_all+0x59/0x160
Aug 19 09:28:29 pve kernel: [777177.076450] iommu_dma_flush_iotlb_all+0x1a/0x30
Aug 19 09:28:29 pve kernel: [777177.076452] iova_domain_flush+0x1b/0x30
Aug 19 09:28:29 pve kernel: [777177.076454] fq_flush_timeout+0x39/0xc0
Aug 19 09:28:29 pve kernel: [777177.076456] ? fq_ring_free+0x170/0x170
Aug 19 09:28:29 pve kernel: [777177.076458] call_timer_fn+0x29/0x120
Aug 19 09:28:29 pve kernel: [777177.076462] __run_timers.part.0+0x1e1/0x270
Aug 19 09:28:29 pve kernel: [777177.076463] ? ktime_get+0x43/0xc0
Aug 19 09:28:29 pve kernel: [777177.076465] ? lapic_next_deadline+0x2c/0x40
Aug 19 09:28:29 pve kernel: [777177.076469] ? clockevents_program_event+0xa8/0x130
Aug 19 09:28:29 pve kernel: [777177.076473] run_timer_softirq+0x2a/0x60
Aug 19 09:28:29 pve kernel: [777177.076474] __do_softirq+0xd6/0x2ea
Aug 19 09:28:29 pve kernel: [777177.076478] irq_exit_rcu+0x94/0xc0
Aug 19 09:28:29 pve kernel: [777177.076480] sysvec_apic_timer_interrupt+0x80/0x90
Aug 19 09:28:29 pve kernel: [777177.076483] </IRQ>
Aug 19 09:28:29 pve kernel: [777177.076484] <TASK>
Aug 19 09:28:29 pve kernel: [777177.076484] asm_sysvec_apic_timer_interrupt+0x1b/0x20
Aug 19 09:28:29 pve kernel: [777177.076486] RIP: 0010:cpuidle_enter_state+0xd9/0x620
Aug 19 09:28:29 pve kernel: [777177.076491] Code: 3d 04 78 5e 7c e8 37 36 6d ff 49 89 c7 0f 1f 44 00 00 31 ff e8 78 43 6d ff 80 7d
d0 00 0f 85 5e 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e5 03 00 00
Aug 19 09:28:29 pve kernel: [777177.076492] RSP: 0018:ff594f7a003a7e38 EFLAGS: 00000246
Aug 19 09:28:29 pve kernel: [777177.076493] RAX: ff3e38327fa30bc0 RBX: ff8b4f79fa637d00 RCX: 0000000000000000
Aug 19 09:28:29 pve kernel: [777177.076494] RDX: 0000000000016176 RSI: 00000000471c676c RDI: 0000000000000000
Aug 19 09:28:29 pve kernel: [777177.076495] RBP: ff594f7a003a7e88 R08: 0002c2d2999fe5b0 R09: 00000000000927c0
Aug 19 09:28:29 pve kernel: [777177.076496] R10: 0000000000000004 R11: 071c71c71c71c71c R12: ffffffff84ed4ca0
Aug 19 09:28:29 pve kernel: [777177.076497] R13: 0000000000000002 R14: 0000000000000002 R15: 0002c2d2999fe5b0
Aug 19 09:28:29 pve kernel: [777177.076499] ? cpuidle_enter_state+0xc8/0x620
Aug 19 09:28:29 pve kernel: [777177.076502] cpuidle_enter+0x2e/0x50
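As you can see, both traces involve qi_submit_sync, iommu_dma_flush_iotlb_all and fq_flush_timeout, all of which belong to the IOMMU code, so my guess is that the problem is related to having the IOMMU enabled.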
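The comparison between the two traces can also be done mechanically. The sketch below pulls the symbol names out of a few lines excerpted from each trace above and lists the ones that appear in both. The regular expression and the helper names `frames` and `common_frames` are my own assumptions about the log layout, not a standard tool.

```python
import re

# Matches the symbol in entries such as
#   "] ? qi_submit_sync+0x328/0x5c0"  or  "] RIP: 0010:qi_submit_sync+0x2db/0x5c0"
FRAME_RE = re.compile(
    r"\]\s+(?:\?\s+|RIP:\s+\d+:)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def frames(log_text: str) -> list:
    """Return the kernel symbols found in a trace, in order of appearance."""
    found = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            found.append(match.group(1))
    return found

def common_frames(trace_a: str, trace_b: str) -> list:
    """Symbols present in both traces, ordered as in trace_a, no duplicates."""
    in_b = set(frames(trace_b))
    result = []
    for name in frames(trace_a):
        if name in in_b and name not in result:
            result.append(name)
    return result

# A few lines excerpted from the two traces in this post.
TRACE_AUG_2 = """\
Aug 2 14:37:09 pve kernel: [1260278.267511] RIP: 0010:_raw_spin_lock+0x0/0x30
Aug 2 14:37:09 pve kernel: [1260278.267532] ? qi_submit_sync+0x328/0x5c0
Aug 2 14:37:09 pve kernel: [1260278.267537] qi_flush_iotlb+0x84/0xa0
Aug 2 14:37:09 pve kernel: [1260278.267546] fq_flush_timeout+0x39/0xc0
"""

TRACE_AUG_19 = """\
Aug 19 09:28:29 pve kernel: [777177.076418] RIP: 0010:qi_submit_sync+0x2db/0x5c0
Aug 19 09:28:29 pve kernel: [777177.076438] ? enqueue_entity+0x17d/0x760
Aug 19 09:28:29 pve kernel: [777177.076446] qi_flush_iotlb+0x84/0xa0
Aug 19 09:28:29 pve kernel: [777177.076454] fq_flush_timeout+0x39/0xc0
Aug 19 09:28:29 pve kernel: [777177.076502] cpuidle_enter+0x2e/0x50
"""

shared = common_frames(TRACE_AUG_2, TRACE_AUG_19)
print(shared)
```

Feeding the full traces instead of these excerpts would give the complete list of shared frames; the excerpts are kept short only to make the example readable.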
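https://lists.linuxfoundation.org/pipermail/iommu/2015-January/011506.html An old post I found online mentions the qi_submit_sync() function and suggests disabling CONFIG_NET_DMA, but I don't know how to do that, and I couldn't follow the rest of the post.
https://bbs.archlinux.org/viewtopic.php?id=284548 suggests setting CPU affinity so that at least one core is never used for USB serial communication, but someone replied that this did not work for them. adomino-engineer says the bug was most likely introduced between kernel versions 5.11 and 5.13. The thread ends with a final patch, but I couldn't work out what it does either.
https://lore.kernel.org/lkml/65da1862-364b-9500-4be7-a463a12e6a7f@bytedance.com/T/ mentions fq_flush_timeout. That thread is about a soft lockup, but I have had soft lockups before as well. I don't understand the discussion there at all.
Someone on a PVE-related forum suggested that it is a memory problem.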
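The following are the approaches I plan to try. Because the problem is hard to reproduce and the machine is in use with no spare to swap in, I haven't done any of them yet; these are only the directions I intend to explore.
This one seems the most likely to uncover the problem.
The machine has 256 GB of RAM, so the test takes a long time.
Reference: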
Kernel v5.4, v5.5 and v5.6 have no lockups, while kernel v5.7, v5.8, v5.9, v5.10, v5.11, v5.12, v5.13, and 5.14 result in an immediate freeze.
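I really do want to solve this, and I'd happily give whoever solves it a 200 yuan red packet (USDT works too), but in the spirit of Linux Open Free Share, if you don't want it, there's nothing I can do. Joking aside, this problem has bothered me for a long time and is still unsolved. I hope someone who knows the kernel can help. I'm willing to pay, provided the problem really is what you say it is and it ends up fixed.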