PVE NIC passthrough problem, looking for help

2022-12-22 15:39:30 +08:00
 CRUD

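The motherboard is a Jingyue (精粤) b660i with one 8125 2.5G port and one 8111 gigabit port. After configuring passthrough, I added the 8111 as a PCI device to the OpenWrt VM through the web UI, and the OpenWrt VM then fails at startup with: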

kvm: ../hw/pci/pci.c:1562: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1

pve syslog:

Dec 22 15:15:13 aio pvedaemon[1399]: <root@pam> starting task UPID:aio:00006CA9:0005885C:63A40401:qmstart:100:root@pam:
Dec 22 15:15:13 aio pvedaemon[27817]: start VM 100: UPID:aio:00006CA9:0005885C:63A40401:qmstart:100:root@pam:
Dec 22 15:15:13 aio kernel: vfio-pci 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 22 15:15:13 aio kernel: vfio-pci 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 22 15:15:14 aio systemd[1]: Started 100.scope.
Dec 22 15:15:14 aio systemd-udevd[27834]: Using default interface naming scheme 'v247'.
Dec 22 15:15:14 aio systemd-udevd[27834]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 22 15:15:15 aio kernel: device tap100i0 entered promiscuous mode
Dec 22 15:15:15 aio kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Dec 22 15:15:15 aio kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Dec 22 15:15:15 aio kernel: device fwln100i0 left promiscuous mode
Dec 22 15:15:15 aio kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Dec 22 15:15:15 aio kernel: device fwpr100p0 left promiscuous mode
Dec 22 15:15:15 aio kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Dec 22 15:15:15 aio systemd-udevd[27834]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 22 15:15:15 aio systemd-udevd[27834]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 22 15:15:15 aio systemd-udevd[27837]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec 22 15:15:15 aio systemd-udevd[27837]: Using default interface naming scheme 'v247'.
Dec 22 15:15:15 aio kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Dec 22 15:15:15 aio kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Dec 22 15:15:15 aio kernel: device fwpr100p0 entered promiscuous mode
Dec 22 15:15:15 aio kernel: vmbr0: port 2(fwpr100p0) entered blocking state
Dec 22 15:15:15 aio kernel: vmbr0: port 2(fwpr100p0) entered forwarding state
Dec 22 15:15:15 aio kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Dec 22 15:15:15 aio kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Dec 22 15:15:15 aio kernel: device fwln100i0 entered promiscuous mode
Dec 22 15:15:15 aio kernel: fwbr100i0: port 1(fwln100i0) entered blocking state
Dec 22 15:15:15 aio kernel: fwbr100i0: port 1(fwln100i0) entered forwarding state
Dec 22 15:15:15 aio kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Dec 22 15:15:15 aio kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Dec 22 15:15:15 aio kernel: fwbr100i0: port 2(tap100i0) entered blocking state
Dec 22 15:15:15 aio kernel: fwbr100i0: port 2(tap100i0) entered forwarding state
Dec 22 15:15:15 aio kernel: vfio-pci 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 22 15:15:15 aio kernel: vfio-pci 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 22 15:15:15 aio kernel: vfio-pci 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: vfio_cap_init: hiding cap 0xff@0xff
Dec 22 15:15:16 aio kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Dec 22 15:15:16 aio kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Dec 22 15:15:16 aio pvedaemon[1400]: VM 100 qmp command failed - VM 100 not running
Dec 22 15:15:16 aio pvestatd[1370]: VM 100 qmp command failed - VM 100 not running
Dec 22 15:15:16 aio kernel: vfio-pci 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 22 15:15:16 aio pvedaemon[27817]: start failed: QEMU exited with code 1
Dec 22 15:15:16 aio pvedaemon[1399]: <root@pam> end task UPID:aio:00006CA9:0005885C:63A40401:qmstart:100:root@pam: start failed: QEMU exited with code 1

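0000:04:00 is the gigabit port and 0000:05:00 is the wireless card. If I remove the gigabit port's PCI entry and keep only the wireless card, the VM starts normally.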

grub:

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream,multifunction"
GRUB_CMDLINE_LINUX=""

vm config:

balloon: 512
boot: order=scsi0
cores: 4
hostpci0: 0000:04:00
hostpci1: 0000:05:00
memory: 2048
meta: creation-qemu=7.1.0,ctime=1671628207
name: openwrt
net0: virtio=6E:35:49:91:0D:46,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=420M
scsihw: virtio-scsi-single
smbios1: uuid=a3c5ba9b-a5f7-4f32-b9e3-056f0901cc01
sockets: 1
vmgenid: 1aa8b1ee-37e5-4617-9f93-fd2c8fbf98ab

16 replies
WinkeyLin
2022-12-22 15:47:19 +08:00
Is this port being used by PVE as the management interface?
CRUD
2022-12-22 15:48:18 +08:00
@WinkeyLin No. PVE's port is the 2.5G one; the port I want to pass through is the gigabit one.
onetown
2022-12-22 16:04:47 +08:00
Try adding intel_iommu=on iommu=pt to the boot parameters.
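
For reference, a minimal sketch of what that change might look like on a GRUB-booted PVE host. The commented line is the one from the original post with `iommu=pt` added; `has_kernel_param` is a hypothetical helper name, not a PVE tool. After editing `/etc/default/grub` you would normally run `update-grub` and reboot before checking.

```shell
#!/bin/sh
# Edited line in /etc/default/grub (the original post's line plus iommu=pt):
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
# Then: update-grub, and reboot.

# has_kernel_param PARAM CMDLINE
# Hypothetical helper: succeeds if PARAM appears as a whole word in CMDLINE.
has_kernel_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# After the reboot, confirm the running kernel actually received the options.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
for p in intel_iommu=on iommu=pt; do
    if has_kernel_param "$p" "$cmdline"; then
        echo "$p: active"
    else
        echo "$p: missing"
    fi
done
```

If either option shows as missing after a reboot, the host may not be booting through GRUB at all, in which case the edit to `/etc/default/grub` would have no effect.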
onetown
2022-12-22 16:06:35 +08:00
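Also, if the NIC is not onboard you could try a different PCI slot, but yours is probably onboard.

This error is really KVM preventing your PCI device from accessing host memory directly; the main point is that if the device is not in an IOMMU group, it will crash.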
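
To check the grouping, the usual approach is to walk `/sys/kernel/iommu_groups`. A minimal sketch, assuming the IOMMU is enabled so that directory is populated; `list_iommu_groups` and its optional path argument are hypothetical names used only for illustration:

```shell
#!/bin/sh
# list_iommu_groups [ROOT]
# Print one line per PCI device with the IOMMU group it belongs to.
# ROOT defaults to the real sysfs location; the argument exists only so the
# function can be pointed at a different tree.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$dev" ] || continue
        group=${dev#"$root"/}   # strip the root prefix
        group=${group%%/*}      # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}

list_iommu_groups
```

On the host in this thread the lines of interest would be the ones for 0000:04:00.0 and 0000:05:00.0. If the function prints nothing, the IOMMU is probably not enabled.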
CRUD
2022-12-22 17:20:30 +08:00
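@onetown That doesn't work either; same error messages, and the NIC is onboard. The most specific hint in syslog seems to be `vfio-pci 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible`, but following that message I couldn't find any useful leads.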
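
To see what the kernel thinks of the device before the VM is started, something like the following could help. It is only a sketch: it assumes the kernel exposes the `power_state` attribute under `/sys/bus/pci/devices/` (reasonably recent kernels do), and `pci_status` with its optional second argument is a hypothetical name used here for illustration.

```shell
#!/bin/sh
# pci_status BDF [SYSFS_DIR]
# Report the PCI power state and the bound driver for one device.
# SYSFS_DIR defaults to /sys/bus/pci/devices.
pci_status() {
    bdf=$1
    dev="${2:-/sys/bus/pci/devices}/$bdf"
    if [ ! -d "$dev" ]; then
        echo "$bdf: not found"
        return 1
    fi
    state=unknown
    if [ -r "$dev/power_state" ]; then
        state=$(cat "$dev/power_state")
    fi
    driver=none
    # The driver entry is a symlink whose last component is the driver name.
    if [ -L "$dev/driver" ]; then
        driver=$(basename "$(readlink "$dev/driver")")
    fi
    echo "$bdf: power_state=$state driver=$driver"
}

# Example for the two devices in this thread:
# pci_status 0000:04:00.0
# pci_status 0000:05:00.0
```

Comparing the gigabit port (which fails) with the wireless card (which passes through fine) might show whether only the former is stuck in D3cold.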
onetown
2022-12-22 22:03:15 +08:00
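@CRUD Hmm, judging by the log it is a vfio driver problem. If you don't absolutely need DPDK, you could try disabling vfio.

The kernel picked the vfio driver directly when loading drivers, and your NIC is probably not 100% vfio-compatible.

Append one line to /etc/modprobe.d/blacklist.conf: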
blacklist vfio-pci
waltcow
2022-12-31 21:07:25 +08:00
@CRUD Did you ever solve this? Same motherboard here, and I hit the same problem passing a PCI NIC through to OpenWrt in a PVE VM.
CRUD
2023-01-03 09:11:59 +08:00
@waltcow No. I'm not using passthrough now; I couldn't get it working, so I'm sticking with the virtualized setup for the time being.
waltcow
2023-01-03 09:34:06 +08:00
@CRUD I tinkered with running OpenWrt in an LXC container, and it seems to work without PCI passthrough.
CRUD
2023-01-03 09:38:54 +08:00
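@waltcow Right, it works without passthrough, so that's what I'm using for now. I'd still prefer passthrough, but it's rather hard to get working.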
moli777
2023-03-18 18:19:38 +08:00
@CRUD I bought a 760i and have the same problem. Have you tried again since?
CRUD
2023-03-20 09:05:44 +08:00
@moli777 No, I'm not using passthrough at the moment.
moli777
2023-03-26 15:22:57 +08:00
@CRUD OK, I gave up too. A plain virtual bridge is less hassle anyway 😂
CRUD
2023-03-27 11:10:11 +08:00
@moli777 Yes, I'm nowhere near the bandwidth bottleneck, so less hassle is fine.
heider
2023-11-23 10:51:32 +08:00
I have a 770i and hit the same problem.

Looking at this article:
https://docs.opennebula.io/6.4/open_cluster_deployment/kvm_node/pci_passthrough.html

The VFIO device binding section looks like it could solve the passthrough problem; I'll try it when I have time.
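
For anyone wanting to try manual binding, here is a generic sketch using the kernel's `driver_override` mechanism. It is not copied from the linked page, whose exact steps may differ. `bind_vfio` and its optional second argument are hypothetical names, and it assumes the `vfio-pci` module is already loaded and that you run it as root.

```shell
#!/bin/sh
# bind_vfio BDF [PCI_SYSFS_ROOT]
# Ask the kernel to bind one PCI device to vfio-pci via driver_override.
# PCI_SYSFS_ROOT defaults to /sys/bus/pci.
bind_vfio() {
    bdf=$1
    root="${2:-/sys/bus/pci}"
    dev="$root/devices/$bdf"
    if [ ! -d "$dev" ]; then
        echo "$bdf: not found" >&2
        return 1
    fi
    # Only vfio-pci may claim this device from now on.
    echo vfio-pci > "$dev/driver_override"
    # Detach whatever driver currently owns the device, if any.
    if [ -e "$dev/driver" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    # Trigger a new driver probe so the override takes effect.
    echo "$bdf" > "$root/drivers_probe"
}

# Example (not run automatically, since it changes host state):
# bind_vfio 0000:04:00.0
```

Note that the syslog in the original post already shows messages prefixed with `vfio-pci 0000:04:00.0`, so binding alone may not be what is failing there.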
s1e42NxZVE484pwH
207 days ago
@heider #15 Did it work?

https://www.v2ex.com/t/904137