Dell R730: 10GbE NIC shows rx errors after installing Gp

2016-09-20 18:59:58 +08:00
 shidifen

Hardware configuration and operating system

CPU: Intel Xeon E5-2640 v3, 2.6 GHz, 8 cores, 2 sockets
Memory: 8 GB DDR4-2133 RDIMM x 32, 256 GB total
Disk 1: 1.2 TB 10k rpm SAS, data disks, 24 drives
Disk 2: 600 GB 10k rpm SAS, system disks, 2 drives
RAID controller: 2 GB cache
NIC: 2*10GE (SFP+), original vendor parts
OS: suse11sp4
Linux hebda_data_33 3.0.101-77-default #1 SMP Tue Jun 14 20:33:58 UTC 2016 (a082ea6) x86_64 x86_64 x86_64 GNU/Linux
Uplink switch: Huawei 12812
NIC information:

ethtool -i p4p2
driver: bnx2x
version: 1.710.51-0
firmware-version: FFV08.07.25 bc 7.13.54
bus-info: 0000:83:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
hebda_data_33:~ # ethtool -i em1
driver: bnx2x
version: 1.710.51-0
firmware-version: FFV08.07.25 bc 7.13.54
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

hebda_data_33:~ # lspci -s 0000:83:00.1 -vvv
83:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
        Subsystem: Broadcom Corporation Device 1006
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 60
        Region 0: Memory at c8000000 (64-bit, prefetchable) [size=8M]
        Region 2: Memory at c8800000 (64-bit, prefetchable) [size=8M]
        Region 4: Memory at ca000000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at ca500000 [disabled] [size=512K]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
                Not readable
        Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [a0] MSI-X: Enable+ Count=32 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00001000
        Capabilities: [ac] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr+ NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <2us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
        Capabilities: [13c v1] Device Serial Number f4-e9-d4-ff-fe-9d-ba-10
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [1b8 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [220 v1] #15
        Kernel driver in use: bnx2x
        Kernel modules: bnx2x

hebda_data_33:~ # lspci -s 0000:01:00.0 -vvv
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
        Subsystem: Dell BCM57800 10-Gigabit Ethernet
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 40
        Region 0: Memory at 95000000 (64-bit, prefetchable) [size=8M]
        Region 2: Memory at 95800000 (64-bit, prefetchable) [size=8M]
        Region 4: Memory at 96030000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at 96080000 [disabled] [size=512K]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=8 DScale=1 PME-
        Capabilities: [50] Vital Product Data
                Not readable
        Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [a0] MSI-X: Enable+ Count=32 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00001000
        Capabilities: [ac] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr+ NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <2us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
        Capabilities: [13c v1] Device Serial Number 18-66-da-ff-fe-65-77-0b
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [1b8 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [220 v1] #15
        Capabilities: [300 v1] #19
        Kernel driver in use: bnx2x
        Kernel modules: bnx2x

hebda_data_33:~ # ethtool -S p4p2|grep dis
     [0]: rx_discards: 79516
     [0]: rx_phy_ip_err_discards: 0
     [0]: rx_skb_alloc_discard: 28517
     [1]: rx_discards: 88484
     [1]: rx_phy_ip_err_discards: 0
     [1]: rx_skb_alloc_discard: 27102
     [2]: rx_discards: 13667973
     [2]: rx_phy_ip_err_discards: 0
     [2]: rx_skb_alloc_discard: 35207
     [3]: rx_discards: 33056205
     [3]: rx_phy_ip_err_discards: 0
     [3]: rx_skb_alloc_discard: 33533
     [4]: rx_discards: 13263091
     [4]: rx_phy_ip_err_discards: 0
     [4]: rx_skb_alloc_discard: 34748
     [5]: rx_discards: 7583294
     [5]: rx_phy_ip_err_discards: 0
     [5]: rx_skb_alloc_discard: 32756
     [6]: rx_discards: 3703892
     [6]: rx_phy_ip_err_discards: 0
     [6]: rx_skb_alloc_discard: 28380
     [7]: rx_discards: 31746726
     [7]: rx_phy_ip_err_discards: 0
     [7]: rx_skb_alloc_discard: 32609
     rx_discards: 103189181
     rx_mf_tag_discard: 0
     rx_brb_discard: 90068
     rx_phy_ip_err_discards: 0
     rx_skb_alloc_discard: 252852
No other errors.
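
The per-queue counters above can be summarised to see how unevenly the discards are spread across the eight queues. The following is a minimal sketch, assuming the `ethtool -S` output keeps the `[N]: rx_discards: VALUE` layout shown above; the function name `queue_discard_share` is made up for illustration.

```shell
# Hypothetical helper: reads `ethtool -S <iface>` output on stdin and prints
# each queue's rx_discards together with its share of the per-queue total.
queue_discard_share() {
    awk '
        # Per-queue lines look like "[3]: rx_discards: 33056205"
        $1 ~ /^\[[0-9]+\]:$/ && $2 == "rx_discards:" {
            q = $1; gsub(/[^0-9]/, "", q)   # keep only the queue number
            val[q] = $3; total += $3
            if (q + 0 > max) max = q + 0
        }
        END {
            if (total == 0) exit
            for (i = 0; i <= max; i++)
                if (i in val)
                    printf "queue %d: %d (%.1f%%)\n", i, val[i], 100 * val[i] / total
            printf "total: %d\n", total
        }
    '
}

# Usage on a live system (interface name taken from the post):
#   ethtool -S p4p2 | queue_discard_share
```

With the figures posted above, queues 3 and 7 together account for more than 60% of all rx_discards, while queues 0 and 1 contribute almost nothing.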
hebda_data_23:~ # for i in `seq 1 10`; do ifconfig p4p2 | grep RX | grep overruns; sleep 1; done
          RX packets:253639505018 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639552428 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639566818 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639585722 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639597202 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639610209 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639622800 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639642350 errors:305620450 dropped:0 overruns:305376307 frame:244143
          RX packets:253639675509 errors:305620450 dropped:0 overruns:305376307 frame:244143
          RX packets:253639723772 errors:305620471 dropped:0 overruns:305376328 frame:244143
hebda_data_23:~ # for i in `seq 1 10`; do ifconfig p4p2 | grep RX | grep overruns; sleep 1; done
          RX packets:253639788669 errors:305620773 dropped:0 overruns:305376630 frame:244143
          RX packets:253639812355 errors:305621201 dropped:0 overruns:305377058 frame:244143
          RX packets:253639834600 errors:305621201 dropped:0 overruns:305377058 frame:244143
          RX packets:253639892990 errors:305621455 dropped:0 overruns:305377312 frame:244143
          RX packets:253639913026 errors:305621455 dropped:0 overruns:305377312 frame:244143
          RX packets:253639919136 errors:305621455 dropped:0 overruns:305377312 frame:244143
          RX packets:253639935095 errors:305622380 dropped:0 overruns:305378237 frame:244143
          RX packets:253639954560 errors:305623012 dropped:0 overruns:305378869 frame:244143
          RX packets:253639961150 errors:305623012 dropped:0 overruns:305378869 frame:244143
          RX packets:253639971680 errors:305623012 dropped:0 overruns:305378869 frame:244143
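
The loop above prints cumulative counters, so bursts are easier to spot as differences between consecutive samples. This is a rough sketch, assuming the `ifconfig` line format shown above (`packets:N ... overruns:N`); `rx_overrun_deltas` is a hypothetical name.

```shell
# Hypothetical helper: reads the "RX packets:... overruns:N ..." lines on stdin
# and prints how much the packets and overruns counters grew between samples.
rx_overrun_deltas() {
    awk '
        {
            pk = ""; ov = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^packets:/)  { pk = $i; sub(/^packets:/, "", pk) }
                if ($i ~ /^overruns:/) { ov = $i; sub(/^overruns:/, "", ov) }
            }
            if (pk == "" || ov == "") next      # skip lines without both counters
            if (seen) printf "packets +%d overruns +%d\n", pk - prev_pk, ov - prev_ov
            prev_pk = pk; prev_ov = ov; seen = 1
        }
    '
}

# Usage, piping the same loop into the helper:
#   for i in `seq 1 10`; do ifconfig p4p2 | grep RX | grep overruns; sleep 1; done | rx_overrun_deltas
```

Applied to the samples above, most seconds show no new overruns at all, and the increases arrive in bursts of a few hundred to about a thousand.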

Workload configuration

Gp DB 4.3

Problem description

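NIC usage after the application was installed is shown in the figure below. At peak times, however, nagios shows every node in the cluster reporting the error below. Similar errors also appeared when the machines ran bare, without the application, but I did not get a chance to capture packets on the NIC: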

Interface 11
Active checks of the service have been disabled - only passive checks are being accepted	Perform Extra Service Actions
CRITICAL	09-20-2016 10:47:51	0d 0h 11m 46s	1/1	CRIT - [p4p2] (up) MAC: f4:e9:d4:9d:cb:92, 10.00 Gbit/s, in: 262.67 MB/s, in-errors: 0.16%(!!) >= 0.1, out: 237.76 MB/s 

The command actually used is:

echo '<<<lnx_if:sep(58)>>>'
sed 1,2d /proc/net/dev

Overall, errors are between 0.1% and 0.6% and very rarely reach 1%; traffic at those times also varied, from about 20M to about 200MB.
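
To reproduce such a percentage by hand from `/proc/net/dev`, two samples of the receive counters are enough. The sketch below assumes the percentage is new errors relative to new packets plus new errors; the nagios check may compute it differently, and both function names are hypothetical.

```shell
# Hypothetical helper: prints "rx_packets rx_errs" for one interface,
# reading /proc/net/dev-formatted text on stdin.
rx_counters() {
    # The colon may be glued to the first counter, so turn it into a space.
    # Receive columns after the name: bytes packets errs drop ...
    tr ':' ' ' | awk -v ifc="$1" '$1 == ifc { print $3, $4 }'
}

# Hypothetical helper: error percentage between two samples.
# Arguments: packets_before errs_before packets_after errs_after
rx_error_pct() {
    awk -v p1="$1" -v e1="$2" -v p2="$3" -v e2="$4" 'BEGIN {
        dp = p2 - p1; de = e2 - e1
        if (dp + de <= 0) { print "0.00"; exit }
        printf "%.2f\n", 100 * de / (dp + de)
    }'
}

# Usage on a live system (the 10-second interval is only an example):
#   before=$(sed 1,2d /proc/net/dev | rx_counters p4p2); sleep 10
#   after=$(sed 1,2d /proc/net/dev | rx_counters p4p2)
#   rx_error_pct $before $after
```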

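  1. The first question: is this actually a problem? My own feeling is that it is, which is why I have put effort into it. What do the experts here think?
  2. The second question: how can it be fixed? I have a rough idea and would welcome criticism. From what others have written online, I suspect the problem lies in the rx errors, and I see a lot of overruns. Could this be an interrupt problem rather than a ring_buffer problem?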
5583 views
Node: Q&A
25 replies
shidifen
2016-09-23 14:19:44 +08:00
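@redsonic The whole database is a bit slow now, and I do not know whether that is related to the parameter I changed. If I want to roll back the manual binding, is it enough to simply turn irqbalance back on? I have not seen this discussed anywhere else. If changing the parameter really did bring the transfer rate down, I will have to change it back.
Also, some documents mention the following. Any advice?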
echo fffffe > /sys/class/net/em1/queues/rx-0/rps_cpus
echo fffffe > /sys/class/net/em1/queues/rx-1/rps_cpus
echo fffffe > /sys/class/net/em1/queues/rx-2/rps_cpus
echo fffffe > /sys/class/net/em1/queues/rx-3/rps_cpus
echo 4096 > /sys/class/net/em1/queues/rx-0/rps_flow_cnt
echo 4096 > /sys/class/net/em1/queues/rx-1/rps_flow_cnt
echo 4096 > /sys/class/net/em1/queues/rx-2/rps_flow_cnt
echo 4096 > /sys/class/net/em1/queues/rx-3/rps_flow_cnt
echo 4096 > /sys/class/net/em1/queues/rx-4/rps_flow_cnt
echo 20480 > /proc/sys/net/core/rps_sock_flow_entries
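For a machine with 2 physical CPUs and 8 cores the value is ff. The calculation works like this: the first CPU is 00000001, the second is 00000010, the third is 00000100, and so on. Because all CPUs share the load, the values of all CPUs are added together, giving 11111111, which is exactly ff in hexadecimal. The value of /proc/sys/net/core/rps_sock_flow_entries is calculated from the number of queues on your NIC: for example, with a single 8-queue NIC and each queue set to 4096, 8*4096 is the value of /proc/sys/net/core/rps_sock_flow_entries.
Interrupt coalescing: ethtool -c em1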
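
The arithmetic described in the quoted document can be sketched as a few shell functions. This is only an illustration under the assumptions stated in that text (all of the lowest N CPUs share the load, and every queue gets the same rps_flow_cnt); the function names are hypothetical, and the commands are printed rather than executed.

```shell
# Hex CPU mask with the lowest N bits set (N CPUs share the load).
# Intended for small N; shell arithmetic overflows near 63 bits.
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# rps_sock_flow_entries as described in the quoted text:
# number of queues multiplied by the per-queue rps_flow_cnt.
sock_flow_entries() {
    echo $(( $1 * $2 ))
}

# Print (not execute) consistent per-queue settings for one interface.
# Arguments: interface, number of rx queues, number of CPUs, rps_flow_cnt
rps_commands() {
    ifc=$1; queues=$2; cpus=$3; flow_cnt=$4
    q=0
    while [ "$q" -lt "$queues" ]; do
        echo "echo $(cpu_mask "$cpus") > /sys/class/net/$ifc/queues/rx-$q/rps_cpus"
        echo "echo $flow_cnt > /sys/class/net/$ifc/queues/rx-$q/rps_flow_cnt"
        q=$((q + 1))
    done
    echo "echo $(sock_flow_entries "$queues" "$flow_cnt") > /proc/sys/net/core/rps_sock_flow_entries"
}

# Example matching the text: 8 CPUs give ff, 8 queues * 4096 give 32768.
#   rps_commands em1 8 8 4096
```

For comparison, the mask fffffe used in the quoted commands has bits 1 to 23 set and bit 0 cleared, so it selects CPUs 1 to 23 and leaves CPU 0 out.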
redsonic
2016-09-23 14:32:08 +08:00
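The number of queues is determined by the hardware; among higher-end cards I have only used Intel's 82599 and X540. Knock-off cards really are hard to tell apart by eye. Apart from using ethtool to read the information in the ROM, I am afraid the only way is to ask for a price quote: genuine OEM cards cost more than twice as much as knock-offs.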

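Software approaches such as RPS and irqbalance do nothing positive for server-class, capture-level applications. What you find online is all written for single-socket workstations running gigabit or single-queue NICs.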

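To revert, just enable irqbalance again. You could also try binding the other heavy tasks to the later CPUs, so that the 8 CPUs used by the NIC are not scheduled for other work.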
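
A manual binding of the kind discussed here could look roughly like the sketch below: one NIC queue interrupt per CPU, starting at CPU 0. It assumes the interrupt names in `/proc/interrupts` are either the interface name itself or start with the interface name followed by `-`, and that irqbalance stays off while the manual affinity is in effect; both function names are hypothetical, and the commands are printed rather than executed.

```shell
# Hypothetical helper: reads /proc/interrupts-formatted text on stdin and
# prints the IRQ numbers whose last column names the given interface.
iface_irqs() {
    awk -v ifc="$1" '$NF == ifc || index($NF, ifc "-") == 1 {
        sub(/:$/, "", $1)   # "60:" -> "60"
        print $1
    }'
}

# Print (not execute) one smp_affinity command per IRQ read from stdin,
# pinning the first IRQ to CPU 0, the second to CPU 1, and so on.
irq_affinity_commands() {
    cpu=0
    while read -r irq; do
        printf 'echo %x > /proc/irq/%s/smp_affinity\n' $(( 1 << cpu )) "$irq"
        cpu=$((cpu + 1))
    done
}

# Usage on a live system (review the printed commands before running them):
#   iface_irqs p4p2 < /proc/interrupts | irq_affinity_commands
```

Other heavy processes could then be kept off those CPUs with something like `taskset -c 8-15 <command>`, where the CPU range is only an example and depends on how the cores are numbered on the machine.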
shidifen
2016-09-26 15:29:10 +08:00
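The knock-off card question may be settled. We obtained from Dell the mapping between hosts and NIC PPIDs, which we can check against the devices in production.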
shidifen
2016-11-18 18:30:00 +08:00
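I forgot to mention: we did test the NICs with iperf, and the bandwidth was completely fine; there were just error packets during the test as well.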
shidifen
2016-11-18 18:30:28 +08:00
I plan to test once more using a different method.

The full version of this discussion thread is on V2EX: https://www.v2ex.com/t/307630