dell r730 安装 Gp 后万兆网卡有 rx error

2016-09-20 18:59:58 +08:00
 shidifen

设备配置及操作系统

cpu :英特尔至强 E5-2640V3 处理器 2.6GHz 8 核 2 颗 mem : 8G , DDR4-2133 RDIMM , 32 条,共 256G 硬盘 1 : 1.2T ,万转 sas 做数据盘, 24 块 硬盘 2 : 600G ,万转 sas 做系统盘, 2 块 RAID 卡: 2G 缓存 网卡: 2*10GE ( SFP+),原厂的 操作系统: suse11sp4 Linux hebda_data_33 3.0.101-77-default #1 SMP Tue Jun 14 20:33:58 UTC 2016 (a082ea6) x86_64 x86_64 x86_64 GNU/Linux 上联交换机:华为 12812 网卡信息:

ethtool -i p4p2
driver: bnx2x
version: 1.710.51-0
firmware-version: FFV08.07.25 bc 7.13.54
bus-info: 0000:83:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
hebda_data_33:~ # ethtool -i em1
driver: bnx2x
version: 1.710.51-0
firmware-version: FFV08.07.25 bc 7.13.54
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

hebda_data_33:~ # lspci -s 0000:83:00.1 -vvv
83:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
        Subsystem: Broadcom Corporation Device 1006
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 60
        Region 0: Memory at c8000000 (64-bit, prefetchable) [size=8M]
        Region 2: Memory at c8800000 (64-bit, prefetchable) [size=8M]
        Region 4: Memory at ca000000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at ca500000 [disabled] [size=512K]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
                Not readable
        Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [a0] MSI-X: Enable+ Count=32 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00001000
        Capabilities: [ac] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr+ NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <2us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
        Capabilities: [13c v1] Device Serial Number f4-e9-d4-ff-fe-9d-ba-10
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [1b8 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [220 v1] #15
        Kernel driver in use: bnx2x
        Kernel modules: bnx2x

hebda_data_33:~ # lspci -s 0000:01:00.0 -vvv
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
        Subsystem: Dell BCM57800 10-Gigabit Ethernet
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 40
        Region 0: Memory at 95000000 (64-bit, prefetchable) [size=8M]
        Region 2: Memory at 95800000 (64-bit, prefetchable) [size=8M]
        Region 4: Memory at 96030000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at 96080000 [disabled] [size=512K]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=8 DScale=1 PME-
        Capabilities: [50] Vital Product Data
                Not readable
        Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [a0] MSI-X: Enable+ Count=32 Masked-
                Vector table: BAR=4 offset=00000000
                PBA: BAR=4 offset=00001000
        Capabilities: [ac] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr+ NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <2us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
        Capabilities: [13c v1] Device Serial Number 18-66-da-ff-fe-65-77-0b
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [1b8 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [220 v1] #15
        Capabilities: [300 v1] #19
        Kernel driver in use: bnx2x
        Kernel modules: bnx2x

hebda_data_33:~ # ethtool -S p4p2|grep dis
     [0]: rx_discards: 79516
     [0]: rx_phy_ip_err_discards: 0
     [0]: rx_skb_alloc_discard: 28517
     [1]: rx_discards: 88484
     [1]: rx_phy_ip_err_discards: 0
     [1]: rx_skb_alloc_discard: 27102
     [2]: rx_discards: 13667973
     [2]: rx_phy_ip_err_discards: 0
     [2]: rx_skb_alloc_discard: 35207
     [3]: rx_discards: 33056205
     [3]: rx_phy_ip_err_discards: 0
     [3]: rx_skb_alloc_discard: 33533
     [4]: rx_discards: 13263091
     [4]: rx_phy_ip_err_discards: 0
     [4]: rx_skb_alloc_discard: 34748
     [5]: rx_discards: 7583294
     [5]: rx_phy_ip_err_discards: 0
     [5]: rx_skb_alloc_discard: 32756
     [6]: rx_discards: 3703892
     [6]: rx_phy_ip_err_discards: 0
     [6]: rx_skb_alloc_discard: 28380
     [7]: rx_discards: 31746726
     [7]: rx_phy_ip_err_discards: 0
     [7]: rx_skb_alloc_discard: 32609
     rx_discards: 103189181
     rx_mf_tag_discard: 0
     rx_brb_discard: 90068
     rx_phy_ip_err_discards: 0
     rx_skb_alloc_discard: 252852
 没有其它错误
hebda_data_23:~ # for i in `seq 1 10`; do ifconfig p4p2 | grep RX | grep overruns; sleep 1; done
          RX packets:253639505018 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639552428 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639566818 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639585722 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639597202 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639610209 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639622800 errors:305619311 dropped:0 overruns:305375168 frame:244143
          RX packets:253639642350 errors:305620450 dropped:0 overruns:305376307 frame:244143
          RX packets:253639675509 errors:305620450 dropped:0 overruns:305376307 frame:244143
          RX packets:253639723772 errors:305620471 dropped:0 overruns:305376328 frame:244143
hebda_data_23:~ # for i in `seq 1 10`; do ifconfig p4p2 | grep RX | grep overruns; sleep 1; done
          RX packets:253639788669 errors:305620773 dropped:0 overruns:305376630 frame:244143
          RX packets:253639812355 errors:305621201 dropped:0 overruns:305377058 frame:244143
          RX packets:253639834600 errors:305621201 dropped:0 overruns:305377058 frame:244143
          RX packets:253639892990 errors:305621455 dropped:0 overruns:305377312 frame:244143
          RX packets:253639913026 errors:305621455 dropped:0 overruns:305377312 frame:244143
          RX packets:253639919136 errors:305621455 dropped:0 overruns:305377312 frame:244143
          RX packets:253639935095 errors:305622380 dropped:0 overruns:305378237 frame:244143
          RX packets:253639954560 errors:305623012 dropped:0 overruns:305378869 frame:244143
          RX packets:253639961150 errors:305623012 dropped:0 overruns:305378869 frame:244143
          RX packets:253639971680 errors:305623012 dropped:0 overruns:305378869 frame:244143

业务配置

Gp DB 4.3

问题描述

安装应用后网卡的使用情况如下图: 但是在高峰时通过 nagios 会发现整个集群每个节点都报下面的错误,裸跑的时候也有类似的报错,但是没有来得及抓网卡的包:

Interface 11
Active checks of the service have been disabled - only passive checks are being accepted	Perform Extra Service Actions
CRITICAL	09-20-2016 10:47:51	0d 0h 11m 46s	1/1	CRIT - [p4p2] (up) MAC: f4:e9:d4:9d:cb:92, 10.00 Gbit/s, in: 262.67 MB/s, in-errors: 0.16%(!!) >= 0.1, out: 237.76 MB/s 

实际使用的命令是:

echo '<<<lnx_if:sep(58)>>>'
sed 1,2d /proc/net/dev

整体上来看, errors 在 0.1%-0.6%之间,极少的能达到 1%,当时的流量也从 20M-200MB 左右不等。

  1. 第一个问题是:这是不是问题?我个人感觉应该是,所以个人花了精力来处理,各位大神意见?
  2. 第一个问题是:如何解决?我有一点思路,请大神拍一下。 看了网上大家写的,怀疑问题是在 rx errors ,而且我看 overrun 比较多,是否不是 ring_buffer 的问题,而是中断的问题?
5583 次点击
所在节点    问与答
25 条回复
shidifen
2016-09-20 19:01:30 +08:00
@redsonic ,我有一个类似的问题,请教一下,大神帮忙。题目是“ dell r730 安装 Gp 后万兆网卡有 rx error ”
redsonic
2016-09-21 02:16:24 +08:00
能贴一下 cpu 负载吗, QPS 是多少?
我觉得应该是 cpu 过载了吧
“当时的流量也从 20M-200MB 左右不等” 意思是每秒流量压力不是平稳的? 能不能配合流量图看一下 error 和 overrun 的关系。
shidifen
2016-09-21 10:12:39 +08:00
先感谢大神,太及时了,我同时在几个网站提的,这个最快
![cpuload]( )
![interf]( http://ooo.0o0.ooo/2016/09/21/57e1ebdc2d6b6.png)
shidifen
2016-09-21 10:14:53 +08:00
我解释一下“当时的流量也从 20M-200MB 左右不等”,流量确实是不平稳的,而且可以确认流量大时有,流量小的时候没有。
网络图在这
![interf]( )
shidifen
2016-09-21 10:33:59 +08:00
流量大时有 error ,流量小的时候没有。
redsonic
2016-09-21 13:39:17 +08:00
你有 32 个逻辑 cpu ,而且负载不高,是不是中断都绑到一个 cpu 上去了,看一下 /proc/interrupts 。 如果你开了 irqbalance 服务把它关了,然后手动绑中断到各个 cpu 。 没搞过的话可以参照 http://www.stateam.net/2016/02/292
shidifen
2016-09-21 18:01:03 +08:00
大神,我查过了,截图如下 cpu si 不集中,而且 irqbalance ,看来您也认为这是个问题。
[cpu]( )
shidifen
2016-09-21 18:01:51 +08:00
```
cat /proc/interrupts |grep p4p2|awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$NF-2,$NF-1,$NF}'
189: 23 1910 0 23712 0 5819 0 17237 -2 -1 p4p2
191: 3 1298600 0 4960284 0 2318276 0 39993639 -2 -1 p4p2-fp-0
192: 3 10001905 0 59523308 0 91985518 0 127790073 -2 -1 p4p2-fp-1
193: 3 11441976 0 258662301 0 388562940 0 575324868 -2 -1 p4p2-fp-2
194: 3 40000208 0 318757279 0 518502447 0 617694135 -2 -1 p4p2-fp-3
195: 3 19019621 0 454066151 0 695202386 0 554029662 -2 -1 p4p2-fp-4
196: 11 326794259 0 2075443900 0 1914823036 0 1712511864 -2 -1 p4p2-fp-5
197: 3 50776115 0 1363045548 0 533311176 0 1048504898 -2 -1 p4p2-fp-6
198: 3 36012441 0 1125578323 0 803794796 0 874562863 -2 -1 p4p2-fp-7
```
shidifen
2016-09-21 18:04:46 +08:00
不会和 bois 的设置有关系吧?移动的硬件入场那个垃圾,根本没有人懂。
redsonic
2016-09-21 18:20:22 +08:00
cpu 压力不大。不管怎样先把 irqbalance 关掉,手动绑了再看看。和硬件问题有关的一般是寨卡和国产的 SFP+模块。
另外你那 wa 为什么这么高? 跑数据库了?
shidifen
2016-09-22 09:41:25 +08:00
1. 大神,确实这个集群的机器是在跑 greenplum 的数据库,所以 wa 会有一些。
2. irqbalance 确实开着,全关了,然后被拉去开会,回来一看立时就看到 cpu si 集中到一个核上了,所以只好先打开,我看看文档争取一下把绑定全做了,幸亏开始的时候在上面部署了 salt ,不然一个人一个个做过去,死了。
3. 如何确认寨卡,我看 intel 的还有个程序可以检测,我们用的是 Broadcom BCM57810 10 Gigabit Ethernet ,供货商是个 js ,这个有办法识别是否寨卡,我看 taotao 上卖这个挺多,不敢说会不会是,光模块看了,是马来产
shidifen
2016-09-22 09:45:51 +08:00
```
cat /proc/interrupts |grep p4p2 我用 ue 看了看,还是很有规律的。
189: 23 1910 0 23712 0 5819 0 17237 0 5376 0 56987 0 99061 0 8446 0 210719 0 17656 0 9158 0 10846 0 15137 0 304803 0 390534 0 774799 IR-PCI-MSI-edge p4p2
191: 3 1790680 0 10616665 0 62155547 0 51990594 0 28726188 0 4981410 0 14502611 0 73392654 0 1514807584 0 50490160 0 36799635 0 31323666 0 34551994 0 1841649566 0 4080397994 0 645370109 IR-PCI-MSI-edge p4p2-fp-0
192: 3 10001905 0 59523308 0 91985518 0 127790073 0 226232145 0 68948743 0 53568007 0 139196477 0 2199357135 0 17363297 0 106875992 0 192090680 0 207748476 0 4157890819 0 75973239 0 522926553 IR-PCI-MSI-edge p4p2-fp-1
193: 3 17101328 0 276032420 0 406790277 0 582324878 0 341336810 0 587700803 0 273444019 0 626032634 0 1015882162 0 639888141 0 1073516044 0 592148466 0 423007262 0 89076476 0 778398606 0 869867536 IR-PCI-MSI-edge p4p2-fp-2
194: 3 40280877 0 382936413 0 525153622 0 639028845 0 463062554 0 375178493 0 522825186 0 776598788 0 146472313 0 575108019 0 646425393 0 777351438 0 755557817 0 53563221 0 480382819 0 968528415 IR-PCI-MSI-edge p4p2-fp-3
195: 3 19157407 0 462878595 0 700714489 0 578481977 0 587614373 0 603026304 0 449605352 0 601554280 0 726230782 0 605540303 0 768006014 0 539097451 0 1099203513 0 80342630 0 170116494 0 482882557 IR-PCI-MSI-edge p4p2-fp-4
196: 11 326794259 0 2075443900 0 1914823036 0 1712511864 0 1491815039 0 768009701 0 626173817 0 1033024867 0 1303540739 0 1654168784 0 1458276685 0 1366877077 0 997404728 0 3603056177 0 15650855 0 1529467882 IR-PCI-MSI-edge p4p2-fp-5
197: 3 51207795 0 1381837753 0 542946245 0 1060608865 0 701987854 0 399374560 0 335176577 0 408752021 0 125841732 0 679175184 0 724132714 0 587519037 0 446235402 0 172909262 0 164639268 0 580566984 IR-PCI-MSI-edge p4p2-fp-6
198: 3 36012441 0 1134119927 0 812983145 0 883702589 0 1064498302 0 208541493 0 415344892 0 631032811 0 75737917 0 317337699 0 688038432 0 503944441 0 698137741 0 27688040 0 313802737 0 413330289 IR-PCI-MSI-edge p4p2-fp-7
```
shidifen
2016-09-22 10:18:21 +08:00
@redsonic ,大神,我有 32 核 CPU ,但是按您给的手册中说的,只有 8 个中断队列,那只能手工绑定到 8 个 cpu ,不是不如 irqbalance ,那个至少还可以占用 16 个核?
另外, ring_buffer 是否需要调大?
Ring parameters for p4p2:
Pre-set maximums:
RX: 4078
RX Mini: 0
RX Jumbo: 0
TX: 4078
Current hardware settings:
RX: 453
RX Mini: 0
RX Jumbo: 0
TX: 4078
redsonic
2016-09-22 14:35:42 +08:00
我看你的 cpu 负载不高,所以为了排除其他因素只能停掉 irqbalance 再观察。 irqbalance 的原理是不停修改中断描述来实现平衡,并不是一种很 native 的方法,先手工绑了再看。
调整 ring_buffer 能减少突发流量的丢包,不能解决持续丢包。它其实就是一个 FIFO , O 的速度赶不上 I 的速度肯定还是会丢。
shidifen
2016-09-22 14:56:41 +08:00
@redsonic 我按您提供的手册找了两个节点,这样做的:

```
service irq_balancer stop
echo "1"> /proc/irq/191/smp_affinity
echo "2"> /proc/irq/192/smp_affinity
echo "4"> /proc/irq/193/smp_affinity
echo "8"> /proc/irq/194/smp_affinity
echo "16"> /proc/irq/195/smp_affinity
echo "20"> /proc/irq/196/smp_affinity
echo "40"> /proc/irq/197/smp_affinity
echo "80"> /proc/irq/198/smp_affinity
```
shidifen
2016-09-22 14:57:47 +08:00
做完了后, cpu si 有下降,但是 rx error 和 overrun 还在增长,我在观察。
shidifen
2016-09-22 16:26:39 +08:00
@redsonic 大神,我把 ring_buffer 也改了,有所改善,我再观察,有问题再说。
shidifen
2016-09-23 10:45:12 +08:00
@redsonic ,神啊,我看了下,确实有一点改善,但是现在还是有报错,还有什么建议么?
shidifen
2016-09-23 10:52:19 +08:00
而且做了修改的那两个节点,确实是只前 8 个核的 cpu si 有值,其它的核这个值就没有,是否能够增加队列,这样可以有更多的核参与中断?
shidifen
2016-09-23 11:23:11 +08:00
@redsonic ,能否有排除一下寨卡的方法,说实话,我一直对于这部分不太放心,别我们费了好大劲,栽这上面。

这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/307630

V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.

© 2021 V2EX