MongoDB keeps crashing with errno:24 Too many open files, help needed

2019-08-02 10:35:43 +08:00
comwrg


2019-08-01T23:59:02.301+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:02.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:03.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:03.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:04.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:04.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:05.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:05.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:06.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:06.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:07.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:07.302+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:08.302+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:08.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:09.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:09.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:10.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:10.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:11.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:11.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:12.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:12.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:13.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:13.303+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:14.303+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:14.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:15.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:15.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:16.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:16.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:17.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:17.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:18.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:18.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:19.295+0800 W NETWORK  [HostnameCanonicalizationWorker] Failed to obtain address information for hostname iZuf61zao4uxbprumx45dlZ: System error
2019-08-01T23:59:19.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:19.304+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:20.304+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:20.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:21.305+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:21.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:22.305+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:22.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:23.305+0800 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2019-08-01T23:59:23.305+0800 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2019-08-01T23:59:23.631+0800 E STORAGE  [thread2] WiredTiger (24) [1564675163:631372][9783:0x7f4e30730700], file:WiredTiger.wt, WT_SESSION.checkpoint: /var/lib/mongodb/WiredTiger.turtle: handle-open: open: Too many open files
2019-08-01T23:59:23.632+0800 E STORAGE  [thread2] WiredTiger (24) [1564675163:632761][9783:0x7f4e30730700], checkpoint-server: checkpoint server error: Too many open files
2019-08-01T23:59:23.632+0800 E STORAGE  [thread2] WiredTiger (-31804) [1564675163:632802][9783:0x7f4e30730700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-08-01T23:59:23.632+0800 I -        [thread2] Fatal Assertion 28558
2019-08-01T23:59:23.632+0800 I -        [thread2] 

***aborting after fassert() failure

2019-08-01T23:59:23.638+0800 F -        [thread2] Got signal: 6 (Aborted).
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31862
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31862
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I have set fs.file-max = 2097152 in sysctl.conf.

It crashes every day and I really cannot figure out the root cause.

22 replies
2019-08-02 11:13:25 +08:00
2019-08-02 11:16:37 +08:00
I suggest looking at /proc/PID/limits to see how many FDs the process is actually allowed to open.
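
A minimal sketch of that check, assuming a Linux host. The helper names `max_open_files` and `read_process_limits` are made up for illustration; the only thing taken from the system is the `Max open files` row of `/proc/<pid>/limits`.

```python
from pathlib import Path


def max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a limits dump.

    Values stay as strings because the kernel may print 'unlimited'.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # The row is: label, soft limit, hard limit, units.
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' row found")


def read_process_limits(pid):
    """Read the limits that apply to a running process (Linux only)."""
    return max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

For example, `read_process_limits(9783)` would report the limit of the mongod process whose PID appears in the WiredTiger log lines above, provided that process is still running. The number can differ from what `ulimit -n` prints in an interactive shell.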
2019-08-02 11:35:34 +08:00

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             64000                64000                processes
Max open files            64000                64000                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31862                31862                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

2019-08-02 11:54:04 +08:00
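@comwrg Check the number of TCP connections with ss or netstat, and see whether the mongodb process has too many of them. If it does, use the states those TCP connections are in to narrow down where the problem lies and what is actually exhausting the file descriptors. One possibility is a denial-of-service attack that leaves a large number of empty TCP connections open.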

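Each network connection takes one file descriptor (fd), and each file opened for reading or writing takes one as well. Judging from the error log, the first error is that the file descriptors ran out, so new network connections could not get an fd and accept (the system call that accepts new network connections) failed. That part is tolerable. For a database, however, once files cannot be written to disk and data cannot be persisted, crashing deliberately is the right thing to do.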
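
To make errno 24 concrete, here is a small sketch (not MongoDB code) that lowers the soft fd limit of the current process, opens descriptors until the kernel refuses, and returns the resulting errno. The function name `open_until_emfile` and the limit of 64 are arbitrary choices for illustration.

```python
import errno
import os
import resource


def open_until_emfile(soft_limit=64):
    """Open fds under a lowered soft RLIMIT_NOFILE and return the failing errno."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    opened = []
    try:
        while True:
            # Each successful open consumes one descriptor, just as each
            # accepted connection or opened data file does in a server.
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno
    finally:
        # Release everything and restore the original limit.
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
```

On Linux the returned value equals `errno.EMFILE`, the same error that `accept()` and the WiredTiger checkpoint hit in the log above.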

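As for the OP's problem, I think it is very likely that some frequently called code path opens files and never closes them, so the fds are never released and eventually hit the limit. The OP should now investigate from two angles: the network (as described in the first paragraph) and the /proc/PID/fd/ directory.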
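
A rough sketch of the /proc/PID/fd check, assuming Linux and enough permission to read the target process's fd directory (same user or root). The names `classify_fd_target` and `summarize_fds` are hypothetical. The idea is to see whether sockets or regular files dominate the open descriptors.

```python
import os
from collections import Counter


def classify_fd_target(target):
    """Classify the symlink target of one /proc/<pid>/fd entry."""
    if target.startswith("socket:"):
        return "socket"      # network or unix-domain socket
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"  # epoll, eventfd, timerfd and similar
    return "file"            # anything that resolves to a path


def summarize_fds(pid):
    """Count the open fds of a process by kind (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed between listdir and readlink
        counts[classify_fd_target(target)] += 1
    return counts
```

If `summarize_fds(pid)` shows mostly sockets, the connection-count angle is the one to follow. If it shows mostly files, the paths under /proc/PID/fd show which files are being kept open.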
2019-08-02 11:55:10 +08:00
You have run out of inodes.
2019-08-02 11:58:10 +08:00
2019-08-02 11:59:24 +08:00
The version mentioned in there is 2.6.7, which doesn't match mine. That bug is also rather old.
2019-08-02 12:00:13 +08:00
This is usually because the disk is out of space, or else the inodes are used up.
2019-08-02 12:03:53 +08:00
Using ulimit or /etc/security/limits.conf to check and change the limit is a classic mistake.

The rlimit of a background service has to be set where the service is started.
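
The reason is that a process inherits its limits from whatever launched it, not from an interactive shell. This is consistent with the thread: `ulimit -a` reports 65535 while the mongod process itself has 64000. A minimal sketch of the mechanism follows; the helper `run_with_nofile_limit` and the `REPORT_LIMIT` command are made up for illustration, and this is not how any particular init system is implemented.

```python
import resource
import subprocess
import sys


def run_with_nofile_limit(cmd, soft):
    """Start cmd with the soft RLIMIT_NOFILE set to `soft` in the child only."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def set_limit():
        # Runs in the child between fork() and exec(); the parent keeps
        # its own limit unchanged.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    result = subprocess.run(
        cmd, preexec_fn=set_limit, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# A child command that simply prints the soft limit it ended up with.
REPORT_LIMIT = [
    sys.executable,
    "-c",
    "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])",
]
```

Calling `run_with_nofile_limit(REPORT_LIMIT, 200)` makes the child report 200, whatever the calling shell's `ulimit -n` says. A service manager or init script does the equivalent before starting the daemon, which is why the limit has to be changed there.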
2019-08-02 12:49:23 +08:00
Raise the ulimit a bit.
2019-08-02 13:03:15 +08:00
Remember to close.
2019-08-02 13:59:03 +08:00
@est @aaa5838769 Neither has been used up.
2019-08-02 13:59:27 +08:00
@est @aaa5838769 Neither is the case.
2019-08-02 14:24:09 +08:00
@auser Thank you very much 🙏, I have investigated along the lines you suggested.

I found that mongodb is holding a lot of fds (24135/38839), more than half of the total.


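Could it really be that the project isn't closing its connections? But this project has been running for several months, and only in the last few days has mongo started crashing frequently because it runs out of fds.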
2019-08-02 14:41:29 +08:00
2019-08-02 14:49:08 +08:00

2019-08-02 14:56:50 +08:00
@auser OK, thank you very much for the advice. I'll keep digging into it on my own :)
2019-08-02 17:48:07 +08:00
Set the ulimit to 64000; the official documentation covers this.
2019-08-02 23:23:13 +08:00

If the system load and disk I/O are not high
2019-08-03 13:09:09 +08:00
@auser Yes, it's already set to 200000.
