How do I troubleshoot a "too many open files" error in a Python web server on CentOS?

2016-07-07 16:44:29 +08:00
 zeroday
I have a website built with python + ansible + django running on Linux. Every so often it fails with a "too many open files" error.

How should I go about troubleshooting this?
12 replies
shyling
2016-07-07 16:59:13 +08:00
Raise the limits.
wzxjohn
2016-07-07 16:59:24 +08:00
ulimit -a
zeroday
2016-07-07 17:26:44 +08:00
@wzxjohn

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 126875
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 200000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 126875
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
zeroday
2016-07-07 17:26:59 +08:00
@shyling It is already set to 200000.
zeroday
2016-07-07 17:27:50 +08:00
@shyling It is already set to 200000, and it still fills up.
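
One thing worth checking here: `ulimit -a` reports the limits of the shell it is run in, which may differ from the limits of an already-running service process (those are visible in `/proc/<pid>/limits`). A minimal sketch, assuming Linux and Python's standard `resource` module, of reading the limit and the current descriptor count from inside the process itself. The helper name `fd_usage` is made up for this example.

```python
import os
import resource


def fd_usage():
    """Return (open_fd_count, soft_limit, hard_limit) for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd has one entry per open descriptor (Linux only).
    # The count includes the descriptor that listdir() itself opens briefly.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


if __name__ == "__main__":
    used, soft, hard = fd_usage()
    print("open fds: %d, soft limit: %s, hard limit: %s" % (used, soft, hard))
```

If the count keeps climbing toward the soft limit under steady load, raising the limit only postpones the error; something is leaking descriptors.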
BOYPT
2016-07-07 17:29:50 +08:00
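You have to analyze this from your program's side: what is opening the descriptors, and what is preventing them from being released.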
ryd994
2016-07-07 18:10:52 +08:00
Run lsof -p with the PID of the python process and see which files it has open.
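
On Linux, roughly the same information can be read from `/proc/<pid>/fd`. A rough sketch follows; the function name `open_files` is hypothetical, and inspecting another user's process needs root.

```python
import os


def open_files(pid="self"):
    """Map each open descriptor number of a process to its target path."""
    fd_dir = "/proc/%s/fd" % pid
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


if __name__ == "__main__":
    for fd, target in sorted(open_files().items()):
        print(fd, target)
```

Files that were unlinked while still open show up with a " (deleted)" suffix on the target path.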
zeroday
2016-08-14 14:45:25 +08:00
@BOYPT @ryd994

I checked and found file descriptors for files that have been deleted but were never released.

[root@85-13-112]# lsof | grep deleted
python 19327 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19332 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19332 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19334 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19334 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19335 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19335 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19336 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19336 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19337 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19337 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19338 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19338 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19339 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19339 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19340 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19340 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19341 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19341 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19342 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19342 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19343 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19343 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19344 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19344 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19345 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19345 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19346 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19346 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19347 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19347 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19348 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19348 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19349 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19349 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19350 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19350 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19351 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19351 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19352 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19352 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19353 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19353 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19354 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19354 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19355 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19355 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19356 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19356 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19357 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19357 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19358 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19358 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19359 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19359 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19360 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19360 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19361 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19361 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19362 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19362 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)
python 19327 19363 root 8u REG 252,2 0 534767 /tmp/tmp4Hlr9O (deleted)
python 19327 19363 root 12u REG 252,2 0 534768 /tmp/tmp2TSn03 (deleted)

[root@85-13-112]# lsof | grep deleted | wc -l
65
ryd994
2016-08-14 14:59:57 +08:00
@zeroday So close your files when you are done with them.
BOYPT
2016-08-14 22:18:12 +08:00
@zeroday For example, if the file object returned by tempfile.TemporaryFile is used and never closed, this is exactly what you get.
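
To illustrate, a small sketch (the helper `spool` is invented for this example) that uses `tempfile.TemporaryFile` as a context manager, so the descriptor is released as soon as the block exits, even if an exception is raised:

```python
import tempfile


def spool(content):
    """Write bytes to an anonymous temp file, read them back, then close it."""
    # Leaving the with-block closes the file and frees its descriptor.
    with tempfile.TemporaryFile() as f:
        f.write(content)
        f.seek(0)
        data = f.read()
    return data, f.closed
```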
zeroday
2016-08-14 22:32:30 +08:00
@BOYPT Thanks. I later found this piece of code in ansible as well.

```
def write_file(module, url, dest, content):
    # create a tempfile with some test content
    fd, tmpsrc = tempfile.mkstemp()
    f = open(tmpsrc, 'wb')
    try:
        f.write(content)
    except Exception, err:
        os.remove(tmpsrc)
        module.fail_json(msg="failed to create temporary content file: %s" % str(err))
    f.close()
```

The fd returned by fd, tmpsrc = tempfile.mkstemp() is never closed.
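
One possible fix, sketched here rather than taken from ansible: wrap the descriptor that `mkstemp()` already returns with `os.fdopen` instead of opening the path a second time, so there is only one descriptor and it is closed when the `with` block exits. The function name `write_temp_file` and its return value are assumptions made for this example, and the error reporting through `module.fail_json` is left out.

```python
import os
import tempfile


def write_temp_file(content):
    """Write bytes to a new temp file and return its path without leaking an fd."""
    fd, tmpsrc = tempfile.mkstemp()
    try:
        # os.fdopen takes ownership of fd; leaving the with-block closes it.
        with os.fdopen(fd, "wb") as f:
            f.write(content)
    except Exception:
        # Remove the partially written file, then let the caller see the error.
        os.remove(tmpsrc)
        raise
    return tmpsrc
```

The caller is still responsible for deleting the returned path once it is no longer needed.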

One more question: on the production server I also see descriptors open on /dev/null. What could have been left unclosed there?
BOYPT
2016-08-18 14:00:11 +08:00
@zeroday A service process's three special descriptors, stdin/stdout/stderr, often point to /dev/null. They do not need to be closed.
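
To confirm that the /dev/null entries are just the standard streams, one can check where descriptors 0, 1 and 2 point. A small Linux-only sketch; the function name `std_stream_targets` is hypothetical.

```python
import os


def std_stream_targets():
    """Return the paths that descriptors 0, 1 and 2 currently point to."""
    names = {0: "stdin", 1: "stdout", 2: "stderr"}
    result = {}
    for fd, name in names.items():
        try:
            result[name] = os.readlink("/proc/self/fd/%d" % fd)
        except OSError:
            result[name] = None  # the descriptor is closed
    return result


if __name__ == "__main__":
    for name, target in sorted(std_stream_targets().items()):
        print(name, target)
```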

https://www.v2ex.com/t/290916