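Spring Boot application suddenly aborts while running and the JVM exits

134 days ago

Geekerstar

After the program exited, it generated an hs_err_pid1301132.log file. The beginning of the file is shown below. Is it possible to tell from this what caused the crash?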

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x000072ca636529ce, pid=1301132, tid=0x000072c7af5ff640

JRE version: Java(TM) SE Runtime Environment (8.0_371) (build 1.8.0_371-b11)

Java VM: Java HotSpot(TM) 64-Bit Server VM (25.371-b11 mixed mode linux-amd64 compressed oops)

Problematic frame:

J 39882 C2 sun.nio.ch.IOUtil.write(Ljava/io/FileDescriptor;[Ljava/nio/ByteBuffer;IILsun/nio/ch/NativeDispatcher;)J (509 bytes) @ 0x000072ca636529ce [0x000072ca636528a0+0x12e]

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp
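
For readers triaging a header like the one above, here is a minimal Java sketch that pulls out the signal, the pid, and the kind of the problematic frame. The class name HsErrHeader and its helpers are made up for illustration. The letter mapping follows the legend that hs_err files print next to their native frames (J=compiled Java code, j=interpreted, Vv=VM code, C=native code). For this crash it reports a frame in JIT-compiled Java code, which matches the "J 39882 C2" prefix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HsErrHeader {
    // Matches e.g. "SIGSEGV (0xb) at pc=0x000072ca636529ce, pid=1301132"
    private static final Pattern SIGNAL_LINE =
            Pattern.compile("(SIG[A-Z]+) \\(0x[0-9a-f]+\\) at pc=0x[0-9a-f]+, pid=(\\d+)");

    /** Returns {signal name, pid}, or null if no signal line is present. */
    static String[] signalAndPid(String header) {
        Matcher m = SIGNAL_LINE.matcher(header);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    /** Returns the first non-blank line after "Problematic frame:", or null. */
    static String problematicFrame(String header) {
        boolean seenMarker = false;
        for (String raw : header.split("\n")) {
            // Real hs_err files prefix header lines with "#"; accept both forms.
            String line = raw.replaceFirst("^#", "").trim();
            if (line.isEmpty()) {
                continue;
            }
            if (seenMarker) {
                return line;
            }
            if (line.equals("Problematic frame:")) {
                seenMarker = true;
            }
        }
        return null;
    }

    /** Maps the frame's leading letter to the legend used in hs_err files. */
    static String frameKind(String frame) {
        switch (frame.charAt(0)) {
            case 'J': return "compiled Java code";
            case 'j': return "interpreted Java code";
            case 'V':
            case 'v': return "VM code";
            case 'C': return "native code";
            default:  return "unknown";
        }
    }

    public static void main(String[] args) {
        // Header lines copied from the post above (frame line shortened).
        String header = "SIGSEGV (0xb) at pc=0x000072ca636529ce, pid=1301132, tid=0x000072c7af5ff640\n"
                + "Problematic frame:\n"
                + "J 39882 C2 sun.nio.ch.IOUtil.write(...)J (509 bytes)\n";
        String[] sp = signalAndPid(header);
        String frame = problematicFrame(header);
        System.out.println(sp[0] + " pid=" + sp[1] + ", frame in " + frameKind(frame));
    }
}
```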

3077 views

Node

Java

48 replies

jiom

134 days ago

Run ulimit -a and check whether any system limits have been tuned

dx123

134 days ago

What do your JVM flags and configuration look like?

iyiluo

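134 days ago

Core dumps aren't enabled for the JVM, so this much of the log doesn't tell us anything. Judging by Google results, the error seems to come from Netty. Check the code, or enable core dumps and wait for it to reproduce.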

Geekerstar

134 days ago

@jiom Nothing has been tuned:
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 513419
max locked memory (kbytes, -l) 16438992
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 513419
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Geekerstar

134 days ago

@dx123 -Xmx2048m -Xms2048m -XX:+UseG1GC, no other tuning

Geekerstar

134 days ago

@iyiluo OK, thanks

serverKnignt

134 days ago

Could it have been killed by Linux?

serverKnignt

134 days ago

@serverKnignt You can check the logs: cat /var/log/messages | egrep -T -C10 Kill
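
The same check can be sketched in Java for cases where the log is collected by a tool rather than read by hand. This is only an illustration: the class name OomKillScan is made up, and the regular expression assumes the kernel's usual "Killed process <pid> (<name>)" wording, which may vary between kernel versions and distributions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OomKillScan {
    // Assumed kernel wording: "... Killed process <pid> (<name>) ..."
    private static final Pattern KILLED =
            Pattern.compile("Killed process (\\d+) \\(([^)]+)\\)");

    /** Returns one "pid:name" entry per log line that reports a killed process. */
    static List<String> killedProcesses(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = KILLED.matcher(line);
            if (m.find()) {
                hits.add(m.group(1) + ":" + m.group(2));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the command above; reading it usually needs root.
        String path = args.length > 0 ? args[0] : "/var/log/messages";
        // ISO-8859-1 decodes any byte sequence, so odd bytes in the log cannot abort the scan.
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
        for (String hit : killedProcesses(lines)) {
            System.out.println(hit);
        }
    }
}
```

An empty result for the JVM's pid (1301132 in this thread) would suggest the OOM killer was not involved.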

ZZ74

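134 days ago

The most common reason for a JVM to exit is running out of memory, either heap or off-heap. Given the ByteBuffer and the SIGSEGV, off-heap is the most likely. For anything more specific you'd need to describe what your program does and post more of the log.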
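
To tell the two apart at runtime, heap and direct-buffer usage can be read from the JDK's standard management beans. A minimal sketch, with a made-up class name MemoryProbe. It only counts buffers allocated through ByteBuffer.allocateDirect, so memory that a library obtains by other means would not show up here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class MemoryProbe {

    /** Bytes currently used on the Java heap. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Bytes held by NIO direct buffers, or -1 if the JVM reports no "direct" pool. */
    static long directUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("heap used:   " + heapUsedBytes());
        System.out.println("direct used: " + directUsedBytes());

        // Allocate 1 MiB off-heap and observe that only the direct pool grows by that much.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1 << 20);
        System.out.println("direct used after allocateDirect: " + directUsedBytes());
        System.out.println("buffer capacity: " + offHeap.capacity());
    }
}
```

Logging these two numbers periodically would show whether heap or direct memory is the one growing before a crash.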

Geekerstar

134 days ago

@iyiluo My error looks a lot like this one. Could it really be a Netty problem? https://github.com/netty/netty/issues/4206

Geekerstar

134 days ago

@serverKnignt #7 Probably not. I checked the system logs and found nothing wrong.

Geekerstar

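134 days ago

@ZZ74 On the monitoring platform, memory and everything else look perfectly normal? https://imgse.com/i/pAGLDy9
https://imgse.com/i/pAGLBQJ The program does a lot of IO and CPU work and uses the Undertow framework.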

willbetter

134 days ago

Maybe the disk is out of space, or too many files are open

ZZ74

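134 days ago

@Geekerstar The GitHub example you posted is exactly a memory problem, and they also gave their Netty configuration.
The hs_err_pid1301132.log file contains far more than what you posted. The one on GitHub is much more detailed: you can see the OOM error, as well as
Event: 984.476 GC heap before
ParOldGen total 2796544K, used 2796216K
object space 2796544K, 99% used

In your monitoring, the abnormal burst of memory allocation between 45 past and 9 o'clock is very suspicious. There is no data after that, presumably because continuous full GCs brought the process down.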

Geekerstar

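134 days ago

@ZZ74 #14 Could you help analyze this? I really can't find the problem. Here are several complete monitoring screenshots. Users started using the system at around 8:40 to 8:50, and there is no data afterwards because the JVM exited.
https://imgse.com/i/pAJpwrD
https://imgse.com/i/pAJp0qe
https://imgse.com/i/pAJpdKO

This is the hs_err_pid1301132.log file
pan dot baidu dot com/s/1hZ5Fb8Nir458vSiS14XY-Q extraction code: cgpp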

D3EP

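134 days ago

Does the program contain any logic that manages Netty ByteBuf manually? I once took over an RPC framework that hit a similar segmentation fault in certain business scenarios. It turned out that a ByteBuf was released somewhere and data was later written into it, which eventually made the process dump core.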
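
The failure mode described here, writing into a buffer after it has been released, can be sketched without Netty. The GuardedBuffer class below is hypothetical and is not Netty's ByteBuf API. It only shows how a reference count checked on every access turns a use-after-release into an exception instead of a write to freed memory. The check in writeByte is not atomic with the write, so treat it as a single-threaded illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

final class GuardedBuffer {
    private final ByteBuffer data;
    // Starts at 1: the creator owns one reference.
    private final AtomicInteger refCnt = new AtomicInteger(1);

    GuardedBuffer(int capacity) {
        this.data = ByteBuffer.allocate(capacity);
    }

    /** Adds a reference; fails if the buffer has already been fully released. */
    GuardedBuffer retain() {
        for (;;) {
            int c = refCnt.get();
            if (c <= 0) {
                throw new IllegalStateException("retain after release");
            }
            if (refCnt.compareAndSet(c, c + 1)) {
                return this;
            }
        }
    }

    /** Drops a reference; returns true when this call released the last one. */
    boolean release() {
        for (;;) {
            int c = refCnt.get();
            if (c <= 0) {
                throw new IllegalStateException("double release");
            }
            if (refCnt.compareAndSet(c, c - 1)) {
                return c == 1;
            }
        }
    }

    /** Writes one byte, refusing to touch a released buffer. */
    void writeByte(byte b) {
        if (refCnt.get() <= 0) {
            throw new IllegalStateException("write after release");
        }
        data.put(b);
    }

    int written() {
        return data.position();
    }

    public static void main(String[] args) {
        GuardedBuffer buf = new GuardedBuffer(16);
        buf.writeByte((byte) 1);
        buf.release();
        try {
            buf.writeByte((byte) 2); // the bug: the buffer is already released
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```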

Geekerstar

134 days ago

@D3EP No. Our program does WebSocket push, and the web framework is Undertow, which seems to have Netty underneath, but the business code doesn't touch Netty directly.

Geekerstar

134 days ago

@willbetter The disk still has several TB free, and according to the logs the number of file handles is within the normal range

julyclyde

134 days ago

@jiom ulimit only applies to the current shell. To see another process's limits you need to look under /proc
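
As a rough sketch of that check, the Java snippet below reads /proc/<pid>/limits for a given process instead of asking the shell. The class name ProcLimits and its helper are made up for illustration. It assumes Linux and the usual layout of that file: a header row, then one row per limit with the name, soft limit, hard limit, and units padded by runs of spaces.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ProcLimits {

    /** Parses the lines of /proc/<pid>/limits into "limit name" -> soft limit. */
    static Map<String, String> parseSoftLimits(List<String> lines) {
        Map<String, String> soft = new LinkedHashMap<>();
        for (String line : lines) {
            // Columns are separated by two or more spaces; names contain only single spaces.
            String[] cols = line.trim().split("\\s{2,}");
            if (cols.length < 3 || cols[0].equals("Limit")) {
                continue; // header row or a line in an unexpected shape
            }
            soft.put(cols[0], cols[1]);
        }
        return soft;
    }

    public static void main(String[] args) throws IOException {
        // Pass the JVM's pid as the first argument; "self" inspects this process.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines =
                Files.readAllLines(Paths.get("/proc", pid, "limits"), StandardCharsets.UTF_8);
        Map<String, String> soft = parseSoftLimits(lines);
        System.out.println("Max core file size = " + soft.get("Max core file size"));
        System.out.println("Max open files     = " + soft.get("Max open files"));
    }
}
```

Running it with the pid of the Java process shows the limits that process actually runs under, which can differ from what ulimit -a prints in a login shell.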

ZZ74

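134 days ago

It's a SEGV_MAPERR error. There were frequent full GCs before the crash, but memory did come back down, so it looks like there are large temporary objects. It crashed during AbstractFramedChannel.flushSenders().
I don't know whether Undertow uses on-heap or off-heap buffers. It could be caused by writing a large Java object into a ByteBuf whose size is limited. What #16 described can't be ruled out either.
You can look at it in light of your system's characteristics. It looks like an IoT data collection system; I can't tell whether flushSenders is sending data to devices or serving queries.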

Page 1 of 3

Full thread on V2EX: https://www.v2ex.com/t/1078482
