Help interpreting smartctl output

2023-06-14 17:12:17 +08:00
 xiaoyuesanshui
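Recently, transmission keeps reporting "local data corrupted #1777, pls verify local data".

The NAS has always been shut down gracefully, but the first disk has been in use since 2019, so I can't help worrying about the health of the disks.

smartctl -H reports PASSED for every disk,
but smartctl -t short gives odd results on two of them: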

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Interrupted (host reset) 90% 31594 -
# 2 Short captive Interrupted (host reset) 90% 31594 -
# 3 Short captive Interrupted (host reset) 90% 31594 -
# 4 Short captive Interrupted (host reset) 90% 31594 -
# 5 Short captive Interrupted (host reset) 90% 31594 -
# 6 Short captive Interrupted (host reset) 90% 31594 -
# 7 Short captive Interrupted (host reset) 90% 31594 -
# 8 Short captive Interrupted (host reset) 90% 31594 -
# 9 Short captive Interrupted (host reset) 90% 31594 -
#10 Short captive Interrupted (host reset) 90% 31594 -
#11 Short captive Interrupted (host reset) 90% 31590 -
#12 Short captive Interrupted (host reset) 90% 31590 -
#13 Short captive Interrupted (host reset) 90% 31590 -
#14 Short captive Interrupted (host reset) 90% 31590 -
#15 Short captive Interrupted (host reset) 90% 31590 -
#16 Short captive Interrupted (host reset) 90% 31590 -
#17 Short captive Interrupted (host reset) 90% 31590 -
#18 Short captive Interrupted (host reset) 90% 31590 -
#19 Short captive Interrupted (host reset) 90% 31590 -
#20 Short captive Interrupted (host reset) 90% 31590 -
#21 Short captive Interrupted (host reset) 90% 31590 -
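
A minimal sketch for summarising a self-test log like the one above, assuming the rows have the whitespace-collapsed layout shown in this post. The regular expression and the names `parse_selftest_log` and `summarize` are illustrative, not part of smartmontools. Description and status are kept as one text field because they cannot be split reliably once the column alignment is lost.

```python
import re
from collections import Counter

# Two rows copied from the self-test log above; the other 19 differ only in "Num".
SAMPLE_LOG = """\
# 1 Short captive Interrupted (host reset) 90% 31594 -
#11 Short captive Interrupted (host reset) 90% 31590 -
"""

# Assumed row layout: "#<num> <description and status> <remaining>% <hours> <lba or ->"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<text>.+?)\s+(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+(?P<lba>\d+|-)$"
)


def parse_selftest_log(text):
    """Return one dict per recognisable self-test log row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header or unrelated line
        lba = match.group("lba")
        rows.append({
            "num": int(match.group("num")),
            "text": match.group("text"),
            "remaining_pct": int(match.group("remaining")),
            "power_on_hours": int(match.group("hours")),
            "first_error_lba": None if lba == "-" else int(lba),
        })
    return rows


def summarize(rows):
    """Count rows per (description/status text, power-on hour)."""
    return Counter((row["text"], row["power_on_hours"]) for row in rows)


rows = parse_selftest_log(SAMPLE_LOG)
for (text, hours), count in sorted(summarize(rows).items()):
    print(f"{count} x {text!r} at power-on hour {hours}")
```

Run against the full log, this would group the 21 entries into two batches (power-on hours 31590 and 31594), none with a first-error LBA. That makes it easier to see that the numeric column is the time of the test, not the location of a failure.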


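I looked around for information, but didn't find much that helped.

I then unmounted every partition on the disk and ran smartctl -t short again, with the same result.

Here is the output of smartctl -a /dev/sdc: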

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-19-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68N32N0
Serial Number: WD-WCC7K7RJ5Z78
LU WWN Device Id: 5 0014ee 2bbbd7c5e
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Jun 14 17:11:26 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x04) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 41) The self-test routine was interrupted
by the host with a hard or soft reset.
Total time to complete Offline
data collection: (43560) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 463) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 194 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 162 159 021 Pre-fail Always - 6875
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 244
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 057 057 000 Old_age Always - 31595
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 220
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 21
193 Load_Cycle_Count 0x0032 196 196 000 Old_age Always - 14413
194 Temperature_Celsius 0x0022 106 104 000 Old_age Always - 44
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 5
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Interrupted (host reset) 90% 31594 -
# 2 Short captive Interrupted (host reset) 90% 31594 -
# 3 Short captive Interrupted (host reset) 90% 31594 -
# 4 Short captive Interrupted (host reset) 90% 31594 -
# 5 Short captive Interrupted (host reset) 90% 31594 -
# 6 Short captive Interrupted (host reset) 90% 31594 -
# 7 Short captive Interrupted (host reset) 90% 31594 -
# 8 Short captive Interrupted (host reset) 90% 31594 -
# 9 Short captive Interrupted (host reset) 90% 31594 -
#10 Short captive Interrupted (host reset) 90% 31594 -
#11 Short captive Interrupted (host reset) 90% 31590 -
#12 Short captive Interrupted (host reset) 90% 31590 -
#13 Short captive Interrupted (host reset) 90% 31590 -
#14 Short captive Interrupted (host reset) 90% 31590 -
#15 Short captive Interrupted (host reset) 90% 31590 -
#16 Short captive Interrupted (host reset) 90% 31590 -
#17 Short captive Interrupted (host reset) 90% 31590 -
#18 Short captive Interrupted (host reset) 90% 31590 -
#19 Short captive Interrupted (host reset) 90% 31590 -
#20 Short captive Interrupted (host reset) 90% 31590 -
#21 Short captive Interrupted (host reset) 90% 31590 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Is there something wrong with this disk?


Also, after I rebooted the server, the tasks that had been failing in transmission went back to normal.
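
A PASSED overall health result does not mean every attribute is clean. In the output above, Current_Pending_Sector has a raw value of 5, which counts sectors the drive could not read and is waiting to rewrite or remap. Below is a rough sketch of how such rows could be flagged automatically, assuming the ten-column attribute layout shown above. The `WATCHED` set and the function names are my own choices for illustration, not a smartmontools feature.

```python
# Attribute rows copied from the `smartctl -a /dev/sdc` output above.
SAMPLE_ATTRS = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 5
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
"""

# Assumption: attribute IDs whose raw value should normally stay at 0
# (reallocated sectors, reallocation events, pending and uncorrectable sectors).
WATCHED = {5, 196, 197, 198}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for rows with a numeric raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit() or not fields[9].isdigit():
            continue
        attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def flag_suspicious(attrs, watched=WATCHED):
    """Return (id, name, raw) for watched attributes with a non-zero raw value."""
    return [
        (attr_id, name, raw)
        for attr_id, (name, raw) in sorted(attrs.items())
        if attr_id in watched and raw > 0
    ]


flagged = flag_suspicious(parse_attributes(SAMPLE_ATTRS))
for attr_id, name, raw in flagged:
    print(f"attribute {attr_id} ({name}) has raw value {raw}")
```

On the sample rows this flags only attribute 197. Whether a small pending count justifies replacing a disk is a judgment call, but it is a reason to run an extended test and to keep backups current.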
10 replies
paranoiagu
2023-06-14 18:37:28 +08:00
SMART itself may be broken. I have a WD Red that seems to behave the same way.
xiaoyuesanshui
2023-06-15 09:03:27 +08:00
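What do you mean by SMART being broken?
As I understand it, SMART is just a program, isn't it?
I have five disks in total: sda/b/c/d/e.
b and d test normally;
c and e show the same error.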
julyclyde
2023-06-15 13:30:18 +08:00
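Every test fails at 31594,
but no log entry was produced.
Still, since the fault reproduces consistently, I'd suggest replacing the disk.

If you're not ready to give up on it, run a long test.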
xiaoyuesanshui
2023-06-15 13:54:45 +08:00
31594 is the power-on time.
The long test is already running.
julyclyde
2023-06-16 13:55:22 +08:00
@xiaoyuesanshui Oh, I misread the column. I thought 31594 was LBA_of_first_error; so it's actually LifeTime(hours)?

short tests do get interrupted easily. Use long.
xiaoyuesanshui
2023-06-16 13:57:07 +08:00
@julyclyde The long test finished today:
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
1 Extended offline Completed: read failure 90% 31615 9047736

197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 10

It looks like this disk is done for. I've started migrating the data and am getting ready to replace it.
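
To put the reported LBA_of_first_error in context, here is a small sketch that converts it to a byte offset and a physical sector index. It assumes the LBA counts 512-byte logical sectors, as given by the "Sector Sizes" line of the smartctl output earlier in the thread. `locate_lba` is a hypothetical helper, not a smartctl feature.

```python
LOGICAL_SECTOR = 512    # bytes, from "Sector Sizes: 512 bytes logical"
PHYSICAL_SECTOR = 4096  # bytes, from "4096 bytes physical"
CAPACITY_BYTES = 4_000_787_030_016  # "User Capacity" from the same output


def locate_lba(lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Return (byte offset, physical sector index) for a logical block address."""
    byte_offset = lba * logical
    return byte_offset, byte_offset // physical


offset, physical_index = locate_lba(9047736)
print(f"byte offset:     {offset}")
print(f"physical sector: {physical_index}")
print(f"position:        {offset / CAPACITY_BYTES:.2%} of the way into the disk")
```

The failing sector sits about 4.6 GB from the start of a 4 TB disk. The extended test stops at the first read failure (hence 90% remaining), so it says nothing about the rest of the surface.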
julyclyde
2023-06-16 13:59:44 +08:00
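@xiaoyuesanshui A small mercy: at least it's a definite result, so you can act on it with confidence.
Unlike some junk I've dealt with, which still reported health OK when the damn thing could no longer even communicate.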
xiaoyuesanshui
2023-06-16 14:09:55 +08:00
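@julyclyde But I also have an 18T disk with just over ten thousand hours on it, and it's barely been used. short never produces a result, and long gets interrupted too. It's pretty annoying.

That disk is still under warranty and has no bad sectors yet, though, so I'm migrating the data onto it for now.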
julyclyde
2023-06-16 17:00:29 +08:00
@xiaoyuesanshui Maybe try running the test while nothing is reading or writing?
Single-user mode, nothing mounted.
xiaoyuesanshui
2023-06-16 17:04:30 +08:00
@julyclyde I ran the long test while there was no read/write activity, and it still got interrupted.


https://www.v2ex.com/t/948736