How can I batch-check EPUB files for corruption? I want to write a program for it and am looking for ideas

2020-11-01 00:40:11 +08:00
 pkwenda
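macOS can preview the cover image, but that is not good enough and not automated, so I want to write a program.

I'm not sure whether epubcheck is meant for this. It doesn't seem like it: the description says it checks conformance to the spec, whereas I only need to know whether a file is damaged and whether it can be opened. Spec conformance doesn't matter to me.
1258 views
Node: Q&A
8 replies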
lxilu
2020-11-01 00:47:16 +08:00
What kind of damage? The ZIP archive? The metadata files? The HTML pages?
lxilu
2020-11-01 00:51:03 +08:00
If you only care whether it opens, just try unzipping it.
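A minimal sketch of the "just unzip it" idea, assuming Python's standard `zipfile` module. The function names `is_readable_zip` and `find_broken` are illustrative and not from this thread. It reads every member to verify its CRC instead of extracting to disk.

```python
import zipfile
import zlib
from pathlib import Path


def is_readable_zip(path):
    """Return True if the file opens as a ZIP and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the first bad name, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        # Garbage or truncated downloads usually fail on open, because the
        # central directory at the end of the archive is missing.
        return False


def find_broken(folder):
    """Return the .epub files under folder that fail the check, sorted by path."""
    return sorted(p for p in Path(folder).rglob("*.epub") if not is_readable_zip(p))
```

Calling `find_broken("books")` on a hypothetical `books` folder would return the paths to re-download. This only proves the archive is intact, not that the contents form a valid book.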
pkwenda
2020-11-01 00:54:30 +08:00
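@lxilu #1 There could be several kinds of errors:

1. The download failed and only a 4 KB junk file (padded to alignment) was downloaded
2. The download stopped halfway, so the data is incomplete

I just ran a test with epubcheck.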


Bad file:

```
Messages: 1 fatal / 0 errors / 0 warnings / 0 infos
```
Good file:
```
... (500 lines omitted here)
ERROR(RSC-005): 507778787564457984.epub/text/part0272.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0273.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0273.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0274.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0274.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0275.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0275.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0276.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0276.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0277.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0277.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0278.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0278.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0279.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0279.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0280.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0280.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(10,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(10,136): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(9,70): Error while parsing file: Duplicate "8BVE20-a62d2f2e31ed4ca88c23f95b0c6356e7"
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(10,70): Error while parsing file: Duplicate "8BVE20-a62d2f2e31ed4ca88c23f95b0c6356e7"
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_001.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_002.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_003.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_004.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_005.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(10,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(10,136): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(9,70): Error while parsing file: Duplicate "8CTUK0-a62d2f2e31ed4ca88c23f95b0c6356e7"
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(10,70): Error while parsing file: Duplicate "8CTUK0-a62d2f2e31ed4ca88c23f95b0c6356e7"
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_001.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_002.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_003.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_004.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_005.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0283.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0283.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(10,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(10,136): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(9,70): Error while parsing file: Duplicate "8EQVO0-a62d2f2e31ed4ca88c23f95b0c6356e7"
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(10,70): Error while parsing file: Duplicate "8EQVO0-a62d2f2e31ed4ca88c23f95b0c6356e7"
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_001.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_002.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_003.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_004.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0285.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
ERROR(RSC-005): 507778787564457984.epub/text/part0285.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons

Check finished with errors
Messages: 0 fatals / 545 errors / 0 warnings / 0 infos
```

It looks like the problem is solved: apparently I only need to check whether a fatal error occurred.



epubcheck: https://github.com/w3c/epubcheck
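The "only look at fatal errors" rule could be automated roughly as follows. This is a sketch under assumptions: the jar path `epubcheck.jar` and the function names are placeholders, and the regex is based only on the two summary lines shown above ("1 fatal" and "0 fatals").

```python
import re
import subprocess

# Matches both "1 fatal" and "0 fatals", as seen in the summaries above.
_FATAL_RE = re.compile(r"Messages:\s*(\d+)\s+fatals?\b")


def fatal_count(report):
    """Return the fatal count from an epubcheck report, or None if there is no summary line."""
    m = _FATAL_RE.search(report)
    return int(m.group(1)) if m else None


def looks_damaged(epub_path, jar="epubcheck.jar"):
    """Run epubcheck on one file and treat any fatal message as damage.

    The jar location is an assumption; point it at wherever epubcheck is installed.
    """
    proc = subprocess.run(
        ["java", "-jar", jar, str(epub_path)],
        capture_output=True,
        text=True,
    )
    # Search stdout and stderr together rather than assuming which one holds the summary.
    count = fatal_count(proc.stdout + proc.stderr)
    # No summary line at all (for example, epubcheck crashed) is reported as damaged.
    return count is None or count > 0
```

Spec errors such as the RSC-005 lines are deliberately ignored here, since the goal is only to tell openable files from broken ones.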
gainsurier
2020-11-01 00:55:31 +08:00
Checking whether the file satisfies the constraints of the EPUB format is enough.
pkwenda
2020-11-01 00:55:48 +08:00
So it seems epubcheck also detects damage while it checks spec conformance.
pkwenda
2020-11-01 00:58:57 +08:00
lxilu
2020-11-01 01:01:00 +08:00
Unzipping is more lightweight.
pkwenda
2020-11-01 01:10:18 +08:00
@lxilu #7 Thanks, man. Now I get it: an EPUB is just a ZIP. Oops.
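Since an EPUB is a ZIP container, a slightly stricter check than plain unzipping is to look for the two entries the EPUB container format requires: a `mimetype` entry holding `application/epub+zip` and `META-INF/container.xml`. A hedged sketch, with `looks_like_epub` as an illustrative name:

```python
import zipfile
import zlib

EPUB_MIMETYPE = b"application/epub+zip"


def looks_like_epub(path):
    """Return True if path is a readable ZIP carrying the basic EPUB container entries."""
    try:
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
            if "mimetype" not in names or "META-INF/container.xml" not in names:
                return False
            # Tolerate stray whitespace around the media type string.
            return zf.read("mimetype").strip() == EPUB_MIMETYPE
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        # Not a ZIP at all, truncated, or unreadable on disk.
        return False
```

This catches a file that happens to be a valid ZIP but is not a book, which a bare unzip test would accept.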

https://www.v2ex.com/t/720596