
How to batch-detect corrupted EPUB files? Looking for ideas on doing it with a program

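    pkwenda · 2020-11-01 00:40:11 +08:00 · 1180 views
    This topic was created 1279 days ago; the information in it may have changed since.
    On macOS I can preview the cover images, but that isn't slick enough and isn't automated, so I'd like to write a program.

    I'm not sure whether epubcheck does this; it doesn't look like it. Its description says it checks conformance with the specification, but all I need to know is whether a file is corrupted, i.e. whether it can be opened. Spec conformance doesn't matter to me.
    8 replies · last reply 2020-11-01 01:10:18 +08:00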
    lxilu · #1 · 2020-11-01 00:47:16 +08:00 via iPhone
    What kind of corruption? The archive? The metadata files? The web pages?
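lxilu's question separates three layers: the ZIP archive, the metadata files, and the content pages. Below is a minimal sketch, using only Python's standard library, that checks the first two layers in order and reports the first one that fails. The function name `diagnose_epub` is hypothetical. The content pages are deliberately not parsed, because (as the epubcheck output later in this thread shows) a book can open fine while its XHTML has hundreds of validation errors.

```python
import xml.etree.ElementTree as ET
import zipfile
import zlib

# Namespace of META-INF/container.xml as defined by the OCF container format.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"

def diagnose_epub(path):
    """Return 'ok' or a short message naming the first layer that fails."""
    # Layer 1: the ZIP archive itself.
    try:
        zf = zipfile.ZipFile(path)
    except zipfile.BadZipFile:
        return "archive: not a readable ZIP"
    with zf:
        try:
            bad = zf.testzip()  # first member with a bad CRC/header, or None
        except (zipfile.BadZipFile, zlib.error, EOFError):
            bad = "<unreadable stream>"
        if bad is not None:
            return f"archive: damaged member {bad}"
        # Layer 2: container.xml, which points at the package document (.opf).
        try:
            container = ET.fromstring(zf.read("META-INF/container.xml"))
        except (KeyError, ET.ParseError):
            return "metadata: META-INF/container.xml missing or malformed"
        rootfile = container.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
        if rootfile is None or not rootfile.get("full-path"):
            return "metadata: container.xml names no rootfile"
        # The package document it names must exist and be well-formed XML.
        try:
            ET.fromstring(zf.read(rootfile.get("full-path")))
        except (KeyError, ET.ParseError):
            return "metadata: package document missing or malformed"
    return "ok"
```

Stopping at the first failing layer keeps the report short; a file that fails at the archive layer would fail every later check anyway.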
    lxilu · #2 · 2020-11-01 00:51:03 +08:00 via iPhone
    If you only need to know whether it opens, just unzip it.
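Testing the archive is the scripted equivalent of "just unzip it". A minimal sketch, assuming Python's standard `zipfile` module: `ZipFile.testzip()` reads every member and returns the name of the first one with a bad CRC or header, or `None` if all are fine. The function name `opens_as_zip` is hypothetical.

```python
import zipfile
import zlib

def opens_as_zip(path):
    """Return True if every member of the archive decompresses with a valid CRC."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the first bad member name, or None if all pass.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError):
        # Not a ZIP at all (junk or truncated download) or a damaged stream.
        return False
```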
    pkwenda (OP) · #3 · 2020-11-01 00:54:30 +08:00
    @lxilu #1 There could be several kinds of errors:

    1. The download failed, leaving only a block-aligned 4 KB junk file
    2. The download stopped halfway, so the file is incomplete
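Both failure modes can be told apart cheaply, because an EPUB is a ZIP archive: it starts with the local file header signature `PK\x03\x04`, and its central directory sits at the end of the file. The rough triage sketch below assumes that an interrupted download loses that trailing directory. `classify_download` is a hypothetical name, and "ok" here only means the container structure is present, not that every member decompresses.

```python
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # first bytes of a ZIP, and so of an EPUB

def classify_download(path):
    """Rough triage of a downloaded .epub: 'ok', 'junk' or 'truncated'."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head != ZIP_LOCAL_HEADER:
        # Case 1: not a ZIP at all, e.g. a 4 KB file of padding or an error page.
        return "junk"
    if not zipfile.is_zipfile(path):
        # Case 2: starts like a ZIP, but the end-of-central-directory record
        # is missing, which is typical of a download that stopped early.
        return "truncated"
    return "ok"
```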

    I just tested it with epubcheck:


    Bad file:

    ```
    Messages: 1 fatal / 0 errors / 0 warnings / 0 infos
    ```
    Good file:
    ```
    ... (500 lines omitted)
    ERROR(RSC-005): 507778787564457984.epub/text/part0272.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0273.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0273.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0274.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0274.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0275.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0275.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0276.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0276.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0277.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0277.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0278.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0278.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0279.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0279.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0280.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0280.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(10,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(10,136): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(9,70): Error while parsing file: Duplicate "8BVE20-a62d2f2e31ed4ca88c23f95b0c6356e7"
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_000.html(10,70): Error while parsing file: Duplicate "8BVE20-a62d2f2e31ed4ca88c23f95b0c6356e7"
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_001.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_002.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_003.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_004.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0281_split_005.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(10,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(10,136): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(9,70): Error while parsing file: Duplicate "8CTUK0-a62d2f2e31ed4ca88c23f95b0c6356e7"
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_000.html(10,70): Error while parsing file: Duplicate "8CTUK0-a62d2f2e31ed4ca88c23f95b0c6356e7"
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_001.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_002.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_003.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_004.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0282_split_005.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0283.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0283.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(10,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(10,136): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(9,70): Error while parsing file: Duplicate "8EQVO0-a62d2f2e31ed4ca88c23f95b0c6356e7"
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_000.html(10,70): Error while parsing file: Duplicate "8EQVO0-a62d2f2e31ed4ca88c23f95b0c6356e7"
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_001.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_002.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_003.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0284_split_004.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0285.html(9,70): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons
    ERROR(RSC-005): 507778787564457984.epub/text/part0285.html(10,63): Error while parsing file: value of attribute "id" is invalid; must be an XML name without colons

    Check finished with errors
    Messages: 0 fatals / 545 errors / 0 warnings / 0 infos
    ```

    Looks like that solves it: it seems enough to check whether any fatal error occurred.
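Turning "check for fatal errors" into code means reading epubcheck's summary line. The sketch below assumes epubcheck is run as `java -jar epubcheck.jar file.epub` with Java on `PATH`. The jar path and all function names are hypothetical, and the regular expression is written against the two summary lines quoted above (`1 fatal` vs `0 fatals`), so it may need adjusting for other epubcheck versions. Since it is not certain which stream the summary goes to, stdout and stderr are searched together.

```python
import re
import subprocess

# Matches summaries like "Messages: 1 fatal / 0 errors / 0 warnings / 0 infos".
SUMMARY_RE = re.compile(r"Messages:\s*(\d+)\s+fatals?\s*/\s*(\d+)\s+errors?")

def fatal_count(output):
    """Number of fatal messages in epubcheck's summary, or None if none is found."""
    match = SUMMARY_RE.search(output)
    return int(match.group(1)) if match else None

def run_epubcheck(path, jar="epubcheck.jar"):
    """Run epubcheck on one file and return stdout and stderr combined.

    Assumes Java is installed; `jar` is a hypothetical path to the epubcheck jar.
    """
    proc = subprocess.run(["java", "-jar", jar, str(path)],
                          capture_output=True, text=True)
    return proc.stdout + proc.stderr

def is_corrupted(path, jar="epubcheck.jar"):
    # A missing summary is treated as corrupted too: epubcheck did not finish.
    count = fatal_count(run_epubcheck(path, jar))
    return count is None or count > 0
```

Ordinary errors such as the 545 `RSC-005` messages above are ignored on purpose; only the fatal count decides whether a file is treated as broken.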



    epubcheck: https://github.com/w3c/epubcheck
    gainsurier · #4 · 2020-11-01 00:55:31 +08:00 via iPhone
    Checking whether the file satisfies the EPUB format's constraints is enough.
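A few of the EPUB container (OCF) constraints are cheap to check directly. The archive's first entry must be a file named `mimetype`, stored without compression and containing `application/epub+zip`, and `META-INF/container.xml` must be present. The sketch below covers only those rules and is not a substitute for epubcheck's full validation. `ocf_problems` is a hypothetical name.

```python
import zipfile

def ocf_problems(path):
    """List violations of a few basic OCF container rules (empty list = none found)."""
    problems = []
    try:
        with zipfile.ZipFile(path) as zf:
            infos = zf.infolist()
            if not infos or infos[0].filename != "mimetype":
                problems.append("first entry is not 'mimetype'")
            else:
                first = infos[0]
                # The mimetype entry must be stored, not deflated.
                if first.compress_type != zipfile.ZIP_STORED:
                    problems.append("'mimetype' is compressed")
                if zf.read(first) != b"application/epub+zip":
                    problems.append("'mimetype' has wrong contents")
            if "META-INF/container.xml" not in zf.namelist():
                problems.append("missing META-INF/container.xml")
    except zipfile.BadZipFile:
        problems.append("not a ZIP archive")
    return problems
```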
    pkwenda (OP) · #5 · 2020-11-01 00:55:48 +08:00
    So it seems epubcheck checks for corruption as well as spec conformance.
    pkwenda (OP) · #6 · 2020-11-01 00:58:57 +08:00
    lxilu · #7 · 2020-11-01 01:01:00 +08:00 via iPhone
    Unzipping is more lightweight.
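Since unzipping is lighter than running epubcheck, a batch scan over a folder can rely on the archive test alone. The sketch below assumes the books live under one directory tree and use the `.epub` extension. `first_bad_member` and `scan` are hypothetical names, and the thread pool is just one simple way to overlap file reads; a process pool may do better if decompression dominates.

```python
import zipfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def first_bad_member(path):
    """None if the archive tests clean, otherwise a short reason string."""
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()
            return None if bad is None else f"bad member {bad}"
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        return f"{type(exc).__name__}: {exc}"

def scan(folder, workers=4):
    """Test every *.epub under `folder` recursively; return {path: reason} for failures."""
    paths = sorted(Path(folder).rglob("*.epub"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(first_bad_member, paths))
    return {str(p): r for p, r in zip(paths, results) if r is not None}
```

Usage, with a hypothetical folder: `for path, reason in scan("/path/to/books").items(): print(path, reason)` prints one line per broken file and nothing for files that test clean.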
    pkwenda (OP) · #8 · 2020-11-01 01:10:18 +08:00
    @lxilu #7 Thanks, man. Now I get it: an EPUB is just a ZIP. Embarrassing.