Text extraction from a cropped PDF still returns the pre-crop content

I'm using the Python PyPDF2 library to crop PDF pages, and then extracting the text from the cropped pages.

Both tika and pdfminer still pick up the text that was on the page before cropping, so the extraction results are wrong.

Has anyone run into this, and how did you solve it?
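
A likely explanation, worth checking against your own files: cropping with PyPDF2 normally only rewrites the page boxes (MediaBox/CropBox) and leaves the page's content stream untouched, so extractors that read the content stream still see all of the original text. One possible workaround is to extract text together with its layout coordinates and keep only what falls inside the crop rectangle. The sketch below assumes pdfminer.six is installed; `inside`, `text_in_crop`, and the `crop` tuple are hypothetical names used for illustration, and `crop` is assumed to be in the same coordinate system pdfminer reports (PDF points, measured from the lower-left of the page).

```python
def inside(bbox, crop):
    """True if the centre of bbox lies within crop.

    Both arguments are (x0, y0, x1, y1) tuples in the same coordinate system.
    """
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    return crop[0] <= cx <= crop[2] and crop[1] <= cy <= crop[3]


def text_in_crop(pdf_path, crop):
    """Return one string per page with only the text boxes centred inside crop."""
    # Imported lazily so inside() stays usable without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    pages = []
    for layout in extract_pages(pdf_path):
        kept = [
            element.get_text()
            for element in layout
            if isinstance(element, LTTextContainer) and inside(element.bbox, crop)
        ]
        pages.append("".join(kept))
    return pages
```

Running `text_in_crop("a.pdf", (x0, y0, x1, y1))` on the original file with the same rectangle you passed to PyPDF2 should, under these assumptions, return only the text inside the cropped area. Filtering by box centre is a simplification: a text box that straddles the crop edge is either kept whole or dropped whole.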

1 reply • 2023-02-06 16:57:25 +08:00

AnroZ

Feb 6, 2023

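So the problem is that a.pdf (pypdf2) -> b.pdf (tika|pdfminer) -> b.txt gives the same result as a.pdf (tika|pdfminer) -> a.txt?
Could it be that the "save as" step never actually overwrote the file? Otherwise, post your source code so we can take a look.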
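
One quick way to rule out the "save as didn't overwrite" case is to compare the two PDFs byte for byte before looking at the extracted text. This is a minimal sketch using only the standard library; `file_digest` and `same_content` are hypothetical helper names, and `a.pdf` / `b.pdf` stand for the files named above.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files contain exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)
```

If `same_content("a.pdf", "b.pdf")` is true, the cropped file was probably never written. If the files differ but a.txt and b.txt are still identical, the crop most likely only changed the page box and left the underlying text in place.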