2023/02/17

raffaele

@https://digipres.club/@raffaele

♻️ https://digipres.club/users/bitsgalore/statuses/109875752745620458
raffaele

@https://digipres.club/@raffaele

@bitsgalore May I suggest adding OCRmyPDF https://ocrmypdf.readthedocs.io ? It's a wrapper around other Python libraries and, besides text extraction (via Tesseract), it is extremely good at PDF optimization https://ocrmypdf.readthedocs.io/en/latest/optimizer.html (and conversion to PDF/A). Also, JBIG2 encoding https://ocrmypdf.readthedocs.io/en/latest/jbig2.html works very well for images of scanned text
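
To make the suggestion above concrete, here is a minimal sketch of how one might assemble an ocrmypdf command line from Python. The helper names (build_ocrmypdf_command, run_ocrmypdf) and the file names are hypothetical and only for illustration. The flags used (--language, --optimize, --output-type) are documented ocrmypdf options. As far as I understand, JBIG2 encoding is applied by the optimizer only when a JBIG2 encoder is installed, so it is not requested explicitly here.

```python
import shlex
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, optimize=1, language="eng"):
    """Return the argument list for an ocrmypdf run (hypothetical helper).

    optimize: ocrmypdf optimization level, 0 (off) to 3 (most aggressive).
    language: Tesseract language code used for OCR.
    """
    if optimize not in (0, 1, 2, 3):
        raise ValueError("optimize must be 0, 1, 2 or 3")
    return [
        "ocrmypdf",
        "--language", language,
        "--optimize", str(optimize),
        "--output-type", "pdfa",  # ask for PDF/A output
        str(src),
        str(dst),
    ]


def run_ocrmypdf(src, dst, **options):
    """Run ocrmypdf if it is available on PATH (hypothetical helper)."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)


# Only print the command here; nothing is executed.
command = build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf")
print(shlex.join(command))
```

Calling run_ocrmypdf("scan.pdf", "scan_ocr.pdf") would execute the same command, assuming ocrmypdf and Tesseract are installed. Higher optimization levels may need extra tools and can be lossy, so the default level is kept here.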

2023-02-17T11:03:23Z
raffaele

@https://digipres.club/@raffaele

♻️ https://post.lurk.org/users/grafton9/statuses/109880288570105527