• raffaele
    @https://digipres.club/@raffaele
    ♻️ https://digipres.club/users/bitsgalore/statuses/109875752745620458
  • raffaele
    @https://digipres.club/@raffaele

    @bitsgalore May I suggest adding ocrmypdf https://ocrmypdf.readthedocs.io ? It's a wrapper around other Python libraries and, besides text extraction (Tesseract), it is extremely good at PDF optimization https://ocrmypdf.readthedocs.io/en/latest/optimizer.html (and conversion to PDF/A). Also, JBIG2 encoding https://ocrmypdf.readthedocs.io/en/latest/jbig2.html is well suited to images of scanned text

    2023-02-17T11:03:23Z
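
A rough sketch of how the suggestion above might look in practice, assuming the `ocrmypdf` command-line tool is installed and on the PATH. The helper names `build_ocrmypdf_command` and `run_ocr` and the file names are hypothetical, made up for illustration. The `--optimize` and `--output-type` options are the ones described in the linked documentation. As far as the linked JBIG2 page describes it, JBIG2 encoding is applied by the optimizer only when a separate JBIG2 encoder is available, so this sketch does not request it explicitly.

```python
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, optimize=2, pdfa=True):
    """Build an argv list for the ocrmypdf CLI (hypothetical helper)."""
    # The optimizer documentation describes levels 0 (off) through 3.
    if optimize not in (0, 1, 2, 3):
        raise ValueError("optimize must be an integer from 0 to 3")
    cmd = ["ocrmypdf", "--optimize", str(optimize)]
    # Ask for PDF/A output, or plain PDF if pdfa is False.
    cmd += ["--output-type", "pdfa" if pdfa else "pdf"]
    cmd += [str(src), str(dst)]
    return cmd


def run_ocr(src, dst, **options):
    """Run ocrmypdf on src and write the result to dst (hypothetical helper)."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually process a file.
    print(" ".join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf", optimize=3)))
```

Building the command separately from running it makes it easy to inspect what would be executed before touching any files.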
  • raffaele
    @https://digipres.club/@raffaele
    ♻️ https://post.lurk.org/users/grafton9/statuses/109880288570105527

...