What's a good Linux OCR application that supports PDFs?
  • "Initials" by "Florian Körner", licensed under "CC0 1.0". Remix of the original. Created with dicebear.com (https://github.com/dicebear/dicebear)
    unperson

    There are options for several cases:

      --remove-vectors      EXPERIMENTAL. Mask out any vector objects in the PDF
                            so that they will not be included in OCR. This can
                            eliminate false characters.
      -f, --force-ocr       Rasterize any text or vector objects on each page,
                            apply OCR, and save the rastered output (this rewrites
                            the PDF)
      -s, --skip-text       Skip OCR on any pages that already contain text, but
                            include the page in final output; useful for PDFs that
                            contain a mix of images, text pages, and/or previously
                            OCRed pages
      --redo-ocr            Attempt to detect and remove the hidden OCR layer from
                            files that were previously OCRed with OCRmyPDF or
                            another program. Apply OCR to text found in raster
                            images. Existing visible text objects will not be
                            changed. If there is no existing OCR, OCR will be
                            added.
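As a rough sketch of how the options above might be chosen, the helper below maps a description of the input PDF to one flag. The case names (mixed, rewrite, stale), the function name, and the file names are made up for illustration and are not part of OCRmyPDF. As far as I know, OCRmyPDF accepts only one of these three mode flags per run. The command is printed rather than executed.

```shell
# Map a (hypothetical) description of the input PDF to one OCRmyPDF mode flag.
mode_flag() {
  case "$1" in
    mixed)   echo "--skip-text" ;;   # some pages already contain text
    rewrite) echo "--force-ocr" ;;   # rasterize everything and OCR it again
    stale)   echo "--redo-ocr" ;;    # replace an earlier hidden OCR layer
    *)       echo "unknown case: $1" >&2; return 1 ;;
  esac
}

# Print the command instead of running it, so the sketch has no side effects.
# input.pdf and output.pdf are placeholder file names.
echo "ocrmypdf $(mode_flag mixed) input.pdf output.pdf"
```

Once the printed line looks right, it can be run as-is with real file names.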
    
    3
  • What's a good Linux OCR application that supports PDFs?
    unperson

    ocrmypdf does everything; it even compresses your PDF for uploading if you add the -O2 or -O3 flag.

    If it's a raw scan, clean it up with scantailor first. Also experiment with various --tesseract-pagesegmode settings for best results. I use 6 for most books, 3 if there are multiple columns, and 1 if there are multiple languages.
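A minimal sketch of the page segmentation choice described above, assuming the three layouts map to modes 6, 3 and 1 as stated. The layout names, the function names, and the fallback to 3 (Tesseract's default mode) are assumptions for illustration. The command is printed rather than executed, so ocrmypdf does not need to be installed to try it.

```shell
# Pick a Tesseract page segmentation mode from a (hypothetical) layout name.
psm_for_layout() {
  case "$1" in
    book)      echo 6 ;;  # single uniform block of text
    columns)   echo 3 ;;  # multiple columns
    multilang) echo 1 ;;  # multiple languages
    *)         echo 3 ;;  # assumption: fall back to Tesseract's default
  esac
}

# Build the ocrmypdf command line: $1 = layout, $2 = input PDF, $3 = output PDF.
ocr_command() {
  printf 'ocrmypdf -O2 --tesseract-pagesegmode %s %s %s\n' \
    "$(psm_for_layout "$1")" "$2" "$3"
}

# scan.pdf and scan-ocr.pdf are placeholder file names.
ocr_command book scan.pdf scan-ocr.pdf
```

Trying the same scan with each mode and comparing the extracted text is probably the quickest way to see which one suits a given document.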

    6
  • What are your thoughts on nuclear energy?
    unperson

    The logic of the article doesn't really follow? It implies there's no uranium just because there's no profitable way to extract it in France and the US at the current dirt cheap prices. It even acknowledges there's plenty of uranium ore in Kazakhstan, but provides no data about it.

    You could multiply the price of uranium in the graphs by 100 and it would barely impact the cost of electricity. Nuclear power is extremely fuel efficient: even a price of 10,000 $/kg of natural uranium implies less than 0.1 $/kWh of electricity in fuel costs.

    Capitalist mining companies will never survey more uranium mines when uranium demand has been dwindling for the last two decades due to the anti-nuclear media campaign and the inherent low rate of profit of such a capital-intensive industry.

    1
  • US Travel Ban on China's Ruling Communist Party Members and Their Families
    unperson

    Well, they could search the phones and computers of every Chinese visitor. You can actually be denied entry to the US for refusing to hand over your passwords.

    4
  • Feedback friday - 2020-07-17
    unperson

    Right now lemmy.ml doesn't have an AAAA record. If the server has a public IPv6 address, perhaps you just need to add the record. Make sure you're binding to :: and not to 0.0.0.0.

    You could test by disconnecting your computer (not the server!) from the v4 Internet with ip route del default, assuming you have IPv6 connectivity. Run ip route show default first so that you can ip route add default via … later to restore your v4 access. If your ISP doesn't offer v6 you'd have to fiddle with Teredo, but it's worth it because you then have end-to-end connectivity with everyone else on the v6 Internet, with no NAT.
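The save-and-restore step above could be sketched like this. The helper name restore_command is made up, and the gateway 192.0.2.1 and interface eth0 are placeholders, not values from any real setup. It assumes a single default route and that a line printed by ip route show default can be passed back to ip route add unchanged, which holds for simple routes but may not for every attribute. The privileged commands appear only as comments, so running the sketch does not change the routing table.

```shell
# Build the command that restores a default route saved from `ip route show default`.
# $1 = one saved line, e.g. "default via 192.0.2.1 dev eth0" (placeholder values).
restore_command() {
  printf 'ip route add %s\n' "$1"
}

# Intended sequence on the client machine (needs root), left as comments:
#   saved=$(ip route show default)   # 1. record the current v4 default route
#   ip route del default             # 2. drop v4 Internet access
#   (load the site in a browser)     # 3. only IPv6 can succeed now
#   restore_command "$saved"         # 4. print the command that undoes step 2

restore_command "default via 192.0.2.1 dev eth0"
```

Printing the restore command before deleting the route means it can be copied from the terminal even if name resolution or SSH stops working in between.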

    3

    unperson

    lemmy.ml