• OCR on Windows

    From Bill Powell@21:1/5 to All on Sun Jul 14 02:46:04 2024
    XPost: comp.text.pdf

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From micky@21:1/5 to Powell on Sat Jul 13 21:57:19 2024
    XPost: comp.text.pdf

    In alt.comp.os.windows-10, on Sun, 14 Jul 2024 02:46:04 +0200, Bill
    Powell <bill@anarchists.org> wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    Aren't there lots of websites that do this? But you have to upload the
    file. I've resisted that, but I'd be really happy if I could do it
    on my own computer.

  • From Newyana2@21:1/5 to Bill Powell on Sat Jul 13 22:22:11 2024
    XPost: comp.text.pdf

    On 7/13/2024 8:46 PM, Bill Powell wrote:
    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    I have a program called FreeOCR that will do it without having to scan
    or extract the pages. Quality depends on fonts, words, etc., but generally
    it comes out well.

  • From micky@21:1/5 to newyana@invalid.nospam on Sat Jul 13 22:52:52 2024
    XPost: comp.text.pdf

    In alt.comp.os.windows-10, on Sat, 13 Jul 2024 22:22:11 -0400, Newyana2 <newyana@invalid.nospam> wrote:

    On 7/13/2024 8:46 PM, Bill Powell wrote:
    I have a series of one-page PDFs that are really images and not text even
    though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    I have a program called FreeOCR that will do it without having to scan
    or extract the pages. Quality depends on fonts, words, etc., but generally
    it comes out well.

    http://www.freeocr.net/
    http://www.paperfile.net/
    https://www.google.com/search?client=firefox-b-1-d&q=FreeOCR

  • From Charlie@21:1/5 to All on Sat Jul 13 21:08:37 2024
    XPost: comp.text.pdf

    On this Sat, 13 Jul 2024 22:22:11 -0400, Newyana2 wrote:

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    I have a program called FreeOCR that will do it without having to scan
    or extract the pages. Quality depends on fonts, words, etc., but generally
    it comes out well.

    I too use FreeOCR, which I find is more accurate than others I've tested.
    Mine is Free OCR version 5.41 from long ago, September 2015.
    There may be a new version, but here is my log file from those days.

    FreeOCR
    http://www.paperfile.net/ (note it's not a secure web site)
    http://www.paperfile.net/download.html
    http://www.paperfile.net/freeocr541.exe
    Name: freeocr541.exe
    Size: 11316239 bytes (10 MiB)
    SHA256: 0BF9D979C7BC3774FC6AE39DF31AFC89BFD9AF60120FC2D1BE50B1B35E850D64
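
    A download can be checked against the SHA256 listed above before it is run. This is only a minimal Python sketch: it assumes the installer was saved as freeocr541.exe in the current directory, and the helper names sha256_of and matches are made up for illustration (they are not part of FreeOCR).

```python
import hashlib
from pathlib import Path

# Hash quoted in the post for freeocr541.exe
EXPECTED_SHA256 = "0BF9D979C7BC3774FC6AE39DF31AFC89BFD9AF60120FC2D1BE50B1B35E850D64"


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 of a file as an uppercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so a large installer does not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches(path, expected):
    """Compare case-insensitively, since hash tools differ in letter case."""
    return sha256_of(path) == expected.strip().upper()


if __name__ == "__main__":
    installer = Path("freeocr541.exe")  # assumed download location
    if installer.exists():
        print("OK" if matches(installer, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{installer} not found")
```

    A mismatch only says the file differs from the one hashed back then; a newer build from the same site would also fail the comparison.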

    The stone-age installer doesn't even ask where to install on your filesystem.
    Worse, it doesn't go into Program Files but onto the top level of C:, as
    C:\FreeOCR\FreeOCR.exe
    But you can move it to wherever you put your programs on your file system.
    It even works from the D: drive (but you can't pin a shortcut to the taskbar).

    It's pretty easy to use.
    Once FreeOCR opens up, press the "Open PDF" icon.
    Then press the "OCR" icon.

    Then in the right window will be the OCR results, which are accurate.
    Then you copy those OCR text results into your Windows clipboard.

    From there you paste into your editor of choice.

  • From Bill Powell@21:1/5 to micky on Sun Jul 14 05:02:26 2024
    XPost: comp.text.pdf

    On Sat, 13 Jul 2024 21:57:19 -0400, micky wrote:

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    Aren't there lots of websites that do this? But you have to upload the
    file. I've resisted that, but I'd be really happy if I could do it
    on my own computer.

    These are scanned medical records.

  • From cable_shill@comcast.net@21:1/5 to All on Sat Jul 13 21:06:54 2024
    XPost: comp.text.pdf

    Windows PowerToys - Text Extractor.


    On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell <bill@anarchists.org>
    wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

  • From Paul in Houston TX@21:1/5 to All on Sat Jul 13 23:23:55 2024
    XPost: comp.text.pdf

    Newyana2 wrote:
    On 7/13/2024 8:46 PM, Bill Powell wrote:
    I have a series of one-page PDFs that are really images and not text even
    though they look like they're just a page of simple text in the same
    font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    I have a program called FreeOCR that will do it without having to scan
    or extract the pages. Quality depends on fonts, words, etc., but generally
    it comes out well.

    +1

  • From Stan Brown@21:1/5 to Bill Powell on Sat Jul 13 22:45:38 2024
    XPost: comp.text.pdf

    On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?


    OPTION A (if you have OneNote, which is part of MS Office):

    1. Paste the image into OneNote.
    2. Right-click the pasted image and select "Copy Text from Picture".
    3. In your favorite text editor, press Ctrl+V to paste the text.
    4. Proofread and make any needed corrections.

    I have Office 2010, not Office 365, but I believe OneNote is included
    in Office 365.


    OPTION B:

    Which PDF reader are you using? PDF-XChange (free) has a menu
    selection to perform OCR, putting the text as an extra layer in the
    PDF. You can then copy the text from the PDF and paste it into your
    editor.

    And I'm sure there are other free PDF viewers that have OCR
    capability, though PDF-XChange is the only one I use.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Stan Brown@21:1/5 to cable_shill@comcast.net on Sat Jul 13 22:58:17 2024
    XPost: comp.text.pdf

    On Sat, 13 Jul 2024 21:06:54 -0700, cable_shill@comcast.net wrote:

    On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell <bill@anarchists.org>
    wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    Windows Power Toys - Text extractor.

    You forgot to give the URL: https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

    That one says it's "based on Joe Finney's TextGrab", and links to https://github.com/TheJoeFin/Text-Grab

    Has anyone tried both, and can speak to whether one does a better job
    of text extraction than the other?

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Jeff Barnett@21:1/5 to micky on Sun Jul 14 00:35:44 2024
    XPost: comp.text.pdf

    On 7/13/2024 8:52 PM, micky wrote:
    In alt.comp.os.windows-10, on Sat, 13 Jul 2024 22:22:11 -0400, Newyana2 <newyana@invalid.nospam> wrote:

    On 7/13/2024 8:46 PM, Bill Powell wrote:
    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    I have a program called FreeOCR that will do it without having to scan
    or extract the pages. Quality depends on fonts, words, etc., but generally
    it comes out well.

    http://www.freeocr.net/

    Several pointers embedded at the URL above elicit "blacklisted site"
    messages from AVG.

    http://www.paperfile.net/
    https://www.google.com/search?client=firefox-b-1-d&q=FreeOCR
    --
    Jeff Barnett

  • From Herbert Kleebauer@21:1/5 to Bill Powell on Sun Jul 14 09:25:09 2024
    XPost: comp.text.pdf

    On 14.07.2024 02:46, Bill Powell wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.

    Or you can use Firefox to display the PDF and use an OCR
    plug-in.

  • From knuttle@21:1/5 to Herbert Kleebauer on Sun Jul 14 06:54:16 2024
    XPost: comp.text.pdf

    On 07/14/2024 3:25 AM, Herbert Kleebauer wrote:
    On 14.07.2024 02:46, Bill Powell wrote:

    I have a series of one-page PDFs that are really images and not text even
    though they look like they're just a page of simple text in the same
    font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    For only a few lines of text you can use the Snipping Tool: press <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.

    Or you can use Firefox to display the PDF and use an OCR
    plug-in.

    I use IrfanView for all my image and OCR projects.

    You need IrfanView and the OCR plugin.

    Open the PDF file in IrfanView, highlight the text and activate the
    OCR function.

  • From Newyana2@21:1/5 to Jeff Barnett on Sun Jul 14 08:45:02 2024
    XPost: comp.text.pdf

    On 7/14/2024 2:35 AM, Jeff Barnett wrote:

    Several pointers embedded at the URL above elicit "blacklisted site"
    messages from AVG.

    I should have posted the URL. freeocr.net is just a listing site. paperfile.net is the host of FreeOCR.

    I researched this a while back. I'd been using something that I'd got
    from a magazine CD in the late 90s and it actually worked pretty well.
    TextBridge Pro. (Along with Lotus WordPro 95. Those magazine CDs
    served me well.)

    But I decided to look around for something more up-to-date because
    I sometimes want to convert things like photo-PDFs to plain text.

    FreeOCR seems to be simple, quick and no-nonsense. It saves the step
    of having to extract images from PDFs. The only downside is that it
    came out in early Win10 days and it has a kiddie interface with a silly
    fading window at close, with no option to change that.
    However... it might be Fisher-Price, but it works. :)

    There's an explanation at the site. If I remember correctly, the system
    it uses is OSS and, while there are newer versions, I didn't find anything
    else that was all put together. What I mean is that you can find more
    recent updates of the Tesseract OCR code, https://github.com/tesseract-ocr,
    but it's OSS that's hard to find as finished software.

    The program seems to be a fairly simple .NET wrapper around a compiled
    EXE version of Tesseract, but it's well designed, making Tesseract usable
    and convenient.

  • From Newyana2@21:1/5 to Stan Brown on Sun Jul 14 09:04:33 2024
    XPost: comp.text.pdf

    On 7/14/2024 1:45 AM, Stan Brown wrote:

    And I'm sure there are other free PDF viewers that have OCR
    capability, though PDF-Xchange is the only one I use.


    I also use PDFXV free and love it. I had to get a new version
    for Win10. Build 322.10. Lucky it was still available free. My older
    version on XP didn't work right on 10.

    PDFXV is quick, does search well, allows me to edit PDFs by
    extracting pages as images and pasting them in that way...
    I've done my taxes that way -- both fillable forms and non-fillable.
    And the whole thing is about 25 MB.

    I think Adobe's monstrosity, Reader, is something like 300+ MB these
    days. I went to take a look, but their version has become even more
    creepy than before.
    First, Adobe wouldn't load a webpage without script, which I didn't
    want to enable. Then I found through Major Geeks that the current
    version is ad-supported. So I'm guessing they want people to sign
    up so they can target the ads... Just when I thought Adobe couldn't
    get any more creepy.

    I'd never noticed the OCR function in PDFXV. It's not very intuitive,
    but it seems to work. I finally figured out that I needed to pick the
    selection tool, select all, then copy, to get the converted text.

  • From micky@21:1/5 to newyana@invalid.nospam on Sun Jul 14 10:09:26 2024
    XPost: comp.text.pdf

    In alt.comp.os.windows-10, on Sun, 14 Jul 2024 08:45:02 -0400, Newyana2 <newyana@invalid.nospam> wrote:

    On 7/14/2024 2:35 AM, Jeff Barnett wrote:

    Several pointers embedded at the URL above elicit "blacklisted site"
    messages from AVG.

    I should have posted the URL. freeocr.net is just a listing site.
    paperfile.net is the host of FreeOCR.

    And it doesn't mention Win10 or 11. I assume you've been using it
    with one of those two.

    I thought of just installing it to see if it works, but who knows, maybe
    installing old, no-longer-compatible software could mess up my OS?

    I researched this a while back. I'd been using something that I'd got
    from a magazine CD in the late 90s and it actually worked pretty well.
    TextBridge Pro. (Along with Lotus WordPro 95. Those magazine CDs
    served me well.)

    But I decided to look around for something more up-to-date because
    I sometimes want to convert things like photo-PDFs to plain text.

    FreeOCR seems to be simple, quick and no-nonsense. It saves the step
    of having to extract images from PDFs. The only downside is that it
    came out in early Win10 days and it has a kiddie interface with a silly
    fading window at close, with no option to change that.
    However... it might be Fisher-Price, but it works. :)

    There's an explanation at the site. If I remember correctly, the system
    it uses is OSS and, while there are newer versions, I didn't find anything
    else that was all put together. What I mean is that you can find more
    recent updates of the Tesseract OCR code, https://github.com/tesseract-ocr,
    but it's OSS that's hard to find as finished software.

    The program seems to be a fairly simple .NET wrapper around a compiled
    EXE version of Tesseract, but it's well designed, making Tesseract usable
    and convenient.

  • From Isaac Montara@21:1/5 to knuttle on Sun Jul 14 16:11:53 2024
    XPost: comp.text.pdf

    On Sun, 14 Jul 2024 06:54:16 -0400, knuttle wrote:

    I use IrfanView for all my image and OCR projects.

    You need IrfanView and the OCR plugin.

    Open the PDF file in IrfanView, highlight the text and activate the
    OCR function.

    Nice! Once you figure it out, Irfanview with the plugin is great!

    I opened a scanned-page bitmap PDF image in Irfanview.
    Irfanview:File > Open > scan.jpg
    Irfanview:Options > Start OCR...(Plugin)
    This opened up the page of bitmap text in yellow highlight at the left.
    At the right of the full-size display was a bunch of buttons.
    None of them was a copy command.

    The plugin appears to be a KADMOS Recognition Engine, version 4.4y but all
    I want is a way to copy the highlighted text inside the bitmap image.

    The text is yellow. But you can't copy it to your clipboard. Or save it.

    It took a good couple of minutes of futzing around before I realized that
    what you have to do is use your left mouse button as if you're going to
    crop something, and drag a box from the top left of the text to the
    bottom right.

    The instant you "crop" out that text, you get a "KADMOS recognition
    results" window popping up, with the OCR results in now-selectable text.

    The results looked accurate in the one test I ran just now.

  • From Enrico Papaloma@21:1/5 to Stan Brown on Sun Jul 14 21:57:02 2024
    XPost: comp.text.pdf

    On 7/14/2024 7:45 AM, Stan Brown wrote:
    And I'm sure there are other free PDF viewers that have OCR
    capability, though PDF-Xchange is the only one I use.

    Which of these three files is the one with the OCR?
    https://pdf-xchange.eu/DL/pdf-xchange-editor.htm

    Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
    Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
    Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)

    It says "ohne OCR". What does "ohne" mean anyway?
    Also, it says it puts a watermark in all files - does it do that for OCR?

  • From david@21:1/5 to All on Sun Jul 14 14:01:01 2024
    XPost: comp.text.pdf

    Using <news:v70icj$5c3b$1@dont-email.me>, Newyana2 wrote:

    I also use PDFXV free and love it. I had to get a new version
    for Win10. Build 322.10. Lucky it was still available free. My older
    version on XP didn't work right on 10.

    I can't find any download for PDFXV.
    https://www.google.com/search?q=windows+%2Bpdfxv+download

  • From Nick Cine@21:1/5 to Paul in Houston TX on Sun Jul 14 14:26:47 2024
    XPost: comp.text.pdf

    On Sat, 13 Jul 2024 23:23:55 -0500, Paul in Houston TX wrote:

    I have a program called FreeOCR that will do it without having to scan
    or extract the pages. Quality depends on fonts, words, etc., but generally
    it comes out well.

    +1

    There is a GNU OCR engine called "GOCR" (or sometimes JOCR) out there.
    https://jocr.sourceforge.net/
    There's no mention it uses the modern Tesseract scan engine though.
    Which may be why it makes so many errors that it's not really useful.

    What you want is to invoke the Tesseract scan engine directly somehow.

    There is a way to invoke the Tesseract scan engine directly, but I don't
    know how to do it. Much like most of the YouTube downloading GUIs run the
    yt-dlp command-line tool under the covers, most of the OCR tools run the
    command line for Tesseract under the sheets.

    The question then would be how to run the Tesseract OCR engine directly?
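
    One possible answer, sketched in Python under stated assumptions: the tesseract command-line program is installed and on PATH, and each PDF page is first rendered to an image with pdftoppm from Poppler (an added tool, not mentioned in this thread), because the tesseract command reads image files rather than PDFs. The helper names and the one-page-per-PDF handling are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_cmd(pdf, out_base, dpi=300):
    """Command that renders the first page of `pdf` to `<out_base>.png`."""
    return ["pdftoppm", "-r", str(dpi), "-png", "-singlefile", str(pdf), str(out_base)]


def tesseract_cmd(image, out_base, lang="eng"):
    """Command that OCRs `image` and writes the text to `<out_base>.txt`."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_one_page_pdf(pdf):
    """OCR a one-page scanned PDF and return the path of the .txt result."""
    pdf = Path(pdf)
    base = pdf.with_name(pdf.stem)            # scan01.pdf -> scan01
    image = pdf.with_name(pdf.stem + ".png")  # written by pdftoppm
    subprocess.run(pdftoppm_cmd(pdf, base), check=True)
    subprocess.run(tesseract_cmd(image, base), check=True)
    return pdf.with_name(pdf.stem + ".txt")


if __name__ == "__main__":
    if shutil.which("tesseract") and shutil.which("pdftoppm"):
        # Process every PDF in the current directory, one page each.
        for pdf in sorted(Path(".").glob("*.pdf")):
            print("wrote", ocr_one_page_pdf(pdf))
    else:
        print("tesseract and/or pdftoppm not found on PATH")
```

    Everything here runs on the local machine, so nothing is uploaded. Accuracy will still depend on scan quality and resolution.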

  • From Bill Powell@21:1/5 to Nick Cine on Sun Jul 14 22:37:25 2024
    XPost: comp.text.pdf

    On Sun, 14 Jul 2024 14:26:47 -0600, Nick Cine wrote:

    There is a GNU OCR engine called "GOCR" (or sometimes JOCR) out there.
    https://jocr.sourceforge.net/
    There's no mention it uses the modern Tesseract scan engine though.

    I had tried the GNU OCR command line before opening the thread.
    http://www-e.uni-magdeburg.de/jschulen/ocr/gocr049.exe
    Name: gocr049.exe
    Size: 153600 bytes (150 KiB)
    SHA256: 1FFC4CD29A5B275F40FBC5F6F9194ED72B8D2BCCBD46019F088C9E5DE2923F59

    It makes so many spelling errors that it would be easier to type the text
    out by hand - which is why I opened this thread to find an OCR that worked.

    Looking up the hints you gave me, I think there are many potential Linux,
    Mac, Windows, Android & iOS OCR scanning candidates in this github table.
    https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty.html

    What is a bit disconcerting is that none of the tools mentioned so far in
    this thread show up in that table, even though it lists dozens of tools
    that do OCR. I'm not sure why.

  • From Wolf Greenblatt@21:1/5 to micky on Sun Jul 14 16:50:37 2024
    XPost: comp.text.pdf

    On Sun, 14 Jul 2024 10:09:26 -0400, micky wrote:

    I should have posted the URL. freeocr.net is just a listing site.
    paperfile.net is the host of FreeOCR.

    And it doesn't mention Win10 or 11. I assume you've been using it
    with one of those two.

    I thought of just installing it to see if it works, but who knows, maybe
    installing old, no-longer-compatible software could mess up my OS?

    There's something called Simple OCR, https://www.simpleocr.com/download/,
    which says it's free, but I've never tried it so I can't vouch for it.

  • From Jan K.@21:1/5 to All on Sun Jul 14 22:44:51 2024
    XPost: comp.text.pdf

    On Sat, 13 Jul 2024 22:58:17 -0700, Stan Brown wrote:

    Windows Power Toys - Text extractor.

    You forgot to give the URL: https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

    That one says it's "based on Joe Finney's TextGrab", and links to https://github.com/TheJoeFin/Text-Grab

    Has anyone tried both, and can speak to whether one does a better job
    of text extraction than the other?

    I've tried something similar to Microsoft Office for OCR on Windows.
    What I tried was an MS Office clone called WPS Office, which I found here.
    https://www.wps.com/office/pdf/

    The company appears to be "Kingsoft" and their web stub installer is here.
    https://wdl1.pcfg.cache.wpscdn.com/wpsdl/wpsoffice/onlinesetup/distsrc/600.1022/wpsinst/wps_office_inst.exe

    Name: wps_lid.lid-u8MZl7zT7a0C.exe
    Size: 5864848 bytes (5727 KiB)
    SHA256: 81E09F93F6B1C7F9488D912CFD82560D978262CB75ECF7B7953403A8A706259B

    Since that looks scary, I ran it through VirusTotal, which reported it clean.
    https://www.virustotal.com/gui/file/81e09f93f6b1c7f9488d912cfd82560d978262cb75ecf7b7953403a8a706259b
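
    The lookup link above is simply the SHA256 in lower case appended to a fixed prefix. A small Python sketch of that pattern follows; the URL prefix is taken from the link in this post, and the function name virustotal_url is hypothetical. Opening such a link shows an existing report for that hash, if there is one.

```python
import re

# Prefix as it appears in the link quoted in the post.
VT_FILE_PREFIX = "https://www.virustotal.com/gui/file/"


def virustotal_url(sha256_hex):
    """Build the report URL for a SHA-256 given in either letter case."""
    digest = sha256_hex.strip().lower()
    # A SHA-256 digest is exactly 64 hexadecimal characters.
    if not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("expected 64 hexadecimal characters")
    return VT_FILE_PREFIX + digest


if __name__ == "__main__":
    print(virustotal_url(
        "81E09F93F6B1C7F9488D912CFD82560D978262CB75ECF7B7953403A8A706259B"
    ))
```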

    You have to be careful as it will change your PDF defaults.
    Select "Custom Settings" (not "Install Now").
    Change from:
    [x] Use WPS Office to open pdf files by default
    [x] Use WPS Office as the default program for documents
    [x] Use WPS Photos to open JPG, PNG, and other image formats by default

    Change to:
    [_] Use WPS Office to open pdf files by default
    [_] Use WPS Office as the default program for documents
    [_] Use WPS Photos to open JPG, PNG, and other image formats by default

    Then hit the big blue "Install Now" button.
    It will say "Downloading WPS Office" so you know it was just a stub.

    It will create a wps_download directory containing:
    Name: 132ca6c802422ed94a59d10cbcc9f47b-15_setup_XA_mui_Free.exe.600.1022.exe
    Size: 244193632 bytes (232 MiB)
    SHA256: B6B462DCDA4578D716E207D9747D391597110EC8F4A22C9AC29417E68A86A525

    After taking forever downloading & installing WPS Office,
    WPS Office will try to trick you into installing "360 Total Security".
    Do not select the box [_]Yes, I agree to install 360 Total Security...
    Click the big blue box "Get Started with WPS".

    Start WPS Office and click away the upsell advertising.
    Tools > PDF OCR > Select File > filename.pdf > Perform OCR > Sign in

    You have to sign in (to what, I don't know) in order to OCR a PDF with WPS.
    I guess in the end it may be an online converter, but it's hard to tell.
    I didn't create an account, so I never found out how it works.

    All I know is it's a Microsoft Office clone that says it does OCR for free.

  • From Big Al@21:1/5 to Jan K. on Sun Jul 14 16:54:22 2024
    XPost: comp.text.pdf

    On 7/14/24 04:44 PM, Jan K. wrote:
    All I know is it's a Microsoft Office clone that says it does OCR for free.
    Years ago I used and really liked Kingsoft. Then LibreOffice got better and I switched. But
    Kingsoft did a great job (or good) reading/writing MS Word stuff.
    --
    Linux Mint 21.3, Cinnamon 6.0.4, Kernel 5.15.0-113-generic
    Al

  • From Joerg Walther@21:1/5 to Enrico Papaloma on Mon Jul 15 10:10:05 2024
    XPost: comp.text.pdf

    Enrico Papaloma wrote:

    Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
    Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
    Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)

    It says "ohne OCR". What does "ohne" mean anyway?

    Ohne is German, meaning "without".

    -jw-
    --
    And now for something completely different...

  • From croy@21:1/5 to All on Mon Jul 15 10:44:09 2024
    On Sun, 14 Jul 2024 16:11:53 -0400, Isaac Montara <IsaacMontara@nospam.com> wrote:

    On Sun, 14 Jul 2024 06:54:16 -0400, knuttle wrote:

    I use IrfanView for all my image and OCR projects.

    You need IrfanView and the OCR plugin.

    Open the PDF file in IrfanView, highlight the text and activate the
    OCR function.

    Nice! Once you figure it out, Irfanview with the plugin is great!

    I opened a scanned-page bitmap PDF image in Irfanview.
    Irfanview:File > Open > scan.jpg
    Irfanview:Options > Start OCR...(Plugin)
    This opened up the page of bitmap text in yellow highlight at the left.

    All I get is an empty window.

    --
    croy

  • From Jim the Geordie@21:1/5 to All on Mon Jul 15 19:16:58 2024
    XPost: comp.text.pdf

    In article <v6v74c$80bq$1@matrix.hispagatos.org>, bill@anarchists.org
    says...

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    Just came across this post.
    Has anyone mentioned ABBYY FineReader?
    I use it all the time.
    Saves to Word and PDF with no problems.

    --
    Jim the Geordie

  • From Stan Brown@21:1/5 to Herbert Kleebauer on Mon Jul 15 13:09:41 2024
    XPost: comp.text.pdf

    On Sun, 14 Jul 2024 09:25:09 +0200, Herbert Kleebauer wrote:
    On 14.07.2024 02:46, Bill Powell wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    For only a few lines of text you can use the Snipping Tool: press <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.


    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Stan Brown@21:1/5 to knuttle on Mon Jul 15 13:11:02 2024
    XPost: comp.text.pdf

    On Sun, 14 Jul 2024 06:54:16 -0400, knuttle wrote:
    I use IrfanView for all my image and OCR projects.

    You need IrfanView and the OCR plugin.

    Open the PDF file in IrfanView, highlight the text and activate the
    OCR function.


    I've been using Irfanview for years, but when I tried the OCR plugin
    I found it did a significantly worse job than OneNote.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Stan Brown@21:1/5 to Joerg Walther on Mon Jul 15 13:19:13 2024
    XPost: comp.text.pdf

    On Mon, 15 Jul 2024 10:10:05 +0200, Joerg Walther wrote:

    Enrico Papaloma wrote:

    Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
    Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
    Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)

    It says "ohne OCR". What does "ohne" mean anyway?

    Ohne is German, meaning "without".


    As in /Die Frau Ohne Schatten/ (The Woman without a Shadow), an
    unjustly neglected opera by Richard Strauss.

    I recognize several of the singers' names in this video, so it ought
    to be a good performance, but I haven't listened to it because I have
    one on CD:

    https://www.youtube.com/watch?v=rFfc_rP9ROk

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Jørgen Nielsen@21:1/5 to All on Mon Jul 15 22:49:46 2024
    XPost: comp.text.pdf

    Monday, 15-07-2024, Stan Brown wrote:
    On Sun, 14 Jul 2024 09:25:09 +0200, Herbert Kleebauer wrote:
    On 14.07.2024 02:46, Bill Powell wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.


    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.

    Select Rectangular snip, select the text, double click on Snipping
    Tools, click on text in the menu, select the text and copy.

    --
    Regards, Jørgen
    [e-mail address is valid]

  • From Herbert Kleebauer@21:1/5 to Stan Brown on Mon Jul 15 23:01:24 2024
    XPost: comp.text.pdf

    On 15.07.2024 22:09, Stan Brown wrote:

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.


    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.


    Maybe it is only available in Win11 but not in Win10.
    I have version: Snipping Tool 11.2405.32.0

    https://support.microsoft.com/en-us/windows/use-snipping-tool-to-capture-screenshots-00246869-1843-655f-f220-97299b865f6b#ID0EDD=Windows_11

    || Once you've captured a snip, select the Text Actions button to
    || activate the Optical Character Recognition (OCR) feature. This
    || allows you to extract text directly from your image. From here,
    || you have the option to either select and copy specific text, or
    || use the tools to Copy all text or to Quick redact. All text
    || recognition processes are performed locally on your

  • From knuttle@21:1/5 to croy on Mon Jul 15 22:30:52 2024
    On 07/15/2024 1:44 PM, croy wrote:
    the page of bitmap text in yellow
    After you have highlighted the text and started the OCR plugin, you
    must start the OCR process in the popup window for the OCR.

    This is different from the earlier OCR plugin that was used by
    Irfanview. In the older version, the text you highlighted was brought to
    the OCR window. Then you highlighted it again to start the OCR process.

  • From Paul@21:1/5 to Herbert Kleebauer on Tue Jul 16 01:18:40 2024
    XPost: comp.text.pdf

    On 7/15/2024 5:01 PM, Herbert Kleebauer wrote:
    On 15.07.2024 22:09, Stan Brown wrote:

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.


    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.


    Maybe it is only available in Win11 but not in Win10.
    I have version: Snipping Tool 11.2405.32.0

    https://support.microsoft.com/en-us/windows/use-snipping-tool-to-capture-screenshots-00246869-1843-655f-f220-97299b865f6b#ID0EDD=Windows_11

    || Once you've captured a snip, select the Text Actions button to
    || activate the Optical Character Recognition (OCR) feature. This
    || allows you to extract text directly from your image. From here,
    || you have the option to either select and copy specific text, or
    || use the tools to Copy all text or to Quick redact. All text
    || recognition processes are performed locally on your

    This is what I'm seeing.

    [Picture]

    https://i.postimg.cc/BnZCqsSV/snippingtool-OCR-is-implicit.gif

    You select "text actions" first.

    The OCR conversion happens upon entry to the function,
    with no request on your part.

    The "Copy as Text" is presumably supposed to trigger "OCR was done"
    in your brain??? A violation of discoverability. Or of some other
    principle they might have taught in CS school.

    Paul

  • From Herbert Kleebauer@21:1/5 to Paul on Tue Jul 16 08:43:11 2024
    XPost: comp.text.pdf

    On 16.07.2024 07:18, Paul wrote:

    The "Copy as Text" is presumably supposed to trigger "OCR was done"
    in your brain??? A violation of discoverability. Or of some other
    principle they might have taught in CS school.

    I think it is a good idea to replace the keyboard sequence CTRL-A CTRL-C
    by a simple mouse click. And there is also the button to remove email
    addresses and phone numbers from t

  • From Stan Brown@21:1/5 to Herbert Kleebauer on Thu Jul 18 15:10:16 2024
    XPost: comp.text.pdf

    On Mon, 15 Jul 2024 23:01:24 +0200, Herbert Kleebauer wrote:
    On 15.07.2024 22:09, Stan Brown wrote:

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.

    I did not write the above paragraph.

    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.


    Maybe it is only available in Win11 but not in Win10.
    I have version: Snipping Tool 11.2405.32.0

    Oh, silly me. We're in a Windows 10 newsgroup, so I thought we were
    talking about a Windows 10 feature.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Stan Brown@21:1/5 to All on Thu Jul 18 15:06:25 2024
    XPost: comp.text.pdf

    On Mon, 15 Jul 2024 22:49:46 +0200, Jørgen Nielsen wrote:

    Monday, 15-07-2024, Stan Brown wrote:
    On Sun, 14 Jul 2024 09:25:09 +0200, Herbert Kleebauer wrote:
    On 14.07.2024 02:46, Bill Powell wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.


    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.

    Select Rectangular snip, select the text, double click on Snipping
    Tools, click on text in the menu, select the text and copy.

    As soon as I begin selecting text, the Snipping Tool icon menu at
    the top of the screen disappears, so there's nothing to double-click
    on.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From wasbit@21:1/5 to knuttle on Fri Jul 19 10:05:59 2024
    XPost: comp.text.pdf

    On 14/07/2024 11:54, knuttle wrote:
    On 07/14/2024 3:25 AM, Herbert Kleebauer wrote:
    On 14.07.2024 02:46, Bill Powell wrote:

    I have a series of one-page PDFs that are really images and not text
    even though they look like they're just a page of simple text in the
    same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.

    Or you can use Firefox to display the PDF and use an OCR
    plug-in.

    I use Irfanview for all my image and OCR projects.

    You need Irfanview and the OCR plugin.

    Open the PDF file in Irfanview, highlight the text and activate the
    OCR function.

    I recently had to sort out an XP machine with some 500 wrongly named & corrupted files that contained photos.
    I was pleasantly surprised at the number of different types of file that Irfanview would open, play & sort out the correct extension. Saved me
    hundreds of clicks & hours of work.

    --
    Regards
    wasbit
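
    For anyone curious how a viewer can work out the right extension for a wrongly named file: one common approach is to look at the first few bytes (the "magic number") rather than the name. The sketch below is only an illustration of that idea, not a description of what Irfanview actually does internally. The function name guess_extension and the short signature table are assumptions made for the example; real tools know far more formats.

```python
from pathlib import Path

# A few well-known file signatures (leading bytes) and a matching extension.
# This table is deliberately tiny and only illustrative.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),
    (b"%PDF-", ".pdf"),
]


def guess_extension(header: bytes):
    """Return an extension for the given leading bytes, or None if unknown."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    return None


def guess_extension_for_file(path):
    """Read the first 16 bytes of a file and guess its extension."""
    with open(path, "rb") as handle:
        return guess_extension(handle.read(16))


if __name__ == "__main__":
    import sys

    # Report only; nothing is renamed.
    for name in sys.argv[1:]:
        print(Path(name).name, "->", guess_extension_for_file(name))
```

    Run with a list of file names, it prints the guessed extension for each one and leaves the files untouched, so the guesses can be reviewed before anything is renamed.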

  • From wasbit@21:1/5 to Herbert Kleebauer on Fri Jul 19 10:13:56 2024
    XPost: comp.text.pdf

    On 15/07/2024 22:01, Herbert Kleebauer wrote:
    On 15.07.2024 22:09, Stan Brown wrote:

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.


    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.


    Maybe it is only available in Win11 but not in Win10.
    I have version: Snipping Tool 11.2405.32.0

    https://support.microsoft.com/en-us/windows/use-snipping-tool-to-capture-screenshots-00246869-1843-655f-f220-97299b865f6b#ID0EDD=Windows_11



    FYI
    The snipping tool is available in Windows 8.1.
    A better name would be Screenshot tool. I use it on a regular basis.



    --
    Regards
    wasbit

  • From Steve Hayes@21:1/5 to wasbit on Fri Jul 19 11:35:06 2024
    XPost: comp.text.pdf

    On Fri, 19 Jul 2024 10:05:59 +0100, wasbit <wasbit@nowhere.com> wrote:

    I recently had to sort out an XP machine with some 500 wrongly named & corrupted files that contained photos.
    I was pleasantly surprised at the number of different types of file that Irfanview would open, play & sort out the correct extension. Saved me
    hundreds of clicks & hours of work.

    I find Irfanview very useful for all kinds of graphics tasks.


    --
    Steve Hayes from Tshwane, South Africa
    Web: http://www.khanya.org.za/stevesig.htm
    Blog: http://khanya.wordpress.com
    E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

  • From Paul@21:1/5 to Stan Brown on Fri Jul 19 11:17:54 2024
    XPost: comp.text.pdf

    On 7/18/2024 6:10 PM, Stan Brown wrote:
    On Mon, 15 Jul 2024 23:01:24 +0200, Herbert Kleebauer wrote:
    On 15.07.2024 22:09, Stan Brown wrote:

    For only a few lines of text you can use the Snipping Tool: press
    <WIN><SHIFT>S and select the part of the screen with the text.
    When the Snipping Tool opens, select the OCR function.

    I did not write the above paragraph.

    What OCR function? I just get a menu at the top of the screen
    consisting of five icons: Rectangular snip, Freeform snip, Window
    snip, Fullscreen snip, Close snipping.


    Maybe it is only available in Win11 but not in Win10.
    I have version: Snipping Tool 11.2405.32.0

    Oh, silly me. We're in a Windows 10 newsgroup, so I thought we were
    talking about a Windows 10 feature.


    Windows 10 has two programs.

    SnippingTool.exe is a win32 program, with a WinAmp-tiny interface and no features.
    You would not expect to find any functions "sandwiched" into that.

    But they also have "Snip and Sketch" Metro.App, with decorations suspiciously similar to the Windows 11 "SnippingTool" Metro.App. Snip and Sketch is likely the fast prototype version of the SnippingTool that ships on Windows 11.

    Apparently, for a short time, a Text Actions button was exposed on Win10 "Snip and Sketch",
    but only for A/B testing (only a percentage of users would see it, and perhaps with no warning either), and presumably completely removed again afterwards.

    Search engines are pretty useless for tracking stuff like this. Using relatively neutral keywords, as an example, I got one "result" on one page,
    for one of my queries, almost like the topic was "verboten".

    *******

    One thing that is of minor interest is that OCR is part of .NET.

    https://learn.microsoft.com/en-us/samples/microsoft/windows-universal-samples/ocr/

    Without some sort of development history ("where did it come from"),
    I doubt a lot of developers would invest time quantifying it
    for suitability in a product. All the OCR things I've ever tested,
    have sucked, so my going-in assumption when a new one shows up,
    is it will be more of the same.

    Paul

  • From Andrew@21:1/5 to Steve Hayes on Fri Jul 19 15:52:32 2024
    XPost: comp.text.pdf

    Steve Hayes wrote on Fri, 19 Jul 2024 11:35:06 +0200 :

    I recently had to sort out an XP machine with some 500 wrongly named & corrupted files that contained photos.
    I was pleasantly surprised at the number of different types of file that Irfanview would open, play & sort out the correct extension. Saved me hundreds of clicks & hours of work.

    I find Irfanview very useful for all kinds of graphics tasks.

    I love that the Irfanview batch command can modify a set of images to
    obfuscate fingerprinting (which is important as I upload many images).

    This image fingerprinting only gets better by the day; it's already capable of connecting two disparate images on the net to the exact camera.

  • From Peter Flynn@21:1/5 to micky on Wed Jul 24 21:18:33 2024
    XPost: comp.text.pdf

    On 14/07/2024 02:57, micky wrote:
    In alt.comp.os.windows-10, on Sun, 14 Jul 2024 02:46:04 +0200, Bill
    Powell <bill@anarchists.org> wrote:

    I have a series of one-page PDFs that are really images and not text even
    though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    Aren't there lots of websites that do this, but you have to upload the
    file. I've resisted that but would be really happy if I could do it
    inside my computer.

    Is tesseract not available on Windows?

    P

  • From Paul@21:1/5 to Peter Flynn on Wed Jul 24 19:29:31 2024
    XPost: comp.text.pdf

    On 7/24/2024 4:18 PM, Peter Flynn wrote:
    On 14/07/2024 02:57, micky wrote:
    In alt.comp.os.windows-10, on Sun, 14 Jul 2024 02:46:04 +0200, Bill
    Powell <bill@anarchists.org> wrote:

    I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

    Is there a way to easily OCR a PDF to actual text on Windows for free?

    Aren't there lots of websites that do this, but you have to upload the
    file. I've resisted that but would be really happy if I could do it
    inside my computer.

    Is tesseract not available on Windows?

    P

    https://github.com/UB-Mannheim/tesseract/wiki

    https://github.com/UB-Mannheim/tesseract/releases/download/v5.4.0.20240606/tesseract-ocr-w64-setup-5.4.0.20240606.exe

    https://github.com/UB-Mannheim/tesseract/wiki/Install-additional-language-and-script-models

    https://tesseract-ocr.github.io/tessdoc/Data-Files

    The English file (training data), as an example, is 14.7MB.

    *******
    tesseract-ocr-w64-setup-5.4.0.20240606.exe 50,175,248 bytes

    https://www.virustotal.com/gui/file/c885fff6998e0608ba4bb8ab51436e1c6775c2bafc2559a19b423e18678b60c9

    Haven't tested that.

    Paul
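
    As a rough sketch of how the local Tesseract route could be scripted for a folder of one-page image PDFs: the tesseract command line works on image files, not PDFs, so each page has to be rasterized first. The sketch assumes Poppler's pdftoppm is installed alongside Tesseract and that both are on PATH. The function names, the 300 dpi setting and the ocr_work folder are illustrative choices, not anything prescribed by the installer linked above, and none of this has been tried on the original poster's files.

```python
import glob
import shutil
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, prefix, dpi=300):
    """Command line for Poppler's pdftoppm: render PDF pages as PNG files."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image, lang="eng"):
    """Command line for tesseract: writes <image name without .png>.txt."""
    out_base = Path(image).with_suffix("")
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_pdf(pdf, work_dir, dpi=300, lang="eng"):
    """Rasterize every page of one PDF, OCR each page, return the joined text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf).stem
    subprocess.run(rasterize_cmd(pdf, work_dir / stem, dpi), check=True)

    texts = []
    # pdftoppm names its output <prefix>-<page number>.png
    for image in sorted(work_dir.glob(glob.escape(stem) + "-*.png")):
        subprocess.run(ocr_cmd(image, lang), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for tool in ("pdftoppm", "tesseract"):
            if shutil.which(tool) is None:
                sys.exit(tool + " not found on PATH")
        print(ocr_pdf(sys.argv[1], "ocr_work"))
```

    Saved under a name of your choosing and run with a PDF path as its only argument, it prints the recognized text. Everything stays on the local machine, which addresses the concern about uploading files to a website. Recognition quality will still depend on the scan resolution and the font.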
