Forum: >>> Magnum BBS <<<

OCR on Windows

From Bill Powell@21:1/5 to All on Sun Jul 14 02:46:04 2024

XPost: comp.text.pdf

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From micky@21:1/5 to Powell on Sat Jul 13 21:57:19 2024

XPost: comp.text.pdf

In alt.comp.os.windows-10, on Sun, 14 Jul 2024 02:46:04 +0200, Bill
Powell <bill@anarchists.org> wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

Aren't there lots of websites that do this? But you have to upload the
file. I've resisted that but would be really happy if I could do it
inside my computer.


From Newyana2@21:1/5 to Bill Powell on Sat Jul 13 22:22:11 2024

XPost: comp.text.pdf

On 7/13/2024 8:46 PM, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

I have a program called FreeOCR that will do it without having to scan
or extract the pages. Quality depends on fonts, words, etc., but generally
it comes out well.


From micky@21:1/5 to newyana@invalid.nospam on Sat Jul 13 22:52:52 2024

XPost: comp.text.pdf

In alt.comp.os.windows-10, on Sat, 13 Jul 2024 22:22:11 -0400, Newyana2 <newyana@invalid.nospam> wrote:

On 7/13/2024 8:46 PM, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

I have a program called FreeOCR that will do it without having to scan
or extract the pages. Quality depends on fonts, words, etc., but generally
it comes out well.

http://www.freeocr.net/
http://www.paperfile.net/
https://www.google.com/search?client=firefox-b-1-d&q=FreeOCR


From Charlie@21:1/5 to All on Sat Jul 13 21:08:37 2024

XPost: comp.text.pdf

On this Sat, 13 Jul 2024 22:22:11 -0400, Newyana2 wrote:

Is there a way to easily OCR a PDF to actual text on Windows for free?

I have a program called FreeOCR that will do it without having to scan
or extract the pages. Quality depends on fonts, words, etc., but generally
it comes out well.

I too use FreeOCR, which I find is more accurate than others I've tested.
Mine is FreeOCR version 5.41 from long ago, September 2015.
There may be a new version, but here is my log file from those days.

FreeOCR: http://www.paperfile.net/ (note it's not a secure web site)
http://www.paperfile.net/download.html
http://www.paperfile.net/freeocr541.exe
Name: freeocr541.exe
Size: 11316239 bytes (10 MiB)
SHA256: 0BF9D979C7BC3774FC6AE39DF31AFC89BFD9AF60120FC2D1BE50B1B35E850D64
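
A downloaded installer can be checked against a posted size and SHA256 before it is run. Below is a minimal Python sketch using only the standard library. The function names are illustrative (they are not part of FreeOCR), and the path you pass in is whatever location the download was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large installers are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_listing(path, expected_size, expected_sha256):
    """True only if both the byte size and the SHA-256 match the posted values."""
    p = Path(path)
    if p.stat().st_size != expected_size:
        return False
    return sha256_of(p) == expected_sha256.strip().upper()
```

For example, `matches_listing("freeocr541.exe", 11316239, "0BF9D979C7BC3774FC6AE39DF31AFC89BFD9AF60120FC2D1BE50B1B35E850D64")` should return True only if the local copy is byte-identical to the file described above.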

The stone-age installer doesn't even ask where to go on your filesystem.
Worse, it doesn't even go into Program Files but onto the top level of the C: drive:
C:\FreeOCR\FreeOCR.exe
But you can move it to wherever you put your programs on your file system.
It even works in the D drive (but you can't pin a shortcut to the taskbar).

It's pretty easy to use.
Once FreeOCR opens up, press the "Open PDF" icon.
Then press the "OCR" icon.

Then in the right window will be the OCR results, which are accurate.
Then you copy those OCR text results into your Windows clipboard.

From there you paste into your editor of choice.


From Bill Powell@21:1/5 to micky on Sun Jul 14 05:02:26 2024

XPost: comp.text.pdf

On Sat, 13 Jul 2024 21:57:19 -0400, micky wrote:

Is there a way to easily OCR a PDF to actual text on Windows for free?

Aren't there lots of websites that do this? But you have to upload the
file. I've resisted that but would be really happy if I could do it
inside my computer.

These are scanned medical records.


From cable_shill@comcast.net@21:1/5 to All on Sat Jul 13 21:06:54 2024

XPost: comp.text.pdf

Windows PowerToys - Text Extractor.

On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell <bill@anarchists.org>
wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?


From Paul in Houston TX@21:1/5 to All on Sat Jul 13 23:23:55 2024

XPost: comp.text.pdf

Newyana2 wrote:

On 7/13/2024 8:46 PM, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same
font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

I have a program called FreeOCR that will do it without having to scan
or extract the pages. Quality depends on fonts, words, etc., but generally
it comes out well.

+1


From Stan Brown@21:1/5 to Bill Powell on Sat Jul 13 22:45:38 2024

XPost: comp.text.pdf

On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

OPTION A (if you have OneNote, which is part of MS Office):

1. Paste the image into OneNote.
2. Right-click the pasted image and select "Copy text from
picture".
3. In your favorite text editor, press Ctrl+V to paste the text.
4. Proofread and make any needed corrections.

I have Office 2010, not Office 365, but I believe OneNote is included
in Office 365.

OPTION B:

Which PDF reader are you using? PDF-XChange (free) has a menu
selection to perform OCR, putting the text as an extra layer in the
PDF. You can then copy the text from the PDF and paste it into your
editor.

And I'm sure there are other free PDF viewers that have OCR
capability, though PDF-XChange is the only one I use.
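
Once a PDF has an OCR text layer, the text can also be pulled out with a script instead of copy and paste, and the same check shows which pages still have no text at all. A minimal sketch, assuming the third-party pypdf package is installed (`pip install pypdf`); the helper names are made up for illustration and nothing here is specific to PDF-XChange.

```python
def page_texts(pdf_path):
    """Return the extractable text of each page as a list of strings."""
    # Imported here so the rest of the module loads even without pypdf installed.
    from pypdf import PdfReader  # third-party package (assumption: installed)

    reader = PdfReader(pdf_path)
    # extract_text() gives an empty string for pages that are pure images.
    return [(page.extract_text() or "") for page in reader.pages]


def pages_without_text(texts):
    """Return 1-based page numbers whose extracted text is empty or whitespace."""
    return [number for number, text in enumerate(texts, start=1) if not text.strip()]
```

Calling `pages_without_text(page_texts("scan.pdf"))` (the file name is only an example) on an image-only PDF would list every page; after OCR has added a text layer the list should come back empty.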

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...


From Stan Brown@21:1/5 to cable_shill@comcast.net on Sat Jul 13 22:58:17 2024

XPost: comp.text.pdf

On Sat, 13 Jul 2024 21:06:54 -0700, cable_shill@comcast.net wrote:

On Sun, 14 Jul 2024 02:46:04 +0200, Bill Powell <bill@anarchists.org>
wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

Windows PowerToys - Text Extractor.

You forgot to give the URL: https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

That one says it's "based on Joe Finney's TextGrab", and links to https://github.com/TheJoeFin/Text-Grab

Has anyone tried both, and can speak to whether one does a better job
of text extraction than the other?

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...


From Jeff Barnett@21:1/5 to micky on Sun Jul 14 00:35:44 2024

XPost: comp.text.pdf

On 7/13/2024 8:52 PM, micky wrote:

In alt.comp.os.windows-10, on Sat, 13 Jul 2024 22:22:11 -0400, Newyana2 <newyana@invalid.nospam> wrote:

On 7/13/2024 8:46 PM, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

I have a program called FreeOCR that will do it without having to scan
or extract the pages. Quality depends on fonts, words, etc., but generally
it comes out well.

http://www.freeocr.net/

Several pointers embedded at the URL above elicit "blacklisted site"
messages from AVG.

http://www.paperfile.net/
https://www.google.com/search?client=firefox-b-1-d&q=FreeOCR

--
Jeff Barnett


From Herbert Kleebauer@21:1/5 to Bill Powell on Sun Jul 14 09:25:09 2024

XPost: comp.text.pdf

On 14.07.2024 02:46, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

Or you can use Firefox to display the PDF and use an OCR
plug-in.


From knuttle@21:1/5 to Herbert Kleebauer on Sun Jul 14 06:54:16 2024

XPost: comp.text.pdf

On 07/14/2024 3:25 AM, Herbert Kleebauer wrote:

On 14.07.2024 02:46, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same
font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

For only a few lines of text you can use the Snipping Tool: press <WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

Or you can use Firefox to display the PDF and use an OCR
plug-in.

I use IrfanView for all my image and OCR projects.

You need IrfanView and the OCR plugin.

Open the PDF file in IrfanView, highlight the text and activate the
OCR function.


From Newyana2@21:1/5 to Jeff Barnett on Sun Jul 14 08:45:02 2024

XPost: comp.text.pdf

On 7/14/2024 2:35 AM, Jeff Barnett wrote:

Several pointers embedded at the URL above elicit "blacklisted site"
messages from AVG.

I should have posted the URL. freeocr.net is just a listing site. paperfile.net is the host of FreeOCR.

I researched this a while back. I'd been using something that I'd got
from a magazine CD in the late 90s and it actually worked pretty well:
Textbridge Pro. (Along with Lotus WordPro 95. Those magazine CDs
served me well.)

But I decided to look around for something more up-to-date because
I sometimes want to convert things like photo-PDFs to plain text.

FreeOCR seems to be simple, quick and no-nonsense. It saves the step
of having to extract images from PDFs. The only downside is that it
came out in early Win10 days and it has a kiddie interface
with a silly fading window at close, with no option to change that.
However... it might be Fisher-Price, but it works. :)

There's an explanation at the site. If I remember correctly, the system
it uses is OSS and while there are newer versions, I didn't find anything
else that was all put together. What I mean is that you can find more
recent updates of the Tesseract OCR code, https://github.com/tesseract-ocr,
but it's OSS that's hard to find as finished software.

The program seems to be a fairly simple .Net wrapper around a compiled
EXE version of Tesseract, but it's well designed, making Tesseract usable
and convenient.


From Newyana2@21:1/5 to Stan Brown on Sun Jul 14 09:04:33 2024

XPost: comp.text.pdf

On 7/14/2024 1:45 AM, Stan Brown wrote:

And I'm sure there are other free PDF viewers that have OCR
capability, though PDF-Xchange is the only one I use.

I also use PDFXV free and love it. I had to get a new version
for Win10. Build 322.10. Lucky it was still available free. My older
version on XP didn't work right on 10.

PDFXV is quick, does search well, allows me to edit PDFs by
extracting pages as images and pasting them in that way...
I've done my taxes that way -- both fillable forms and non-fillable.
And the whole thing is about 25 MB.

I think Adobe's monstrosity Reader is something like 300+ MB these days.
I went to take a look, but their version has become even more creepy than before.
First, Adobe wouldn't load a webpage without script, which I didn't
want to enable. Then I found through Major Geeks that the current
version is ad-supported. So I'm guessing they want people to sign
up so they can target the ads... Just when I thought Adobe couldn't
get any more creepy.

I'd never noticed the OCR function in PDFXV. It's not very intuitive,
but it seems to work. I finally figured out that I needed to pick the
selection tool, select all, then copy, to get the converted text.


From micky@21:1/5 to newyana@invalid.nospam on Sun Jul 14 10:09:26 2024

XPost: comp.text.pdf

In alt.comp.os.windows-10, on Sun, 14 Jul 2024 08:45:02 -0400, Newyana2 <newyana@invalid.nospam> wrote:

On 7/14/2024 2:35 AM, Jeff Barnett wrote:

Several pointers embedded at the URL above elicit "blacklisted site"
messages from AVG.

I should have posted the URL. freeocr.net is just a listing site.
paperfile.net is the host of FreeOCR.

And it doesn't mention win10 or 11. I can assume you've been using it
with one of those two.

I thought of just installing it to see if it works, but who knows, maybe
installing old, no longer compatible software could mess up my OS?

I researched this a while back. I'd been using something that I'd got
from a magazine CD in the late 90s and it actually worked pretty well:
Textbridge Pro. (Along with Lotus WordPro 95. Those magazine CDs
served me well.)

But I decided to look around for something more up-to-date because
I sometimes want to convert things like photo-PDFs to plain text.

FreeOCR seems to be simple, quick and no-nonsense. It saves the step
of having to extract images from PDFs. The only downside is that it
came out in early Win10 days and it has a kiddie interface
with a silly fading window at close, with no option to change that.
However... it might be Fisher-Price, but it works. :)

There's an explanation at the site. If I remember correctly, the system
it uses is OSS and while there are newer versions, I didn't find anything
else that was all put together. What I mean is that you can find more
recent updates of the Tesseract OCR code, https://github.com/tesseract-ocr,
but it's OSS that's hard to find as finished software.

The program seems to be a fairly simple .Net wrapper around a compiled
EXE version of Tesseract, but it's well designed, making Tesseract usable
and convenient.


From Isaac Montara@21:1/5 to knuttle on Sun Jul 14 16:11:53 2024

XPost: comp.text.pdf

On Sun, 14 Jul 2024 06:54:16 -0400, knuttle wrote:

I use IrfanView for all my image and OCR projects.

You need IrfanView and the OCR plugin.

Open the PDF file in IrfanView, highlight the text and activate the
OCR function.

Nice! Once you figure it out, Irfanview with the plugin is great!

I opened a scanned-page bitmap PDF image in Irfanview.
Irfanview:File > Open > scan.jpg
Irfanview:Options > Start OCR...(Plugin)
This opened up the page of bitmap text in yellow highlight at the left.
At the right of the full-size display was a bunch of buttons.
None of them was a copy command.

The plugin appears to be a KADMOS Recognition Engine, version 4.4y but all
I want is a way to copy the highlighted text inside the bitmap image.

The text is yellow. But you can't copy it to your clipboard. Or save it.

It took a good couple of minutes of futzing around before I realized what
you have to do is use your left mouse button as if you're going to crop
something and draw a box from top left of the text to bottom right.

The instant you "crop" out that text, you get a "KADMOS recognition
results" window popping up, with the OCR results in now-selectable text.

The results looked accurate in the one test I just gave it.


From Enrico Papaloma@21:1/5 to Stan Brown on Sun Jul 14 21:57:02 2024

XPost: comp.text.pdf

On 7/14/2024 7:45 AM, Stan Brown wrote:

And I'm sure there are other free PDF viewers that have OCR
capability, though PDF-Xchange is the only one I use.

Which of these three files is the one with the OCR?
https://pdf-xchange.eu/DL/pdf-xchange-editor.htm

Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)

It says "ohne OCR". What does "ohne" mean anyway?
Also, it says it puts a watermark in all files - does it do that for OCR?


From david@21:1/5 to All on Sun Jul 14 14:01:01 2024

XPost: comp.text.pdf

Using <news:v70icj$5c3b$1@dont-email.me>, Newyana2 wrote:

I also use PDFXV free and love it. I had to get a new version
for Win10. Build 322.10. Lucky it was stil available free. My older
version on XP didn't work right on 10.

I can't find any download for PDFXV.
https://www.google.com/search?q=windows+%2Bpdfxv+download


From Nick Cine@21:1/5 to Paul in Houston TX on Sun Jul 14 14:26:47 2024

XPost: comp.text.pdf

On Sat, 13 Jul 2024 23:23:55 -0500, Paul in Houston TX wrote:

I have a program called FreeOCR that will do it without having to scan
or extract the pages. Quality depends on fonts, words, etc., but generally
it comes out well.

+1

There is a GNU OCR engine called "GOCR" (or sometimes JOCR) out there.
https://jocr.sourceforge.net/
There's no mention it uses the modern Tesseract scan engine though.
Which may be why it makes so many errors that it's not really useful.

What you want is to invoke the Tesseract scan engine directly somehow.

There is a way to invoke the Tesseract scan engine directly, but I don't
know how to do it. Much like most of the YouTube downloading GUIs run the
yt-dlp command-line tool under the covers, most of the OCR tools run the
command line for Tesseract under the sheets.

The question then would be how to run the Tesseract OCR engine directly?
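
For what it's worth, the Tesseract command-line program takes an image file and an output base name, and using "stdout" as the output base prints the recognized text instead of writing a file. Below is a minimal Python sketch, assuming tesseract is installed and on PATH and that each PDF page has already been saved as an image (Tesseract reads images, not PDFs). The function names are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, lang="eng", exe="tesseract"):
    """Build the argument list: tesseract <image> stdout -l <lang>."""
    # "stdout" as the output base tells tesseract to print the text.
    return [exe, str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise FileNotFoundError("tesseract was not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang, exe),
        capture_output=True,
        text=True,
        encoding="utf-8",  # Tesseract writes UTF-8 regardless of the console code page
        check=True,        # raise CalledProcessError if tesseract reports failure
    )
    return result.stdout
```

With a page saved as, say, page1.png, `print(ocr_image("page1.png"))` would show the text; looping over a folder of page images handles a whole series of one-page scans.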


From Bill Powell@21:1/5 to Nick Cine on Sun Jul 14 22:37:25 2024

XPost: comp.text.pdf

On Sun, 14 Jul 2024 14:26:47 -0600, Nick Cine wrote:

There is a GNU OCR engine called "GOCR" (or sometimes JOCR) out there.
https://jocr.sourceforge.net/
There's no mention it uses the modern Tesseract scan engine though.

I had tried the GNU OCR command line before opening the thread.
http://www-e.uni-magdeburg.de/jschulen/ocr/gocr049.exe
Name: gocr049.exe
Size: 153600 bytes (150 KiB)
SHA256: 1FFC4CD29A5B275F40FBC5F6F9194ED72B8D2BCCBD46019F088C9E5DE2923F59

It makes so many spelling errors that it would be easier to type the text
out by hand - which is why I opened this thread to find an OCR that worked.

Looking up the hints you gave me, I think there are many potential Linux,
Mac, Windows, Android & iOS OCR scanning candidates in this github table.
https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty.html

What is a bit disconcerting is that none of the tools mentioned so far in
this thread show up in that table, even though the table lists dozens of
tools that do OCR. I'm not sure why.


From Wolf Greenblatt@21:1/5 to micky on Sun Jul 14 16:50:37 2024

XPost: comp.text.pdf

On Sun, 14 Jul 2024 10:09:26 -0400, micky wrote:

I should have posted the URL. freeocr.net is just a listing site.
paperfile.net is the host of FreeOCR.

And it doesn't mention win10 or 11. I can assume you've been using it
with one of those two.

I thought of just installing it to see if it works, but who knows, maybe
installing old, no longer compatible software could mess up my OS?

There's something called Simple OCR https://www.simpleocr.com/download/
which says it's free but I've never tried it so I can't vouch for it.


From Jan K.@21:1/5 to All on Sun Jul 14 22:44:51 2024

XPost: comp.text.pdf

On Sat, 13 Jul 2024 22:58:17 -0700, Stan Brown wrote:

Windows PowerToys - Text Extractor.

You forgot to give the URL: https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

That one says it's "based on Joe Finney's TextGrab", and links to https://github.com/TheJoeFin/Text-Grab

Has anyone tried both, and can speak to whether one does a better job
of text extraction than the other?

I've tried something similar to Microsoft Office for OCR on Windows.
What I tried was an MS Office clone called WPS Office, which I found here.
https://www.wps.com/office/pdf/

The company appears to be "Kingsoft" and their web stub installer is here.
https://wdl1.pcfg.cache.wpscdn.com/wpsdl/wpsoffice/onlinesetup/distsrc/600.1022/wpsinst/wps_office_inst.exe

Name: wps_lid.lid-u8MZl7zT7a0C.exe
Size: 5864848 bytes (5727 KiB)
SHA256: 81E09F93F6B1C7F9488D912CFD82560D978262CB75ECF7B7953403A8A706259B

Since that looks scary, I ran it through VirusTotal, which reported it clean.
https://www.virustotal.com/gui/file/81e09f93f6b1c7f9488d912cfd82560d978262cb75ecf7b7953403a8a706259b

You have to be careful as it will change your PDF defaults.
Select "Custom Settings" (not "Install Now").
Change from:
[x] Use WPS Office to open pdf files by default
[x] Use WPS Office as the default program for documents
[x] Use WPS Photos to open JPG, PNG, and other image formats by default

Change to:
[_] Use WPS Office to open pdf files by default
[_] Use WPS Office as the default program for documents
[_] Use WPS Photos to open JPG, PNG, and other image formats by default

Then hit the big blue "Install Now" button.
It will say "Downloading WPS Office" so you know it was just a stub.

It will create a wps_download directory containing:
Name: 132ca6c802422ed94a59d10cbcc9f47b-15_setup_XA_mui_Free.exe.600.1022.exe
Size: 244193632 bytes (232 MiB)
SHA256: B6B462DCDA4578D716E207D9747D391597110EC8F4A22C9AC29417E68A86A525

After taking forever downloading & installing WPS Office,
WPS Office will try to trick you into installing "360 Total Security".
Do not select the box [_]Yes, I agree to install 360 Total Security...
Click the big blue box "Get Started with WPS".

Start WPS Office and click away the upsell advertising.
Tools > PDF OCR > Select File > filename.pdf > Perform OCR > Sign in

You have to sign in to an account in order to OCR a PDF with WPS.
I guess in the end it's maybe an online converter - but it's hard to tell.
I didn't create an account so I never was able to find out how it works.

All I know is it's a Microsoft Office clone that says it does OCR for free.


From Big Al@21:1/5 to Jan K. on Sun Jul 14 16:54:22 2024

XPost: comp.text.pdf

On 7/14/24 04:44 PM, Jan K. wrote:

On Sat, 13 Jul 2024 22:58:17 -0700, Stan Brown wrote:

Windows PowerToys - Text Extractor.

You forgot to give the URL:
https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

That one says it's "based on Joe Finney's TextGrab", and links to
https://github.com/TheJoeFin/Text-Grab

Has anyone tried both, and can speak to whether one does a better job of text extraction than the
other?

I've tried something similar to Microsoft Office for OCR on Windows.
What I tried was an MS Office clone called WPS Office, which I found here.
https://www.wps.com/office/pdf/

The company appears to be "Kingsoft" and their web stub installer is here.
https://wdl1.pcfg.cache.wpscdn.com/wpsdl/wpsoffice/onlinesetup/distsrc/600.1022/wpsinst/wps_office_inst.exe


Years ago I used and really liked Kingsoft. Then LibreOffice got better and I switched. But
Kingsoft did a great job (or good) reading/writing MS Word stuff.
--
Linux Mint 21.3, Cinnamon 6.0.4, Kernel 5.15.0-113-generic
Al


From Joerg Walther@21:1/5 to Enrico Papaloma on Mon Jul 15 10:10:05 2024

XPost: comp.text.pdf

Enrico Papaloma wrote:

Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)

It says "ohne OCR". What does "ohne" mean anyway?

Ohne is German, meaning "without".

-jw-
--
And now for something completely different...


From croy@21:1/5 to All on Mon Jul 15 10:44:09 2024

On Sun, 14 Jul 2024 16:11:53 -0400, Isaac Montara <IsaacMontara@nospam.com> wrote:

On Sun, 14 Jul 2024 06:54:16 -0400, knuttle wrote:

I use IrfanView for all my image and OCR projects.

You need IrfanView and the OCR plugin.

Open the PDF file in IrfanView, highlight the text and activate the
OCR function.

Nice! Once you figure it out, Irfanview with the plugin is great!

I opened a scanned-page bitmap PDF image in Irfanview.
Irfanview:File > Open > scan.jpg
Irfanview:Options > Start OCR...(Plugin)
This opened up the page of bitmap text in yellow highlight at the left.

All I get is an empty window.

--
croy


From Jim the Geordie@21:1/5 to All on Mon Jul 15 19:16:58 2024

XPost: comp.text.pdf

In article <v6v74c$80bq$1@matrix.hispagatos.org>, bill@anarchists.org
says...

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

Just came across this post.
Has anyone mentioned ABBYY FineReader?
I use it all the time.
Saves to Word and PDF with no problems.

--
Jim the Geordie


From Stan Brown@21:1/5 to Herbert Kleebauer on Mon Jul 15 13:09:41 2024

XPost: comp.text.pdf

On Sun, 14 Jul 2024 09:25:09 +0200, Herbert Kleebauer wrote:

On 14.07.2024 02:46, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

For only a few lines of text you can use the Snipping Tool: press <WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...


From Stan Brown@21:1/5 to knuttle on Mon Jul 15 13:11:02 2024

XPost: comp.text.pdf

On Sun, 14 Jul 2024 06:54:16 -0400, knuttle wrote:

I use IrfanView for all my image and OCR projects.

You need IrfanView and the OCR plugin.

Open the PDF file in IrfanView, highlight the text and activate the
OCR function.

I've been using Irfanview for years, but when I tried the OCR plugin
I found it did a significantly worse job than OneNote.

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...


From Stan Brown@21:1/5 to Joerg Walther on Mon Jul 15 13:19:13 2024

XPost: comp.text.pdf

On Mon, 15 Jul 2024 10:10:05 +0200, Joerg Walther wrote:

Enrico Papaloma wrote:

Download PDF-XChange Editor/Plus (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE (32/64 Bit Version) (as ZIP File)
Download PDF-XChange Editor PORTABLE ohne OCR (32/64 Bit Version) (as ZIP File)

It says "ohne OCR". What does "ohne" mean anyway?

Ohne is German, meaning "without".

As in /Die Frau Ohne Schatten/ (The Woman without a Shadow), an
unjustly neglected opera by Richard Strauss.

I recognize several of the singers' names in this video, so it ought
to be a good performance, but I haven't listened to it because I have
one on CD:

https://www.youtube.com/watch?v=rFfc_rP9ROk

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...


From Jørgen Nielsen@21:1/5 to All on Mon Jul 15 22:49:46 2024

XPost: comp.text.pdf

Monday, 15-07-2024, Stan Brown wrote:

On Sun, 14 Jul 2024 09:25:09 +0200, Herbert Kleebauer wrote:

On 14.07.2024 02:46, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

Select Rectangular snip, select the text, double click on Snipping
Tool, click on text in the menu, select the text and copy.

--
Regards, Jørgen
[e-mail address is valid]


From Herbert Kleebauer@21:1/5 to Stan Brown on Mon Jul 15 23:01:24 2024

XPost: comp.text.pdf

On 15.07.2024 22:09, Stan Brown wrote:

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

Maybe it is only available in Win11 but not in Win10.
I have version: Snipping Tool 11.2405.32.0

https://support.microsoft.com/en-us/windows/use-snipping-tool-to-capture-screenshots-00246869-1843-655f-f220-97299b865f6b#ID0EDD=Windows_11

|| Once you've captured a snip, select the Text Actions button to
|| activate the Optical Character Recognition (OCR) feature. This
|| allows you to extract text directly from your image. From here,
|| you have the option to either select and copy specific text, or
|| use the tools to Copy all text or to Quick redact. All text
|| recognition processes are performed locally on your

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From knuttle@21:1/5 to croy on Mon Jul 15 22:30:52 2024

On 07/15/2024 1:44 PM, croy wrote:

the page of bitmap text in yellow

After you have highlighted the text and started the OCR plugin, you
must start the OCR process in the popup window for the OCR.

This is different from the earlier OCR plugin that was used by
IrfanView. In the older version, the text you highlighted was brought to
the OCR window. Then you highlighted it again to start the OCR process.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul@21:1/5 to Herbert Kleebauer on Tue Jul 16 01:18:40 2024

XPost: comp.text.pdf

On 7/15/2024 5:01 PM, Herbert Kleebauer wrote:

On 15.07.2024 22:09, Stan Brown wrote:

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

Maybe it is only available in Win11 but not in Win10.
I have version: Snipping Tool 11.2405.32.0

https://support.microsoft.com/en-us/windows/use-snipping-tool-to-capture-screenshots-00246869-1843-655f-f220-97299b865f6b#ID0EDD=Windows_11

|| Once you've captured a snip, select the Text Actions button to
|| activate the Optical Character Recognition (OCR) feature. This
|| allows you to extract text directly from your image. From here,
|| you have the option to either select and copy specific text, or
|| use the tools to Copy all text or to Quick redact. All text
|| recognition processes are performed locally on your

This is what I'm seeing.

[Picture]

https://i.postimg.cc/BnZCqsSV/snippingtool-OCR-is-implicit.gif

You select "text actions" first.

The OCR conversion happens upon entry to the function,
with no request on your part.

The "Copy as Text" is presumably supposed to trigger "OCR was done"
in your brain ??? A violation of discoverability. Or of some other
principle they might have taught in CS school.

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Herbert Kleebauer@21:1/5 to Paul on Tue Jul 16 08:43:11 2024

XPost: comp.text.pdf

On 16.07.2024 07:18, Paul wrote:

The "Copy as Text" is presumably supposed to trigger "OCR was done"
in your brain ??? A violation of discoverability. Or of some other
principle they might have taught in CS school.

I think it is a good idea to replace the keyboard sequence CTRL-A CTRL-C
by a simple mouse click. And there is also the button to remove email
addresses and phone numbers from t

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stan Brown@21:1/5 to Herbert Kleebauer on Thu Jul 18 15:10:16 2024

XPost: comp.text.pdf

On Mon, 15 Jul 2024 23:01:24 +0200, Herbert Kleebauer wrote:

On 15.07.2024 22:09, Stan Brown wrote:

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

I did not write the above paragraph.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

Maybe it is only available in Win11 but not in Win10.
I have version: Snipping Tool 11.2405.32.0

Oh, silly me. We're in a Windows 10 newsgroup, so I thought we were
talking about a Windows 10 feature.

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stan Brown@21:1/5 to All on Thu Jul 18 15:06:25 2024

XPost: comp.text.pdf

On Mon, 15 Jul 2024 22:49:46 +0200, Jørgen Nielsen wrote:

Monday, 15-07-2024, Stan Brown wrote:

On Sun, 14 Jul 2024 09:25:09 +0200, Herbert Kleebauer wrote:

On 14.07.2024 02:46, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

Select Rectangular snip, select the text, double click on Snipping
Tools, click on text in the menu, select the text and copy.

As soon as I begin selecting text, the Snipping Tool icon menu at
the top of the screen disappears, so there's nothing to double-click
on.

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From wasbit@21:1/5 to knuttle on Fri Jul 19 10:05:59 2024

XPost: comp.text.pdf

On 14/07/2024 11:54, knuttle wrote:

On 07/14/2024 3:25 AM, Herbert Kleebauer wrote:

On 14.07.2024 02:46, Bill Powell wrote:

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

Or you can use Firefox to display the PDF and use an OCR
plug-in.

I use IrfanView for all my image and OCR projects.

You need IrfanView and the OCR plugin.

Open the PDF file in IrfanView, highlight the text and activate the
OCR function.

I recently had to sort out an XP machine with some 500 wrongly named &
corrupted files that contained photos.
I was pleasantly surprised at the number of different types of file that
Irfanview would open, play & sort out the correct extension. Saved me
hundreds of clicks & hours of work.
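
The idea of recovering the right extension for a misnamed file can be sketched in a few lines of Python. This is only an illustration of the general technique (checking the well-known signature bytes at the start of a file); it is not a description of how Irfanview does it internally, and the names guess_extension and suggest_renames are made up for this example.

```python
from pathlib import Path

# Well-known leading "magic" bytes for a few common formats.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),
    (b"II*\x00", ".tif"),
    (b"MM\x00*", ".tif"),
    (b"%PDF-", ".pdf"),
]

# Alternative spellings that should count as already correct.
ALIASES = {".jpeg": ".jpg", ".tiff": ".tif"}


def guess_extension(header: bytes):
    """Return the extension suggested by the leading bytes, or None."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    return None


def suggest_renames(folder):
    """Yield (current_path, suggested_path) for files whose extension
    disagrees with their content. Nothing is renamed here."""
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            header = fh.read(16)
        ext = guess_extension(header)
        current = ALIASES.get(path.suffix.lower(), path.suffix.lower())
        if ext is not None and current != ext:
            yield path, path.with_suffix(ext)
```

The sketch only reports suggestions, so the list can be reviewed before anything is actually renamed. Truly corrupted files may not match any signature and are simply skipped.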

--
Regards
wasbit

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From wasbit@21:1/5 to Herbert Kleebauer on Fri Jul 19 10:13:56 2024

XPost: comp.text.pdf

On 15/07/2024 22:01, Herbert Kleebauer wrote:

On 15.07.2024 22:09, Stan Brown wrote:

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

Maybe it is only available in Win11 but not in Win10.
I have version: Snipping Tool 11.2405.32.0

https://support.microsoft.com/en-us/windows/use-snipping-tool-to-capture-screenshots-00246869-1843-655f-f220-97299b865f6b#ID0EDD=Windows_11

FYI
The snipping tool is available in Windows 8.1.
A better name would be Screenshot tool. I use it on a regular basis.

--
Regards
wasbit

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Steve Hayes@21:1/5 to wasbit on Fri Jul 19 11:35:06 2024

XPost: comp.text.pdf

On Fri, 19 Jul 2024 10:05:59 +0100, wasbit <wasbit@nowhere.com> wrote:

I recently had to sort out an XP machine with some 500 wrongly named &
corrupted files that contained photos.
I was pleasantly surprised at the number of different types of file that
Irfanview would open, play & sort out the correct extension. Saved me
hundreds of clicks & hours of work.

I find Irfanview very useful for all kinds of graphics tasks.

--
Steve Hayes from Tshwane, South Africa
Web: http://www.khanya.org.za/stevesig.htm
Blog: http://khanya.wordpress.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul@21:1/5 to Stan Brown on Fri Jul 19 11:17:54 2024

XPost: comp.text.pdf

On 7/18/2024 6:10 PM, Stan Brown wrote:

On Mon, 15 Jul 2024 23:01:24 +0200, Herbert Kleebauer wrote:

On 15.07.2024 22:09, Stan Brown wrote:

For only a few lines of text you can use the Snipping Tool: press
<WIN><SHIFT>S and select the part of the screen with the text.
When the Snipping Tool opens, select the OCR function.

I did not write the above paragraph.

What OCR function? I just get a menu at the top of the screen
consisting of five icons: Rectangular snip, Freeform snip, Window
snip, Fullscreen snip, Close snipping.

Maybe it is only available in Win11 but not in Win10.
I have version: Snipping Tool 11.2405.32.0

Oh, silly me. We're in a Windows 10 newsgroup, so I thought we were
talking about a Windows 10 feature.

Windows 10 has two programs.

SnippingTool.exe is a win32 program, with a WinAmp-tiny interface and no features.
You would not expect to find any functions "sandwiched" into that.

But they also have "Snip and Sketch" Metro.App, with decorations suspiciously similar to the Windows 11 "SnippingTool" Metro.App . Snip and Sketch is likely the fast prototype version of the SnippingTool that ships on Windows 11.

Apparently, for a short time, a Text Actions feature was exposed in Win10 "Snip and Sketch",
but only for A/B testing (only a percentage of users would see it, and perhaps with no warning either), and presumably completely removed again afterwards.

Search engines are pretty useless for tracking stuff like this. Using relatively neutral keywords, as an example, I got one "result" on one page,
for one of my queries, almost like the topic was "verboten".

*******

One thing of minor interest is that OCR is part of .NET.

https://learn.microsoft.com/en-us/samples/microsoft/windows-universal-samples/ocr/

Without some sort of development history ("where did it come from"),
I doubt a lot of developers would invest time quantifying it
for suitability in a product. All the OCR things I've ever tested,
have sucked, so my going-in assumption when a new one shows up,
is it will be more of the same.

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Andrew@21:1/5 to Steve Hayes on Fri Jul 19 15:52:32 2024

XPost: comp.text.pdf

Steve Hayes wrote on Fri, 19 Jul 2024 11:35:06 +0200 :

I recently had to sort out an XP machine with some 500 wrongly named &
corrupted files that contained photos.
I was pleasantly surprised at the number of different types of file that
Irfanview would open, play & sort out the correct extension. Saved me
hundreds of clicks & hours of work.

I find Irfanview very useful for all kinds of graphics tasks.

I love that the Irfanview batch command can modify a set of images to
obfuscate fingerprinting (which is important as I upload many images).

Image fingerprinting only gets better by the day; it's already capable of connecting two disparate images on the net to the exact camera.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Peter Flynn@21:1/5 to micky on Wed Jul 24 21:18:33 2024

XPost: comp.text.pdf

On 14/07/2024 02:57, micky wrote:

In alt.comp.os.windows-10, on Sun, 14 Jul 2024 02:46:04 +0200, Bill
Powell <bill@anarchists.org> wrote:

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

Aren't there lots of websites that do this, but you have to upload the
file. I've resisted that but would be really happy if I could do it
inside my computer.

Is tesseract not available on Windows?

P

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul@21:1/5 to Peter Flynn on Wed Jul 24 19:29:31 2024

XPost: comp.text.pdf

On 7/24/2024 4:18 PM, Peter Flynn wrote:

On 14/07/2024 02:57, micky wrote:

In alt.comp.os.windows-10, on Sun, 14 Jul 2024 02:46:04 +0200, Bill
Powell <bill@anarchists.org> wrote:

I have a series of one-page PDFs that are really images and not text even
though they look like they're just a page of simple text in the same font.

Is there a way to easily OCR a PDF to actual text on Windows for free?

Aren't there lots of websites that do this, but you have to upload the
file. I've resisted that but would be really happy if I could do it
inside my computer.

Is tesseract not available on Windows?

P

https://github.com/UB-Mannheim/tesseract/wiki

https://github.com/UB-Mannheim/tesseract/releases/download/v5.4.0.20240606/tesseract-ocr-w64-setup-5.4.0.20240606.exe

https://github.com/UB-Mannheim/tesseract/wiki/Install-additional-language-and-script-models

https://tesseract-ocr.github.io/tessdoc/Data-Files

The English file (training data), as an example, is 14.7MB.

*******
tesseract-ocr-w64-setup-5.4.0.20240606.exe 50,175,248 bytes

https://www.virustotal.com/gui/file/c885fff6998e0608ba4bb8ab51436e1c6775c2bafc2559a19b423e18678b60c9

Haven't tested that.
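
For anyone who installs it, here is a minimal sketch of driving the tesseract command-line tool from Python. It assumes tesseract is on the PATH and that each PDF page has already been exported to an image file (PNG, TIFF or similar), since as far as I know tesseract reads images rather than PDFs. The helper names build_tesseract_command and ocr_image are made up for this example.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_tesseract_command(image_path, lang="eng", exe="tesseract"):
    """Return the argument list for: tesseract <image> stdout -l <lang>."""
    # Using "stdout" as the output base makes tesseract print the
    # recognised text instead of writing <outputbase>.txt.
    return [exe, str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image file and return the recognised text."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise FileNotFoundError("tesseract was not found on PATH")
    cmd = build_tesseract_command(image_path, lang=lang, exe=exe)
    result = subprocess.run(
        cmd, capture_output=True, text=True, encoding="utf-8", check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Example: python ocr_pages.py page1.png page2.png
    for name in sys.argv[1:]:
        print(ocr_image(Path(name)))
```

The language code passed with -l has to match an installed training data file, such as the English one mentioned above.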

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
