Forum: >>> Magnum BBS <<<

Re: Is there a Windows program to OCR one PDF which is an IMAGE (text i

From knuttle@21:1/5 to Peter on Thu Jun 15 12:56:41 2023

XPost: rec.photo.digital, comp.text.pdf

On 06/15/2023 12:43 PM, Peter wrote:

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

It's about 200 pages but it's not worth buying OCR software for just one file.

Is there a way to upload the PDF to the net for others to see what it is?

I know nothing about it, but you may try

https://pdf.wondershare.net/ad/pdf-editor/ocr.html?gad=1&gclid=EAIaIQobChMIqrzm8N3F_wIVzmxMCh0WdA7aEAAYASAAEgIT1_D_BwE

In the past, I have used the camera function of Adobe Reader, pasted
the selection into IrfanView, and used the IrfanView plugin to OCR the information.

https://www.irfanview.info/plugins/kadmos/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Peter@21:1/5 to knuttle on Thu Jun 15 20:43:45 2023

XPost: rec.photo.digital, comp.text.pdf

knuttle <keith_nuttle@yahoo.com> wrote:

Is there a way to upload the PDF to the net for others to see what it is?

I know nothing about it, but you may try

https://pdf.wondershare.net/ad/pdf-editor/ocr.html

Thank you for that suggestion. It would be valuable for anyone on these
newsgroups to have a free tool that converts bitmap PDFs into text using
Optical Character Recognition, or that converts a regular PDF into a
Microsoft Office document (which Wondershare also does).

After spending about an hour on that tool, my suggestion is that it's not
worth installing unless you're willing to create an account & pay for it.

This is the link I downloaded it from today.
https://download.wondershare.net/inst/pdfelement-pro_setup_full5261.exe

This is the installer but it also creates an offline installer file.
Name: pdfelement-pro_setup_full5261.exe
Size: 2119160 bytes (2069 KiB)
SHA256: 394407574DFCDC76744AF69D6FAAB8DCFD4255B2DFDE73234060C6E024CABD77

This is the offline installer file that the installer above created.
Name: pdfelement-pro_64bit_full5261.exe
Size: 156604880 bytes (149 MiB)
SHA256: 61DD463B27792D5EF880A1E5B5C86FB7784A35A9F72D8FECC4DA5B6866A2C956
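
The sizes and SHA256 values listed above let anyone check that they received the same installer. A minimal sketch, assuming a Bash shell with `sha256sum` available (for example Cygwin or the Win10 Bash shell); `verify_sha256` is a made-up helper name, not part of any tool named in this thread:

```shell
#!/usr/bin/env bash
# Compare a file's SHA256 against an expected value, ignoring letter case.
# verify_sha256 is a hypothetical helper written for illustration.
verify_sha256() {
    local file=$1 expected=$2 actual
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    # The hashes quoted above are upper case; sha256sum prints lower case.
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file"
        return 1
    fi
}

# Usage:
#   verify_sha256 pdfelement-pro_setup_full5261.exe \
#       394407574DFCDC76744AF69D6FAAB8DCFD4255B2DFDE73234060C6E024CABD77
```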

I deleted the first online installation and tried the offline installer
(with the Ethernet cord unplugged) & it worked the same either way.

Both installers installed Wondershare PDFelement (Version 9.5.1) onto
Windows 10 and both tried to phone home (with your machine ID in it).
https://pdf.wondershare.net/ad/pdf-editor/ocr.html?gad=1&gclid=<longnumber>

When installing it tries to be the default pdf editor and it tries to add things to your context menu (but you can uncheck those checked boxes).

After installing you will likely want to go into preferences to turn off
the "autostart" for the Wondershare screenshot tool & batch tools (whatever they are).

In the preferences you can't turn off update checks, but you can change the interval to every quarter.

The installer put tons of crap in places it just shouldn't be touching.
C:\Users\you\AppData\Local\Temp\Wondershare
C:\Users\you\AppData\Roaming\Wondershare
HKCU\Software\Wondershare
HKLM\Software\PEPrinter
HKLM\Software\Wondershare
HKLM\Software\Wow6432Node\Wondershare

When you try to OCR a bitmap PDF using "PDFElement > Quick Tools > OCR"
it says it requires a 391.76 MB additional download but it will ask to
do that when you first try to turn a bitmapped PDF into text.

But after that additional OCR download, when you try to OCR a bitmap PDF
it stops you right there telling you that you need an account & to pay.
"Trial version only supports previewing OCR effect."

Same with conversion to Microsoft Word (which I was trying to sneak by).
"Trial version only supports PDF-to-Word of 3 pages."

Both require a Wondershare ID which shouldn't be needed just to save files.

It phones home when you uninstall.
https://cbs.wondershare.com/go.php?pid=5261&m=u&product_version=9.5.1&client_sign=<longnumber>

All in all, it looked like a slick application if you wanted to do more
than one or two OCR or PDF-to-MSOFFICE conversions. But I don't.

In the past, I have used the camera function of Adobe Reader, pasted
the selection into IrfanView, and used the IrfanView plugin to OCR the information.

https://www.irfanview.info/plugins/kadmos/

I had Irfanview 64 so I uninstalled that & installed Irfanview 32 first.
https://www.irfanview.com/
https://www.fosshub.com/IrfanView.html?dwl=iview462_setup.exe
Name: iview462_setup.exe
Size: 3259352 bytes (3182 KiB)
SHA256: 37CDB372C4B6053356ECA2C40AA44F4FB8CD30681C28CDA54E80601D6C7B565A

Then I extracted the 32-bit plugins zip file into the Plugins folder.
Name: iview462_plugins.zip
Size: 16823082 bytes (16 MiB)
SHA256: B85B1220E785F094611EB4BDD9DE17252FA023BB604FDF548CB278878E690780

At first the Irfanview "Options > Start OCR Plugin" item was grayed out;
I had to open a PDF file first to get that menu item enabled.

Then pressing F9 produced this message:
"Irfanview Can't load Plugin 'OCR_KADMOS.DLL'!
Please install or update Plugins from IrfanView homepage
and/or enable the Plugin in 'Help -> Installed Plugins' menu.
http://www.irfanview.info/plugins/kadmos/"

When I went to Irfanview "Help > Installed Plugins", it provided a long
alphabetical list of plugins (most of which were checked already) but there
was nothing starting with the letter "K" in that list.

I needed to add the Kadmos plugin which wasn't part of the default package.
https://www.irfanview.info/plugins/kadmos/
https://www.irfanview.info/plugins/kadmos/setup_kadmos_irfanview_us.exe
Name: setup_kadmos_irfanview_us.exe
Size: 6630790 bytes (6475 KiB)
SHA256: 82253452ED26CEA5F81CC8E13A0A3EA600B4F607D0CA5F3D1D058D97D403236F

That installer knew where to put itself in the Irfanview32 plugins folder.

Back in Irfanview 32-bit with the Kadmos OCR plugin added, I opened a
multi-page PDF and scrolled to a full page of text (since Kadmos also tries
to find text inside images). Again I ran the "Options > Start OCR Plugin"
menu selection (bound to the F9 hotkey) and set the language to
"English (UK)" (it was the only option). It highlighted the entire page of
text in bright yellow in an additional fullscreen window.

After a few seconds of wondering what to do next I realized I'm supposed to click my mouse button on a start point or sweep out the full page, which I
did and which instantly created another popup of the text ready to save.

That popup contained "KADMOS recognition results" which were only for that
one page of text (there was no way I could find to select the entire book).

The "KADMOS recognition results" editing window allowed edit corrections
(which were needed) & then the "File" menu allowed these choices
Write ASCII text to file
Copy ASCII text to the clipboard
Write UNICODE text to file
Copy UNICODE text to the clipboard

I don't offhand know the difference between ASCII & UNICODE so I saved the ASCII text to a file and opened it up in an editor and it worked fine.
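
On the ASCII versus UNICODE choice: ASCII covers only 128 characters (unaccented English letters, digits, basic punctuation), while a Unicode file can also hold accented letters, typographic quotes, and other scripts. For plain English text the ASCII export should lose little. A small sketch for checking whether a saved file stays within 7-bit ASCII; `is_ascii` is a hypothetical helper name, and which Unicode encoding KADMOS writes is not stated in this thread:

```shell
#!/usr/bin/env bash
# Succeeds when every byte of the file is in the 7-bit ASCII range.
is_ascii() {
    # Delete all bytes 000-177 (octal); anything left over is non-ASCII.
    [ -z "$(LC_ALL=C tr -d '\000-\177' < "$1")" ]
}

# Usage:
#   is_ascii page8.txt && echo "pure ASCII" || echo "contains non-ASCII bytes"
```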

I can't argue that it didn't work well enough, page by page, especially for
a free program. It was fast enough and, while it made errors, it wasn't all
that bad at recognizing the text.

And I'm aware you can break a 200 page PDF into 200 separate PDF files.
But is there something else free like that which can OCR a 200 page book?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul in Houston TX@21:1/5 to Peter on Thu Jun 15 19:16:07 2023

XPost: rec.photo.digital, comp.text.pdf

Peter wrote:

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

It's about 200 pages but it's not worth buying OCR software for just one file.

Is there a way to upload the PDF to the net for others to see what it is?

I like Free OCR. It works reasonably well. Better than the other 10-15
that I have tried.
http://www.paperfile.net/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stan Brown@21:1/5 to Peter on Thu Jun 15 19:29:34 2023

XPost: rec.photo.digital, comp.text.pdf

On Thu, 15 Jun 2023 17:43:12 +0100, Peter wrote:

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

It's about 200 pages

If it's 200 pages, don't you mean it's 200 images rather than one
image?

But that's a quibble. OneNote, part of the MS Office suite, can OCR
an image, and it does a fairly good job if the image is fairly clear.
Paste the image from clipboard into OneNote, then right-click on it
and select Copy Text from Picture. Then paste the text from clipboard
to whatever program you wish.

If you don't have Office, google for free OCR sites. There are quite
a few, but I've never used one because I use OneNote. Caution: If
what you're OCRing is sensitive, you wouldn't want to upload it to
some possibly sketchy website.

--
Stan Brown, Tehachapi, California, USA https://BrownMath.com/
Shikata ga nai...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From mz721@21:1/5 to Peter on Fri Jun 16 18:49:45 2023

XPost: rec.photo.digital, comp.text.pdf

On 16/06/2023 2:43 am, Peter wrote:

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

It's about 200 pages but it's not worth buying OCR software for just one file.

Is there a way to upload the PDF to the net for others to see what it is?

Yes, and it works extremely well.

Tesseract

I use it via Cygwin. You can, for example, use some utility to convert
your PDF to a series of images (png, ppm...) and then OCR them. I find
it gives pretty good results, but it works best (for me) using simple
scripts on the command line. For example, you can convert to files
page-001.png etc then loop over them with a simple bash script (Cygwin
uses bash as its shell).

There might be other ways to use it. I am not sure it is the easiest
thing to use, but it does a damned good job for a completely free tool.
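
The convert-then-loop approach described above might look like the following. This is a sketch, not the poster's actual script: `pdftoppm` (from poppler-utils) is an assumed choice of converter, and `book.pdf`, the 300 DPI setting, and the function names are placeholders.

```shell
#!/usr/bin/env bash
# Step 1: render each PDF page to a PNG (page-001.png, page-002.png, ...
# for a document with 100 to 999 pages).
pdf_to_pngs() {
    pdftoppm -png -r 300 "$1" page
}

# Step 2: OCR every page image; tesseract writes page-NNN.txt beside it.
ocr_pngs() {
    local img
    for img in page-*.png; do
        [ -e "$img" ] || continue      # no matches: the glob stays literal
        tesseract "$img" "${img%.png}" -l eng
    done
}

# Step 3: join the per-page text files in page order.
join_text() {
    cat page-*.txt > "$1"
}

# Usage:
#   pdf_to_pngs book.pdf && ocr_pngs && join_text book.txt
```

Keeping the three steps separate means a failed OCR run can be repeated without rendering the pages again.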

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul@21:1/5 to Stan Brown on Fri Jun 16 06:40:50 2023

XPost: rec.photo.digital, comp.text.pdf

On 6/15/2023 10:29 PM, Stan Brown wrote:

On Thu, 15 Jun 2023 17:43:12 +0100, Peter wrote:

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
selectable).

It's about 200 pages

If it's 200 pages, don't you mean it's 200 images rather than one
image?

But that's a quibble. OneNote, part of the MS Office suite, can OCR
an image, and it does a fairly good job if the image is fairly clear.
Paste the image from clipboard into OneNote, then right-click on it
and select Copy Text from Picture. Then paste the text from clipboard
to whatever program you wish.

If you don't have Office, google for free OCR sites. There are quite
a few, but I've never used one because I use OneNote. Caution: If
what you're OCRing is sensitive, you wouldn't want to upload it to
some possibly sketchy website.

When you feed a 200 page stack of sheets through a scan-to-PDF
scanner, the scan head collects that as 200 images, and those
become 200 image "objects" in the 200 page PDF file.

These can then be extracted with mutool.exe if you want
to examine the images. For example, the 10 page service manual
I downloaded as a PDF had ten images in it that mutool.exe
dumped for me.

When you run "overlay OCR" on that 200 page scanner document,
each page is an OCR run. All the characters in one image are
"recognized", then PDF lines-of-text in a particular font
are added to the PDF code for that page. Each page is handled
individually.

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stan Brown@21:1/5 to Paul on Fri Jun 16 08:01:56 2023

XPost: rec.photo.digital, comp.text.pdf

On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:

When you run "overlay OCR" on that 200 page scanner document,
each page is an OCR run. All the characters in one image are
"recognized", then PDF lines-of-text in a particular font,
are added to the PDF code for that page. Each page is handled
individually.

What do you use to make the OCR overlay?

--
Stan Brown, Tehachapi, California, USA
https://BrownMath.com/
Shikata ga nai...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul@21:1/5 to Stan Brown on Fri Jun 16 15:03:30 2023

XPost: rec.photo.digital, comp.text.pdf

On 6/16/2023 11:01 AM, Stan Brown wrote:

On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:

When you run "overlay OCR" on that 200 page scanner document,
each page is an OCR run. All the characters in one image are
"recognized", then PDF lines-of-text in a particular font,
are added to the PDF code for that page. Each page is handled
individually.

What do you use to make the OCR overlay?

Since Linux is more likely to have a current Tesseract, I used
the Win10 Bash shell (WSL) and an Ubuntu distro.

apt search ocrmypdf

sudo apt install ocrmypdf

You don't really need to do this step, but for test purposes,
I just wanted to run it on a single page. I fed it the image from
page 8.

mutool extract sony_srs-t1_t1pc_sm.pdf # collect image files for pages

Then, in Bash shell on Windows, I did (using the installed ocrmypdf)
for a PNG input to PDF output:

ocrmypdf -l eng --image-dpi 400 --output-type pdf image-0044.png image-0044.pdf

INFO - Input file is not a PDF, checking if it is an image...
INFO - Input file is an image
INFO - Image seems valid. Try converting to PDF...
INFO - Successfully converted to PDF, processing...
Scan: 100% 1/1 [00:00<00:00, 625.83page/s]
INFO - Using Tesseract OpenMP thread limit 3
OCR: 100% 1.0/1.0 [00:07<00:00, 7.01s/page]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
INFO - Optimize ratio: 1.00 savings: 0.0%

To do the whole document, you'd likely need less than that, as some
metadata is already inside the PDF. Something like this maybe.

ocrmypdf --output-type pdf input.pdf output.pdf

The output from my Page 8 image made this standalone PDF. The DPI
declaration helped it pick a weird page size for the output.

image-0044.pdf

Dragging the mouse across that PDF selects text that can be copied.

I didn't do quality analysis, or refine the command to do a better job.

I should be able to feed it the entire 10 page PDF intact, and
have it output a 10 page PDF with text overlay. Again, not tested.
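
The untested whole-document run described above could be wrapped like this. A sketch only: `ocr_whole_pdf` is a hypothetical name, `-l eng` is carried over from the single-page command, and the `pdftotext` check assumes poppler-utils is installed.

```shell
#!/usr/bin/env bash
ocr_whole_pdf() {
    local src=$1 dst=$2
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    # Same options as the single-page test, minus --image-dpi, which only
    # applies when the input is an image rather than a PDF.
    ocrmypdf -l eng --output-type pdf "$src" "$dst" || return 1
    # Quick sanity check: print the first lines of the recognized text.
    pdftotext "$dst" - | head -n 20
}

# Usage:
#   ocr_whole_pdf input.pdf output.pdf
```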

It's normal for these processes to not be able to overlay text
exactly on top of the bitmap characters underneath. The Adobe OCR
in their paid tool does do an exact job. Many other "hobby projects"
do not.

For a start, I was just happy to see Tesseract not fall over.

The Adobe tool (in the Acrobat editor in their distiller package),
first does layout analysis. On a three-column magazine layout,
it correctly removes the image content from consideration,
then it OCR-processes each column and precisely lays the text on top.

And as has been previously described in this thread, if there is even
a bit of font&text in the document already, the OCR does not like that
and it bails. It expects "pristine" cut-sheet scan images to work on
and no fonts declared in the PDF. In the case of Adobe, it also expects
the scan to be done at 200DPI to 400DPI (based on page size declaration
and such). Many times, I was thwarted in Adobe by a "this image needs
to be between 200DPI and 400DPI" type of message. And then it takes
half the day to arrange a strict diet of noodles for the stupid thing :-)
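
One way to anticipate the "already has text" refusal is to look for fonts before running OCR. A sketch under assumptions: `pdffonts` comes from poppler-utils and prints two header lines followed by one line per font; `--skip-text` is an ocrmypdf option for PDFs that already contain text; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds if pdffonts lists at least one font (lines after the 2-line header).
has_fonts() {
    [ "$(pdffonts "$1" 2>/dev/null | awk 'NR > 2 { n++ } END { print n + 0 }')" -gt 0 ]
}

ocr_respecting_text() {
    local src=$1 dst=$2
    if has_fonts "$src"; then
        # Leave pages that already carry text alone; OCR only the rest.
        ocrmypdf --skip-text "$src" "$dst"
    else
        ocrmypdf "$src" "$dst"
    fi
}

# Usage:
#   ocr_respecting_text input.pdf output.pdf
```

ocrmypdf also has `--force-ocr`, which instead rasterizes every page and OCRs it again; that discards the existing text layer, so `--skip-text` is the gentler default here.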

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stan Brown@21:1/5 to Paul on Fri Jun 16 23:24:39 2023

XPost: rec.photo.digital, comp.text.pdf

On Fri, 16 Jun 2023 15:03:30 -0400, Paul wrote:

On 6/16/2023 11:01 AM, Stan Brown wrote:

On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:

When you run "overlay OCR" on that 200 page scanner document,
each page is an OCR run. All the characters in one image are
"recognized", then PDF lines-of-text in a particular font,
are added to the PDF code for that page. Each page is handled
individually.

What do you use to make the OCR overlay?

Since Linux is more likely to have a current Tesseract, I used
Win10 Bash shell and a Ubuntu distro.

Thanks, Paul, for the detailed explanation. One eye-opener
for me was that the Win10 Bash shell can run actual Linux
programs.

--
Stan Brown, Tehachapi, California, USA
https://BrownMath.com/
Shikata ga nai...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul@21:1/5 to Stan Brown on Sat Jun 17 06:39:27 2023

XPost: rec.photo.digital, comp.text.pdf

On 6/17/2023 2:24 AM, Stan Brown wrote:

Thanks, Paul, for the detailed explanation. One eye-opener
for me was that the Win10 Bash shell can run actual Linux
programs.

apt search ocrmypdf

sudo apt install ocrmypdf

If you're lucky, it can even do graphics. As far as
I know, WSLg was released for Win10. And while one
announcement gave the impression that "mere humans
can install this stuff", it turned out that there were
no improvements at all to installation-usability. There
are still "boxes to tick, hair loss". They gave the impression
that "installing from the Microsoft Store... done", no,
not true by a country mile. You will have to get out
your Ouija board and consult with the spirit world, to
figure out what step you missed.

I run Linux Firefox via Bash shell.

The $HOME directory is inside a .vhdx container, and
would be

/home/username

whereas the current working directory points to C: like this

/mnt/c/Users/username/Downloads

and you can interact with your favorite Windows directory
and that is outside of the Linux container as such.
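
The mapping between a Windows path and its /mnt/c view is mechanical. A pure-Bash sketch of it follows; WSL also ships a `wslpath` utility for this job, and `win_to_wsl` here is a hypothetical stand-in that handles only simple drive-letter paths.

```shell
#!/usr/bin/env bash
# Convert a path such as C:\Users\username\Downloads to its WSL mount path.
win_to_wsl() {
    local win_path=$1
    local drive=${win_path:0:1}        # drive letter, e.g. "C"
    local rest=${win_path:2}           # everything after "C:"
    drive=$(printf '%s' "$drive" | tr 'A-Z' 'a-z')
    printf '/mnt/%s%s\n' "$drive" "${rest//\\//}"   # backslashes -> slashes
}

# Usage:
#   win_to_wsl 'C:\Users\username\Downloads'
#   prints /mnt/c/Users/username/Downloads
```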

When you want to work on some Linux setting, it might be in

/home/username/.config

So the Linuxy stuff is stored away from your C: stuff and
you would not find the cache2 Firefox folder mixed in with your
regular Windows.

When you're finished with it, you type "exit" to exit the Bash
shell. Then "wsl --shutdown" and that stops the Linux kernel
and closes the container up.

The Linux program windows that open, are rootless and the technology
in the last hop is something like Terminal Server. The shading
around the edge of the windows is "suspect" and resizing a window
can be annoying to very annoying at times. And that's a measure of
just how many "layers" the graphics have gone through.

[Picture]

https://i.postimg.cc/QdYs5W49/bash-shell-WSLg.gif

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Nomen Nescio@21:1/5 to Stan Brown on Sun Jun 18 05:19:56 2023

XPost: rec.photo.digital, comp.text.pdf

All that crap to watch a 'restricted' video?

Phishing guys love users like the OP. He keeps them in business.

Stay away from Google and their like. They're smarter than you are.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From WolfFan@21:1/5 to Peter on Sun Jun 18 19:28:38 2023

XPost: rec.photo.digital, comp.text.pdf

On Jun 15, 2023, Peter wrote
(in article <u6ff1v$fdr5$1@dont-email.me>):

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

It's about 200 pages but it's not worth buying OCR software for just one file.

Is there a way to upload the PDF to the net for others to see what it is?

1. go to your fav site which can give you a free email address

2. get one

3. go to Adobe’s site, look for Acrobat Reader, download the free trial of the full Acrobat

4. use the free email address to sign up

5. fire up Acrobat, do your OCR

6. delete Acrobat and kill the free email address.

Adobe deserves to get thumped. Thump them.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From kelown@21:1/5 to All on Sun Jun 25 14:30:06 2023

XPost: rec.photo.digital, comp.text.pdf

Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

It's about 200 pages but it's not worth buying OCR software for just one file.

PDF-XChange Editor Portable (free)
https://portableapps.com/apps/office/pdf-xchange-editor-portable

Convert -> OCR Page(s) -> All

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
