• Re: Is there a Windows program to OCR one PDF which is an IMAGE (text i

    From knuttle@21:1/5 to Peter on Thu Jun 15 12:56:41 2023
    XPost: rec.photo.digital, comp.text.pdf

    On 06/15/2023 12:43 PM, Peter wrote:
    Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

    It's about 200 pages but it's not worth buying OCR software for just one file.

    Is there a way to upload the PDF to the net for others to see what it is?
    I know nothing about it, but you may try

    https://pdf.wondershare.net/ad/pdf-editor/ocr.html?gad=1&gclid=EAIaIQobChMIqrzm8N3F_wIVzmxMCh0WdA7aEAAYASAAEgIT1_D_BwE


    In the past, I have used the camera function of the Adobe Reader, pasted
    the selection into Irfanview, and used the Irfanview Plugin to OCR the information.

    https://www.irfanview.info/plugins/kadmos/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter@21:1/5 to knuttle on Thu Jun 15 20:43:45 2023
    XPost: rec.photo.digital, comp.text.pdf

    knuttle <keith_nuttle@yahoo.com> wrote:
    Is there a way to upload the PDF to the net for others to see what it is?
    I know nothing about it, but you may try

    https://pdf.wondershare.net/ad/pdf-editor/ocr.html

    Thank you for that bitmap PDF to OCR suggestion as it would be valuable for anyone on these newsgroups to have a free tool that converts PDF bitmaps
    into text using Optical Character Recognition or which can convert a
    regular PDF into a Microsoft Office document (which wondershare also does).

    After spending about an hour on that tool, my suggestion is that it's not
    worth installing unless you're willing to create an account & pay for it.

    This is the link I downloaded it from today.
    https://download.wondershare.net/inst/pdfelement-pro_setup_full5261.exe

    This is the installer but it also creates an offline installer file.
    Name: pdfelement-pro_setup_full5261.exe
    Size: 2119160 bytes (2069 KiB)
    SHA256: 394407574DFCDC76744AF69D6FAAB8DCFD4255B2DFDE73234060C6E024CABD77

    This is the offline installer file that the installer above created.
    Name: pdfelement-pro_64bit_full5261.exe
    Size: 156604880 bytes (149 MiB)
    SHA256: 61DD463B27792D5EF880A1E5B5C86FB7784A35A9F72D8FECC4DA5B6866A2C956
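
A minimal sketch of how a reader might check a download against the SHA256 values listed above, assuming a bash-like shell (Cygwin or WSL) with the GNU `sha256sum` utility. The function name `verify_sha256` is made up for illustration, and the post does not say which tool produced the original hashes.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds (exit 0) if FILE's SHA-256 matches EXPECTED_HEX, ignoring letter case.
verify_sha256() {
    local file="$1" expected actual
    # The hashes in the post are upper case; sha256sum prints lower case.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # sha256sum prints "<hash>  <name>"; keep only the hash field.
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Example use (file name and hash copied from the post):
# verify_sha256 pdfelement-pro_setup_full5261.exe \
#     394407574DFCDC76744AF69D6FAAB8DCFD4255B2DFDE73234060C6E024CABD77 \
#     && echo "hash matches" || echo "hash MISMATCH"
```

A matching hash only shows the file is the same one described here, not that it is safe to run.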

    I deleted the first online installation and tried the offline installer
    (with the Ethernet cord unplugged) & it worked the same either way.

    Both installers installed Wondershare PDFelement (Version 9.5.1) onto
    Windows 10 and both tried to phone home (with your machine ID in it).
    https://pdf.wondershare.net/ad/pdf-editor/ocr.html?gad=1&gclid=<longnumber>

    When installing it tries to be the default pdf editor and it tries to add things to your context menu (but you can uncheck those checked boxes).

    After installing you will likely want to go into preferences to turn off
    the "autostart" for the Wondershare screenshot tool & batch tools (whatever they are).

    In the preferences you can't turn off updates, but you can change the check to once a quarter.

    The installer put tons of crap in places it just shouldn't be touching.
    C:\Users\you\AppData\Local\Temp\Wondershare
    C:\Users\you\AppData\Roaming\Wondershare
    HKCU\Software\Wondershare
    HKLM\Software\PEPrinter
    HKLM\Software\Wondershare
    HKLM\Software\Wow6432Node\Wondershare

    When you try to OCR a bitmap PDF using "PDFElement > Quick Tools > OCR"
    it says it requires a 391.76 MB additional download but it will ask to
    do that when you first try to turn a bitmapped PDF into text.

    But after that additional OCR download, when you try to OCR a bitmap PDF
    it stops you right there telling you that you need an account & to pay.
    "Trial version only supports previewing OCR effect."

    Same with conversion to Microsoft Word (which I was trying to sneak by).
    "Trial version only supports PDF-to-Word of 3 pages."

    Both require a Wondershare ID which shouldn't be needed just to save files.

    It phones home when you uninstall.
    https://cbs.wondershare.com/go.php?pid=5261&m=u&product_version=9.5.1&client_sign=<longnumber>

    All in all, it looked like a slick application if you wanted to do more
    than one or two OCR or PDF-to-MSOFFICE conversions. But I don't.

    In the past, I have used the camera function of the Adobe Reader, pasted
    the selection into Irfanview, and used the Irfanview Plugin to OCR the information.

    https://www.irfanview.info/plugins/kadmos/

    I had Irfanview 64 so I uninstalled that & installed Irfanview 32 first.
    https://www.irfanview.com/
    https://www.fosshub.com/IrfanView.html?dwl=iview462_setup.exe
    Name: iview462_setup.exe
    Size: 3259352 bytes (3182 KiB)
    SHA256: 37CDB372C4B6053356ECA2C40AA44F4FB8CD30681C28CDA54E80601D6C7B565A

    Then I extracted the 32-bit plugins zip file into the Plugins folder.
    Name: iview462_plugins.zip
    Size: 16823082 bytes (16 MiB)
    SHA256: B85B1220E785F094611EB4BDD9DE17252FA023BB604FDF548CB278878E690780

    At first the Irfanview "Options > Start OCR Plugin" was grayed out;
    I had to open up a PDF file first to get that menu to ungray itself.

    Then an F9 said "Irfanview Can't load Plugin 'OCR_KADMOS.DLL'!
    Please install or update Plugins from IrfanView homepage
    and/or enable the Plugin in 'Help -> Installed Plugins' menu.
    http://www.irfanview.info/plugins/kadmos/"

    When I went to Irfanview "Help > Installed Plugins", it provided a long
    alphabetical list of plugins (most of which were checked already) but there
    was nothing starting with the letter "K" in that list.

    I needed to add the Kadmos plugin which wasn't part of the default package.
    https://www.irfanview.info/plugins/kadmos/
    https://www.irfanview.info/plugins/kadmos/setup_kadmos_irfanview_us.exe
    Name: setup_kadmos_irfanview_us.exe
    Size: 6630790 bytes (6475 KiB)
    SHA256: 82253452ED26CEA5F81CC8E13A0A3EA600B4F607D0CA5F3D1D058D97D403236F

    That installer knew where to put itself in the Irfanview32 plugins folder.

    Back in Irfanview 32-bit with the Kadmos OCR plugin added, I opened a
    multi-page PDF and scrolled to a full page of text (since Kadmos also tries
    to find text inside images). Again I ran the "Options > Start OCR Plugin"
    menu selection (set to the F9 hotkey), where I set the language to
    "English (UK)" (it was the only option), and it highlighted the entire page
    of text in bright yellow in an additional fullscreen window.

    After a few seconds of wondering what to do next I realized I'm supposed to click my mouse button on a start point or sweep out the full page, which I
    did and which instantly created another popup of the text ready to save.

    That popup contained "KADMOS recognition results" which were only for that
    one page of text (there was no way I could find to select the entire book).

    The "KADMOS recognition results" editing window allowed edit corrections
    (which were needed) & then the "File" menu allowed these choices
    Write ASCII text to file
    Copy ASCII text to the clipboard
    Write UNICODE text to file
    Copy UNICODE text to the clipboard

    I don't offhand know the difference between ASCII & UNICODE so I saved the ASCII text to a file and opened it up in an editor and it worked fine.

    It worked well enough, page by page, especially for a free program. It
    was fast enough and, while it had errors, it wasn't all that bad at
    recognizing the text.

    And I'm aware you can break a 200 page PDF into 200 separate PDF files.
    But is there something else free like that which can OCR a 200 page book?

  • From Paul in Houston TX@21:1/5 to Peter on Thu Jun 15 19:16:07 2023
    XPost: rec.photo.digital, comp.text.pdf

    Peter wrote:
    Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

    It's about 200 pages but it's not worth buying OCR software for just one file.

    Is there a way to upload the PDF to the net for others to see what it is?

    I like Free OCR. It works reasonably well. Better than the other 10-15
    that I have tried.
    http://www.paperfile.net/

  • From Stan Brown@21:1/5 to Peter on Thu Jun 15 19:29:34 2023
    XPost: rec.photo.digital, comp.text.pdf

    On Thu, 15 Jun 2023 17:43:12 +0100, Peter wrote:
    Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

    It's about 200 pages

    If it's 200 pages, don't you mean it's 200 images rather than one
    image?

    But that's a quibble. OneNote, part of the MS Office suite, can OCR
    an image, and it does a fairly good job if the image is fairly clear.
    Paste the image from clipboard into OneNote, then right-click on it
    and select Copy Text from Picture. Then paste the text from clipboard
    to whatever program you wish.

    If you don't have Office, google for free OCR sites. There are quite
    a few, but I've never used one because I use OneNote. Caution: If
    what you're OCRing is sensitive, you wouldn't want to upload it to
    some possibly sketchy website.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From mz721@21:1/5 to Peter on Fri Jun 16 18:49:45 2023
    XPost: rec.photo.digital, comp.text.pdf

    On 16/06/2023 2:43 am, Peter wrote:
    Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

    It's about 200 pages but it's not worth buying OCR software for just one file.

    Is there a way to upload the PDF to the net for others to see what it is?

    Yes, and it works extremely well.

    Tesseract

    I use it via Cygwin. You can, for example, use some utility to convert
    your PDF to a series of images (png, ppm...) and then OCR them. I find
    it gives pretty good results, but it works best (for me) using simple
    scripts on the command line. For example, you can convert to files
    page-001.png etc then loop over them with a simple bash script (Cygwin
    uses bash as its shell).
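
The loop described above could be sketched as follows. This is an illustration under assumptions, not the poster's actual script: the post only says "some utility" converts the PDF to images, so `pdftoppm` (from Poppler) is my stand-in, and the `ocr_pages` function and `TESSERACT` override variable are hypothetical names. The `tesseract IMAGE OUTPUTBASE -l eng` form writes `OUTPUTBASE.txt`.

```shell
# ocr_pages DIR
# OCR every page-*.png in DIR, then join the per-page text files in page order.
# Set TESSERACT to substitute another command (e.g. for a dry run).
ocr_pages() {
    local dir="$1" img
    for img in "$dir"/page-*.png; do
        [ -e "$img" ] || continue      # the glob matched nothing
        # Writes e.g. page-001.txt next to page-001.png
        "${TESSERACT:-tesseract}" "$img" "${img%.png}" -l eng || return 1
    done
    # Zero-padded names (page-001, page-002, ...) sort in page order.
    cat "$dir"/page-*.txt > "$dir/book.txt"
}

# Example use (assumes pdftoppm and tesseract are installed):
# mkdir pages
# pdftoppm -r 300 -png book.pdf pages/page   # creates pages/page-001.png etc.
# ocr_pages pages                            # result in pages/book.txt
```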

    There might be other ways to use it. I am not sure it is the easiest
    thing to use, but it does a damned good job for a completely free tool.

  • From Paul@21:1/5 to Stan Brown on Fri Jun 16 06:40:50 2023
    XPost: rec.photo.digital, comp.text.pdf

    On 6/15/2023 10:29 PM, Stan Brown wrote:
    On Thu, 15 Jun 2023 17:43:12 +0100, Peter wrote:
    Is there a Windows program to OCR one PDF which is an IMAGE (text isn't
    selectable).

    It's about 200 pages

    If it's 200 pages, don't you mean it's 200 images rather than one
    image?

    When you run a 200 page stack of sheets through a scan-to-PDF
    scanner, the scan head collects that as 200 images, and that
    becomes 200 "objects" in the 200 page PDF file.

    And these can then be extracted with mutool.exe if you want
    to examine the images. For example, the 10 page service manual
    I downloaded as a PDF had ten images in it that mutool.exe
    dumped for me.

    When you run "overlay OCR" on that 200 page scanner document,
    each page is an OCR run. All the characters in one image are
    "recognized", then PDF lines-of-text in a particular font,
    are added to the PDF code for that page. Each page is handled
    individually.

    Paul

  • From Stan Brown@21:1/5 to Paul on Fri Jun 16 08:01:56 2023
    XPost: rec.photo.digital, comp.text.pdf

    On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:

    When you run "overlay OCR" on that 200 page scanner document,
    each page is an OCR run. All the characters in one image are
    "recognized", then PDF lines-of-text in a particular font,
    are added to the PDF code for that page. Each page is handled
    individually.

    What do you use to make the OCR overlay?

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    Shikata ga nai...

  • From Paul@21:1/5 to Stan Brown on Fri Jun 16 15:03:30 2023
    XPost: rec.photo.digital, comp.text.pdf

    On 6/16/2023 11:01 AM, Stan Brown wrote:
    On Fri, 16 Jun 2023 06:40:50 -0400, Paul wrote:

    When you run "overlay OCR" on that 200 page scanner document,
    each page is an OCR run. All the characters in one image are
    "recognized", then PDF lines-of-text in a particular font,
    are added to the PDF code for that page. Each page is handled
    individually.

    What do you use to make the OCR overlay?


    Since Linux is more likely to have a current Tesseract, I used
    the Win10 Bash shell and an Ubuntu distro.

    apt search ocrmypdf

    sudo apt install ocrmypdf

    You don't really need to do this step, but for test purposes,
    I just wanted to run it on a single page. I fed it the image from
    page 8.

    mutool extract sony_srs-t1_t1pc_sm.pdf # collect image files for pages

    Then, in Bash shell on Windows, I did (using the installed ocrmypdf)
    for a PNG input to PDF output:

    ocrmypdf -l eng --image-dpi 400 --output-type pdf image-0044.png image-0044.pdf

    INFO - Input file is not a PDF, checking if it is an image...
    INFO - Input file is an image
    INFO - Image seems valid. Try converting to PDF...
    INFO - Successfully converted to PDF, processing...
    Scan: 100% 1/1 [00:00<00:00, 625.83page/s]
    INFO - Using Tesseract OpenMP thread limit 3
    OCR: 100% 1.0/1.0 [00:07<00:00, 7.01s/page]
    JPEGs: 0image [00:00, ?image/s]
    JBIG2: 0item [00:00, ?item/s]
    INFO - Optimize ratio: 1.00 savings: 0.0%

    To do the whole document, you'd likely need less than that, as some
    metadata is already inside the PDF. Something like this maybe.

    ocrmypdf --output-type pdf input.pdf output.pdf

    The output from my Page 8 image made this standalone PDF. The DPI
    declaration helped it pick a weird page size for the output.

    image-0044.pdf

    Swiping the mouse over that gives text to copy.

    I didn't do quality analysis, or refine the command to do a better job.

    I should be able to feed it the entire 10 page PDF intact, and
    have it output a 10 page PDF with text overlay. Again, not tested.
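
For the original 200 page question, a whole-document run might look like the sketch below. It has not been run against any file from this thread. The wrapper `ocr_whole_pdf` and the `DRY_RUN` variable are hypothetical; `--sidecar` (write the recognized text to a separate file) and `--skip-text` (leave pages that already have text alone) are ocrmypdf options as I understand them, so check `ocrmypdf --help` on your install before relying on them.

```shell
# ocr_whole_pdf INPUT.pdf OUTPUT.pdf TEXT.txt
# Adds a text overlay to every scanned page and also saves the text to TEXT.txt.
# With DRY_RUN set, only prints the command instead of running it.
ocr_whole_pdf() {
    # Rebuild the positional parameters as the full ocrmypdf command line.
    set -- ocrmypdf -l eng --skip-text --sidecar "$3" --output-type pdf "$1" "$2"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example use:
# DRY_RUN=1 ocr_whole_pdf input.pdf output.pdf book.txt   # show the command
# ocr_whole_pdf input.pdf output.pdf book.txt             # actually run it
```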

    It's normal for these processes to not be able to overlay text
    exactly on top of the bitmap character underneath. The Adobe OCR
    in their paid tool does do an exact job. Many other "hobby projects"
    do not.

    For a start, I was just happy to see Tesseract not fall over.

    The Adobe tool (in the Acrobat editor in their distiller package),
    first does layout analysis. On a three-column magazine layout,
    it correctly removes the image content from consideration,
    then it OCR-processes each column and precisely lays the text on top.

    And as has been previously described in this thread, if there is even
    a bit of font & text in the document already, the OCR does not like that
    and it bails. It expects "pristine" cut-sheet scan images to work on
    and no fonts declared in the PDF. In the case of Adobe, it also expects
    the scan to be done at 200DPI to 400DPI (based on page size declaration
    and such). Many times, I was thwarted in Adobe by a "this image needs
    to be between 200DPI and 400DPI" type of message. And then it takes
    half the day to arrange a strict diet of noodles for the stupid thing :-)

    Paul

  • From Stan Brown@21:1/5 to Paul on Fri Jun 16 23:24:39 2023
    XPost: rec.photo.digital, comp.text.pdf

    On Fri, 16 Jun 2023 15:03:30 -0400, Paul wrote:

    Since Linux is more likely to have a current Tesseract, I used
    Win10 Bash shell and a Ubuntu distro.

    Thanks, Paul, for the detailed explanation. One eye-
    opener for me was that the Win10 Bash shell can run
    actual Linux programs.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    Shikata ga nai...

  • From Paul@21:1/5 to Stan Brown on Sat Jun 17 06:39:27 2023
    XPost: rec.photo.digital, comp.text.pdf

    On 6/17/2023 2:24 AM, Stan Brown wrote:

    Thanks, Paul, for the detailed explanation. One eye-
    opener for me was that the Win10 Bash shell can run
    actual Linux programs.

    apt search ocrmypdf

    sudo apt install ocrmypdf

    If you're lucky, it can even do graphics. As far as
    I know, WSLg was released for Win10. And while one
    announcement gave the impression that "mere humans
    can install this stuff", it turned out that there were
    no improvements at all to installation-usability. There
    are still "boxes to tick, hair loss". They gave the impression
    that "installing from the Microsoft Store... done", no,
    not true by a country mile. You will have to get out
    your Ouija board and consult with the spirit world, to
    figure out what step you missed.

    I run Linux Firefox via Bash shell.

    The $HOME directory is inside a .vhdx container, and
    would be

    /home/username

    whereas the current working directory points to C: like this

    /mnt/c/Users/username/Downloads

    and you can interact with your favorite Windows directory
    and that is outside of the Linux container as such.
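
To illustrate the path mapping just described, here is a small sketch that turns a Windows path into its /mnt form. The function name `win_to_wsl` is made up for this example. WSL ships its own `wslpath` utility for this job, which is the better choice in practice; this version only shows the rule for simple drive-letter paths.

```shell
# win_to_wsl 'C:\Users\username\Downloads'
# Prints the matching WSL path, e.g. /mnt/c/Users/username/Downloads
win_to_wsl() {
    local drive rest
    # Text before the first colon is the drive letter; WSL mounts it lower case.
    drive=$(printf '%s' "${1%%:*}" | tr 'A-Z' 'a-z')
    # Text after the first colon is the path; turn backslashes into slashes.
    rest=$(printf '%s' "${1#*:}" | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

# Example use:
# cd "$(win_to_wsl 'C:\Users\username\Downloads')"
```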

    When you want to work on some Linux setting, it might be in

    /home/username/.config

    So the Linuxy stuff is stored away from your C: stuff and
    you would not find the cache2 Firefox folder mixed in with your
    regular Windows.

    When you're finished with it, you type "exit" to exit the Bash
    shell. Then "wsl --shutdown" and that stops the Linux kernel
    and closes the container up.

    The Linux program windows that open, are rootless and the technology
    in the last hop is something like Terminal Server. The shading
    around the edge of the windows is "suspect" and resizing a window
    can be annoying to very annoying at times. And that's a measure of
    just how many "layers" the graphics have gone through.

    [Picture]

    https://i.postimg.cc/QdYs5W49/bash-shell-WSLg.gif

    Paul

  • From Nomen Nescio@21:1/5 to Stan Brown on Sun Jun 18 05:19:56 2023
    XPost: rec.photo.digital, comp.text.pdf

    In article <MPG.3ef6f1812024514c990141@news.individual.net>
    Stan Brown <the_stan_brown@fastmail.fm> wrote:

    Thanks, Paul, for the detailed explanation. One eye-
    opener for me was that the Win10 Bash shell can run
    actual Linux programs.

    All that crap to watch a 'restricted' video?

    Phishing guys love users like the OP. He keeps them in business.

    Stay away from Google and their like. They're smarter than you are.

  • From WolfFan@21:1/5 to Peter on Sun Jun 18 19:28:38 2023
    XPost: rec.photo.digital, comp.text.pdf

    On Jun 15, 2023, Peter wrote
    (in article <u6ff1v$fdr5$1@dont-email.me>):

    Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

    It's about 200 pages but it's not worth buying OCR software for just one file.

    Is there a way to upload the PDF to the net for others to see what it is?

    1. go to your fav site which can give you a free email address

    2. get one

    3. go to Adobe’s site, look for Acrobat Reader, download the free trial of the full Acrobat

    4. use the free email address to sign up

    5. fire up Acrobat, do your OCR

    6. delete Acrobat and kill the free email address.

    Adobe deserves to get thumped. Thump them.

  • From kelown@21:1/5 to All on Sun Jun 25 14:30:06 2023
    XPost: rec.photo.digital, comp.text.pdf

    Is there a Windows program to OCR one PDF which is an IMAGE (text isn't selectable).

    It's about 200 pages but it's not worth buying OCR software for just one file.

    PDF-XChange Editor Portable (free)
    https://portableapps.com/apps/office/pdf-xchange-editor-portable

    Convert -> OCR Page(s) -> All
