• Tesseract alternative

    From Davey@21:1/5 to All on Sun Aug 15 10:18:26 2021
    I use Ubuntu 18.04, and I want to find an OCR programme to use with my
    scanner. Research showed me that Tesseract was the most usual choice,
    so I found and followed several different installation procedures, none
    of which gave me a working OCR program. Further research showed that
    there are several known problems with this, and no two people seemed to
    have the same problem or fix. Eventually I gave up, having wasted hours
    getting nowhere.
    So I am looking for either a definitive way to get this working, or
    more likely, an alternative OCR program.
    The destination will usually be LibreOffice Write.

    Advice or helpful suggestions welcome, please.
    --
    Davey.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Davey@21:1/5 to Davey on Wed Aug 18 09:41:08 2021
    I can only assume that nobody uses OCR on Ubuntu. Oh well.
    --
    Davey.

  • From Martin Gregorie@21:1/5 to Davey on Wed Aug 18 11:29:00 2021
    Have you looked at Vuescan? It's not FOSS, but it should work with most
    SANE-compatible scanners and says it does OCR.

    I've used it for slide scanning (Fedora Linux, Minolta Scan Dual IV) and
    it did what I wanted it to do. The program was straightforward to use.

    Written and supported by https://www.hamrick.com/


    --
    Martin | martin at
    Gregorie | gregorie dot org

  • From Davey@21:1/5 to Martin Gregorie on Wed Aug 18 12:58:28 2021
    No, but I'll take a look. Thanks.
    I'll report on it later.
    --
    Davey.

  • From AnthonyL@21:1/5 to Davey@f1.n221.z2.fidonet.fi on Thu Aug 19 10:42:48 2021
    I have, but not for a year or so. That was Tesseract, and it was a bit
    underwhelming. I think I was using the same (k)Ubuntu as you.

    The last decent free OCR program I had came with a Canon all-in-one
    printer on Windows. I think the more recent ones I've done have been
    via Google Drive, converting from one format (e.g. PDF) to another
    (e.g. Google Docs).


    --
    AnthonyL

    Why ever wait to finish a job before starting the next?

  • From Davey@21:1/5 to Davey on Sun Aug 22 12:30:43 2021
    I installed Vuescan, but it doesn't recognise my Artek scanner, so I
    informed them. Meanwhile, I'll see if I can access the OCR engine
    with a scanned image.
    --
    Davey.

  • From Davey@21:1/5 to Martin Gregorie on Sun Aug 22 23:17:52 2021
    I got the OCR working, but its accuracy is not worth paying the sum
    required for a license, even trying various different ways of doing it.
    But thanks for the suggestion.
    --
    Davey.

  • From Pancho@21:1/5 to Davey on Sun Feb 20 09:17:28 2022
    OCRmyPDF works and comes with a working Docker Hub image. It is based on
    Tesseract.

    <https://hub.docker.com/r/jbarlow83/ocrmypdf/>

    I've been meaning to set it up as a service on a shared folder where
    my scanner deposits scans. The OCR worked OK, but I had a bit of
    trouble working out an event handler that fires only once the scanner
    has actually finished writing the PDF file. I had a few problems a few
    months ago and gave up, but some of them seem to have resolved
    themselves, so I'll give it another go.
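
    One common workaround for the "has the scanner finished writing?" problem
    is to skip the event handler and poll the file until its size and
    modification time stop changing. The sketch below is only an illustration
    of that idea, not what I actually run. The function name and the timing
    defaults are made up for the example, and a scanner that pauses mid-write
    for longer than the quiet period would still fool it.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=2.0, poll_interval=0.5, timeout=60.0):
    """Return True once `path` has kept the same size and mtime for
    `quiet_seconds`; return False if `timeout` elapses first.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout
    last_seen = None      # (size, mtime) from the previous poll, or None
    stable_since = None   # monotonic time at which last_seen was first observed
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
            current = (st.st_size, st.st_mtime)
        except FileNotFoundError:
            current = None  # file not there (yet); keep waiting
        now = time.monotonic()
        if current is None or current != last_seen:
            # Missing file or something changed: restart the quiet period.
            last_seen = current
            stable_since = now
        elif now - stable_since >= quiet_seconds:
            return True
        time.sleep(poll_interval)
    return False
```

    A watcher would call this on each new PDF and only hand the file to
    OCRmyPDF when it returns True.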

  • From Davey@21:1/5 to Pancho on Sun Feb 20 10:01:30 2022
    I'll take a look, but Tesseract was the sticking point when I tried to
    get OCR working. Thanks.
    --
    Davey.

  • From Pancho@21:1/5 to Davey on Sun Feb 20 20:43:31 2022
    OK. I just looked. I'm actually building my own Docker image (presumably
    because I wanted to use it on a rPi). It's a very simple Dockerfile:
    ---
    FROM ubuntu:latest

    # Installing ocrmypdf from the Ubuntu archive pulls in Tesseract as well
    RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 \
        python3-pip \
        ocrmypdf

    # watchdog is the Python library used for watching the scan folder
    RUN pip3 install --no-cache-dir \
        watchdog==0.10.2

    COPY watcher.py .

    ENTRYPOINT ["/usr/bin/ocrmypdf"]
    ---

    I can't speak to your setup, but "apt install ocrmypdf" includes
    Tesseract. The Dockerfile is basically doing the ocrmypdf install on
    virgin Ubuntu:20.04. It works; I just tested it.
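
    For what it's worth, once ocrmypdf is installed a basic run is just an
    input PDF and an output PDF. Here is a rough sketch of driving it from
    Python. The helper names are invented for the example, and the language
    and --skip-text defaults are my assumptions rather than anything the
    thread settled on. Because the ENTRYPOINT above is ocrmypdf itself, the
    same arguments (without the leading "ocrmypdf") would follow the image
    name in a "docker run" command.

```python
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng", skip_text=True):
    """Build an ocrmypdf command line as a list of arguments.

    Hypothetical helper; `-l` selects the Tesseract language and
    `--skip-text` leaves pages that already contain text untouched.
    """
    cmd = ["ocrmypdf", "-l", language]
    if skip_text:
        cmd.append("--skip-text")
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf, output_pdf, **options):
    """Run ocrmypdf and return its exit code (0 means success)."""
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    return subprocess.run(cmd, check=False).returncode
```

    Calling run_ocr("scan.pdf", "scan-ocr.pdf") needs ocrmypdf on the PATH;
    build_ocrmypdf_command on its own is handy for checking what would be run.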

  • From Pancho@21:1/5 to Pancho on Mon Feb 21 14:30:45 2022
    Now working on a Raspberry Pi.

    I can store a PDF scan direct to a network share from the Scanner
    console, no PC needed.

    I have OCRmyPDF running in a Docker container on the rPi, watching the
    folder. This automatically adds OCR text to the PDF and moves it to an
    appropriate folder.

    Now all I need to do is write something to give the scans appropriate
    filenames, inferred from the OCR text.
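
    As a starting point for the naming step, something along these lines
    might do. It is only a sketch under assumptions of mine: the text is
    already available as a string (for instance from ocrmypdf's --sidecar
    option), dates are written day-first as dd/mm/yyyy, and the first line
    with at least two words makes a reasonable title. The function name is
    made up.

```python
import re

# Assumption: day-first dates such as 21/02/2022, 21.02.2022 or 21-02-2022.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")


def suggest_filename(ocr_text, max_words=6):
    """Suggest a PDF filename from OCR text (hypothetical helper)."""
    # Use the first date found, rewritten as yyyy-mm-dd so files sort by date.
    date_part = ""
    match = DATE_PATTERN.search(ocr_text)
    if match:
        day, month, year = match.groups()
        date_part = f"{year}-{int(month):02d}-{int(day):02d}"

    # Use the first line that contains at least two alphabetic words.
    title_part = ""
    for line in ocr_text.splitlines():
        words = re.findall(r"[A-Za-z]{2,}", line)
        if len(words) >= 2:
            title_part = "-".join(w.lower() for w in words[:max_words])
            break

    parts = [p for p in (date_part, title_part) if p]
    return ("_".join(parts) or "untitled") + ".pdf"
```

    Real scans will need more forgiving rules than this, since OCR noise
    tends to land in exactly the header lines and dates you care about.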

    Little things please little minds :-)

  • From Davey@21:1/5 to Pancho on Mon Feb 21 20:01:21 2022
    You are working at the extreme edge of my programming knowledge, so if
    I try this, I might get lost!
    --
    Davey.

  • From Pancho@21:1/5 to Davey on Tue Feb 22 00:07:07 2022
    On 21/02/2022 20:01, Davey wrote:


    You are working at the extreme edge of my programming knowledge, so if
    I try this, I might get lost!

    I'm not sure which bit. Maybe Docker? It is easier to learn Docker than
    it is to learn how to diagnose installation problems in Linux, unless
    you want to be like Sisyphus.

    Your difficulties with Tesseract could be due to all sorts of things,
    specific to your installation of Ubuntu. It is very hard to make
    software packages work on every installation of Ubuntu, no matter what
    else is installed.
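
    If you do want to poke at the native install first, a quick check of what
    Tesseract the system actually sees can narrow things down. This is just a
    sketch. The function name is invented, and it only relies on the
    "tesseract --version" and "tesseract --list-langs" commands; a missing
    binary or an empty language list are two of the more common culprits, but
    by no means the only ones.

```python
import shutil
import subprocess


def check_tesseract(executable="tesseract"):
    """Report where Tesseract is, its version line and installed languages.

    Hypothetical diagnostic helper; returns a dict and never raises if the
    binary is simply missing.
    """
    report = {"path": shutil.which(executable), "version": None, "languages": []}
    if report["path"] is None:
        return report  # not on PATH, nothing more to ask

    # Some Tesseract versions print to stderr rather than stdout, so check both.
    version = subprocess.run([report["path"], "--version"],
                             capture_output=True, text=True)
    lines = (version.stdout or version.stderr).splitlines()
    report["version"] = lines[0] if lines else None

    langs = subprocess.run([report["path"], "--list-langs"],
                           capture_output=True, text=True)
    lines = (langs.stdout or langs.stderr).splitlines()
    # The first line is a header ("List of available languages ..."); skip it.
    report["languages"] = [l.strip() for l in lines[1:] if l.strip()]
    return report
```

    Running print(check_tesseract()) shows at a glance whether the binary is
    found at all and whether any language data (e.g. "eng") is installed.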
