I use Ubuntu 18.04, and I want to find an OCR programme to use with my
scanner. Research showed me that Tesseract was the most usual choice,
so I found and followed several different installation procedures,
none of which gave me a working OCR program. Further research showed
that there are several known problems with this, and no two people
seemed to have the same problem or fix. Eventually I gave up, having
wasted hours getting nowhere.
So I am looking for either a definitive way to get this working, or
more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
On Sun, 15 Aug 2021 10:18:26 +0100 Davey <davey@example.invalid> wrote:
I use Ubuntu 18.04, and I want to find an OCR programme to use with my
scanner. Research showed me that Tesseract was the most usual choice,
so I found and followed several different installation procedures, none
of which gave me a working OCR program. Further research showed that
there are several known problems with this, and no two people seemed to
have the same problem or fix. Eventually I gave up, having wasted hours
getting nowhere.
So I am looking for either a definitive way to get this working, or
more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
I can only assume that nobody uses OCR on Ubuntu. Oh well.
On Wed, 18 Aug 2021 09:41:08 +0100, Davey wrote:
On Sun, 15 Aug 2021 10:18:26 +0100 Davey <davey@example.invalid>
wrote:
I use Ubuntu 18.04, and I want to find an OCR programme to use
with my scanner. Research showed me that Tesseract was the most
usual choice, so I found and followed several different
installation procedures, none of which gave me a working OCR
program. Further research showed that there are several known
problems with this, and no two people seemed to have the same
problem or fix. Eventually I gave up, having wasted hours getting
nowhere. So I am looking for either a definitive way to get this
working, or more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
I can only assume that nobody uses OCR on Ubuntu. Oh well.
Have you looked at VueScan? It's not FOSS, but it should work with
most SANE-compatible scanners and says it does OCR.
I've used it for slide scanning (Fedora Linux, Minolta Scan Dual IV)
and it did what I wanted it to do. The program was straightforward
to use.
Written and supported by https://www.hamrick.com/
On Wed, 18 Aug 2021 11:29:00 -0000 (UTC)
Martin Gregorie <martin@mydomain.invalid> wrote:
On Wed, 18 Aug 2021 09:41:08 +0100, Davey wrote:
On Sun, 15 Aug 2021 10:18:26 +0100 Davey <davey@example.invalid>
wrote:
I use Ubuntu 18.04, and I want to find an OCR programme to use
with my scanner. Research showed me that Tesseract was the most
usual choice, so I found and followed several different
installation procedures, none of which gave me a working OCR
program. Further research showed that there are several known
problems with this, and no two people seemed to have the same
problem or fix. Eventually I gave up, having wasted hours getting
nowhere. So I am looking for either a definitive way to get this
working, or more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
I can only assume that nobody uses OCR on Ubuntu. Oh well.
Have you looked at VueScan? It's not FOSS, but it should work with
most SANE-compatible scanners and says it does OCR.
I've used it for slide scanning (Fedora Linux, Minolta Scan Dual IV)
and it did what I wanted it to do. The program was straightforward
to use.
Written and supported by https://www.hamrick.com/
No, but I'll take a look. Thanks.
I'll report on it later.
On 18/08/2021 08:41, Davey wrote:
On Sun, 15 Aug 2021 10:18:26 +0100
Davey <davey@example.invalid> wrote:
I use Ubuntu 18.04, and I want to find an OCR programme to use
with my scanner. Research showed me that Tesseract was the most
usual choice, so I found and followed several different
installation procedures, none of which gave me a working OCR
program. Further research showed that there are several known
problems with this, and no two people seemed to have the same
problem or fix. Eventually I gave up, having wasted hours getting
nowhere. So I am looking for either a definitive way to get this
working, or more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
I can only assume that nobody uses OCR on Ubuntu. Oh well.
OCRmyPDF works and comes with a working Docker Hub image. It is based
on Tesseract.
<https://hub.docker.com/r/jbarlow83/ocrmypdf/>
I've been meaning to set it up as a service on a shared folder,
where my scanner deposits scans. The OCR worked OK, but I had a bit
of trouble determining an event handler for when the scanner PDF file
writing was actually completed. I had a few problems a few months
ago, and gave up, but some of them seem to have resolved themselves,
so I'll give it another go.
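One common workaround for the "is the scanner finished writing?" problem is to poll the file size and only treat the PDF as complete once it has stopped changing. The following is a minimal sketch of that heuristic, not the approach used in the setup described above; the function name `wait_until_stable` and its parameters are made up for illustration, and a scanner that pauses mid-write for longer than the polling window would defeat it.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3, timeout=60.0):
    """Return True once the size of `path` is unchanged for `checks` polls in a row.

    Hypothetical helper: size stability is only a heuristic for
    "the scanner has finished writing this file".
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # File not there yet (or briefly unavailable); start counting again.
            size = -1
        if size >= 0 and size == last_size:
            unchanged += 1
            if unchanged >= checks:
                return True
        else:
            unchanged = 0
        last_size = size
        time.sleep(interval)
    # Timed out without seeing a stable size.
    return False
```

A watcher script could call this on each new PDF in the shared folder and hand the file to the OCR step only when it returns True.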
On Sun, 20 Feb 2022 09:17:28 +0000
Pancho <Pancho.Dontmaileme@outlook.com> wrote:
On 18/08/2021 08:41, Davey wrote:
On Sun, 15 Aug 2021 10:18:26 +0100
Davey <davey@example.invalid> wrote:
I use Ubuntu 18.04, and I want to find an OCR programme to use
with my scanner. Research showed me that Tesseract was the most
usual choice, so I found and followed several different
installation procedures, none of which gave me a working OCR
program. Further research showed that there are several known
problems with this, and no two people seemed to have the same
problem or fix. Eventually I gave up, having wasted hours getting
nowhere. So I am looking for either a definitive way to get this
working, or more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
I can only assume that nobody uses OCR on Ubuntu. Oh well.
OCRmyPDF works and comes with a working Docker Hub image. It is based
on Tesseract.
<https://hub.docker.com/r/jbarlow83/ocrmypdf/>
I've been meaning to set it up as a service on a shared folder,
where my scanner deposits scans. The OCR worked OK, but I had a bit
of trouble determining an event handler for when the scanner PDF file
writing was actually completed. I had a few problems a few months
ago, and gave up, but some of them seem to have resolved themselves,
so I'll give it another go.
I'll take a look, but Tesseract was the sticking point when I tried to
get OCR working. Thanks.
On 20/02/2022 10:01, Davey wrote:
On Sun, 20 Feb 2022 09:17:28 +0000
Pancho <Pancho.Dontmaileme@outlook.com> wrote:
On 18/08/2021 08:41, Davey wrote:
On Sun, 15 Aug 2021 10:18:26 +0100
Davey <davey@example.invalid> wrote:
I use Ubuntu 18.04, and I want to find an OCR programme to use
with my scanner. Research showed me that Tesseract was the most
usual choice, so I found and followed several different
installation procedures, none of which gave me a working OCR
program. Further research showed that there are several known
problems with this, and no two people seemed to have the same
problem or fix. Eventually I gave up, having wasted hours getting
nowhere. So I am looking for either a definitive way to get this
working, or more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
I can only assume that nobody uses OCR on Ubuntu. Oh well.
OCRmyPDF works and comes with a working Docker Hub image. It is based
on Tesseract.
<https://hub.docker.com/r/jbarlow83/ocrmypdf/>
I've been meaning to set it up as a service on a shared folder,
where my scanner deposits scans. The OCR worked OK, but I had a bit
of trouble determining an event handler for when the scanner PDF file
writing was actually completed. I had a few problems a few months
ago, and gave up, but some of them seem to have resolved themselves,
so I'll give it another go.
I'll take a look, but Tesseract was the sticking point when I tried to
get OCR working. Thanks.
OK. I just looked. I'm actually building my own Docker image
(presumably because I wanted to use it on a rPi). It's a very simple
Dockerfile:
---
# Pinned to 20.04, the release this was tested on
FROM ubuntu:20.04
# The ocrmypdf package pulls in Tesseract as a dependency
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    ocrmypdf
RUN pip3 install --no-cache-dir \
    watchdog==0.10.2
COPY watcher.py .
ENTRYPOINT ["/usr/bin/ocrmypdf"]
---
I can't speak to your set up, but "apt install ocrmypdf" includes
Tesseract. The Dockerfile is basically doing the ocrmypdf install on
virgin Ubuntu:20.04. It works, I just tested it.
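Since the destination in the original question is LibreOffice, one way to get editable text out of OCRmyPDF is its `--sidecar` option, which writes the recognised text to a plain-text file alongside the searchable PDF. The sketch below only wraps the command line; the helper names `build_ocr_command` and `run_ocr` are invented for illustration, and it assumes `ocrmypdf` is already on the PATH (for example via the apt package mentioned above) and that the installed version supports `--sidecar`.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(input_pdf, output_pdf, sidecar_txt, language="eng"):
    """Build the ocrmypdf argument list: searchable PDF plus a plain-text sidecar."""
    return [
        "ocrmypdf",
        "-l", language,                  # Tesseract language code
        "--sidecar", str(sidecar_txt),   # also write the OCR text to this file
        str(input_pdf),
        str(output_pdf),
    ]


def run_ocr(input_pdf, output_dir, language="eng"):
    """Run ocrmypdf on one scan; returns the (output_pdf, sidecar_txt) paths."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found; try 'sudo apt install ocrmypdf'")
    input_pdf = Path(input_pdf)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    output_pdf = output_dir / input_pdf.name
    sidecar_txt = output_dir / (input_pdf.stem + ".txt")
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(
        build_ocr_command(input_pdf, output_pdf, sidecar_txt, language),
        check=True,
    )
    return output_pdf, sidecar_txt
```

Something like `run_ocr("scan.pdf", "ocr_out")` would then leave `ocr_out/scan.txt`, which can be opened or pasted into a LibreOffice document.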
On 20/02/2022 20:43, Pancho wrote:
On 20/02/2022 10:01, Davey wrote:
On Sun, 20 Feb 2022 09:17:28 +0000
Pancho <Pancho.Dontmaileme@outlook.com> wrote:
On 18/08/2021 08:41, Davey wrote:
On Sun, 15 Aug 2021 10:18:26 +0100
Davey <davey@example.invalid> wrote:
I use Ubuntu 18.04, and I want to find an OCR programme to use
with my scanner. Research showed me that Tesseract was the most
usual choice, so I found and followed several different
installation procedures, none of which gave me a working OCR
program. Further research showed that there are several known
problems with this, and no two people seemed to have the same
problem or fix. Eventually I gave up, having wasted hours
getting nowhere. So I am looking for either a definitive way to
get this working, or more likely, an alternative OCR program.
The destination will usually be LibreOffice Write.
Advice or helpful suggestions welcome, please.
I can only assume that nobody uses OCR on Ubuntu. Oh well.
OCRmyPDF works and comes with a working Docker Hub image. It is
based on Tesseract.
<https://hub.docker.com/r/jbarlow83/ocrmypdf/>
I've been meaning to set it up as a service on a shared folder,
where my scanner deposits scans. The OCR worked OK, but I had a
bit of trouble determining an event handler for when the scanner
PDF file writing was actually completed. I had a few problems a
few months ago, and gave up, but some of them seem to have
resolved themselves, so I'll give it another go.
I'll take a look, but Tesseract was the sticking point when I tried
to get OCR working. Thanks.
OK. I just looked. I'm actually building my own Docker image
(presumably because I wanted to use it on a rPi). It's a very simple
Dockerfile:
---
# Pinned to 20.04, the release this was tested on
FROM ubuntu:20.04
# The ocrmypdf package pulls in Tesseract as a dependency
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    ocrmypdf
RUN pip3 install --no-cache-dir \
    watchdog==0.10.2
COPY watcher.py .
ENTRYPOINT ["/usr/bin/ocrmypdf"]
---
I can't speak to your set up, but "apt install ocrmypdf" includes
Tesseract. The Dockerfile is basically doing the ocrmypdf install
on virgin Ubuntu:20.04. It works, I just tested it.
Now working on a Raspberry Pi.
I can store a PDF scan direct to a network share from the Scanner
console, no PC needed.
I have OCRmyPDF running in a Docker container on the rPi, watching
the folder. This automatically adds OCR text to the PDF and moves it
to an appropriate folder.
Now all I need to do is write something to give the scans appropriate
filenames, inferred from the OCR text.
Little things please little minds :-)
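For the filename step, a rough starting point might be to pull a date and the first line that looks like a heading out of the OCR text. This is only a sketch under stated assumptions: the function `suggest_filename` is hypothetical, dates are assumed to be written day/month/year (UK style), and real scans will need more forgiving rules than these.

```python
import re


def suggest_filename(ocr_text, fallback="scan", max_words=6):
    """Suggest a filename stem from OCR text: optional ISO date plus first wordy line."""
    date = ""
    # Assumes day/month/year order, e.g. 20/02/2022 or 20.02.2022.
    m = re.search(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b", ocr_text)
    if m:
        day, month, year = m.groups()
        date = f"{year}-{int(month):02d}-{int(day):02d}"
    title = ""
    for line in ocr_text.splitlines():
        # Keep alphabetic words of three or more letters; skip lines with too few.
        words = re.findall(r"[A-Za-z]{3,}", line)
        if len(words) >= 2:
            title = "-".join(w.lower() for w in words[:max_words])
            break
    parts = [p for p in (date, title) if p]
    return "_".join(parts) if parts else fallback
```

Fed the text from an OCR sidecar file, this returns a stem to which `.pdf` can be appended; text with no recognisable date or heading falls back to `scan`.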
You are working at the extreme edge of my programming knowledge, so if
I try this, I might get lost!