Forum: >>> Magnum BBS <<<

Re: pdf grep?

From Robert Heller@21:1/5 to dieterhansbritz@gmail.com on Wed Apr 3 14:03:37 2024

Grep may sort of work on PDF files as well. You might also want to use the strings command to get "clean" strings. Note: *some* PDF files are just images (no actual text). These would be PDFs created by scanning a document without running OCR on it. Also, many typesetting programs (TeX/LaTeX, word processors, etc.) might do some typesetting "magic" (eg ligitures, etc.) that might make things hard for grep.
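
To illustrate the ligature problem: text extracted from a PDF can contain a single ligature character (for example U+FB01, the "fi" ligature) where you would type two letters, so a plain substring search misses it. A minimal sketch in Python, assuming the extracted text is already available as a string; the helper names fold_ligatures and line_matches are made up for this example, and Unicode NFKC normalization is just one possible way to fold such characters:

```python
import unicodedata


def fold_ligatures(text: str) -> str:
    """Expand compatibility characters such as the 'fi' ligature (U+FB01)."""
    return unicodedata.normalize("NFKC", text)


def line_matches(needle: str, line: str) -> bool:
    """Case-insensitive substring test that ignores ligature differences."""
    return fold_ligatures(needle).casefold() in fold_ligatures(line).casefold()


# Hypothetical line as a text extractor might return it, with two fi ligatures.
extracted = "The \ufb01nal of\ufb01ce report"

print("final" in extracted)              # plain search misses the ligature
print(line_matches("final", extracted))  # folded search finds it
```

Note that NFKC also rewrites other compatibility characters (superscripts, full-width forms, etc.), which may or may not be what you want for a given search.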

xpdf includes a text search button as part of its UI.

At Wed, 3 Apr 2024 12:45:20 -0000 (UTC) db <dieterhansbritz@gmail.com> wrote:

> Under Linux, I can use grep to search a bunch of
> files for a character string. Is there an equivalent
> command for searching pdf files?

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
heller@deepsoft.com -- Webhosting Services

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Stefan Ram@21:1/5 to Robert Heller on Wed Apr 3 14:17:22 2024

Robert Heller <heller@deepsoft.com> wrote or quoted:

> might do some typesetting "magic" (eg ligitures, etc.) that might make things

"ligatures"

Text in PDFs is sometimes compressed. So one can either use
programs like "Agent Ransack" to search for text in PDFs or
tools like "pdftotext" to first create a text file for every
PDF file and then grep those text files.

From Stefan Ram@21:1/5 to Stefan Ram on Wed Apr 3 14:29:40 2024

ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:

> Text in PDFs is sometimes compressed. So one can either use
> programs like "Agent Ransack" to search for text in PDFs or
> tools like "pdftotext" to first create a text file for every
> PDF file and then grep those text files.

PS: "Agent Ransack" is Windows software. "pdftotext" is also
available for Linux. Converting all PDFs to text files needs
to be done only once, and then search operations on those
text files are faster than scanning the PDF files for text
on every search!
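
The convert-once-then-search idea could be sketched roughly like this in Python. This is only an outline under some assumptions: pdftotext is installed and on PATH, it is called in its basic "pdftotext input.pdf output.txt" form, and the cache layout and the function names (refresh_cache, search_cache) are invented for the example. A PDF is converted again only when its cached text file is missing or older than the PDF.

```python
import subprocess
from pathlib import Path


def run_pdftotext(pdf: Path, txt: Path) -> None:
    """Convert one PDF with the pdftotext CLI (assumed to be on PATH)."""
    subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)


def refresh_cache(pdf_dir: Path, cache_dir: Path, convert=run_pdftotext) -> list:
    """Convert PDFs whose cached text is missing or older than the PDF."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        txt = cache_dir / (pdf.stem + ".txt")
        if not txt.exists() or txt.stat().st_mtime < pdf.stat().st_mtime:
            convert(pdf, txt)
            converted.append(pdf.name)
    return converted


def search_cache(cache_dir: Path, needle: str) -> list:
    """Return (file name, line number, line) for each line containing needle."""
    hits = []
    for txt in sorted(cache_dir.glob("*.txt")):
        with txt.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if needle in line:
                    hits.append((txt.name, lineno, line.rstrip("\n")))
    return hits
```

Typical use would be refresh_cache(Path("papers"), Path("papers/.txt")) followed by search_cache(Path("papers/.txt"), "ligature"), where both directory names are placeholders. Once the cache exists, plain grep on the cached .txt files works just as well.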

From Tim Landscheidt@21:1/5 to dieterhansbritz@gmail.com on Wed Apr 3 14:22:18 2024

db <dieterhansbritz@gmail.com> wrote:

> Under Linux, I can use grep to search a bunch of
> files for a character string. Is there an equivalent
> command for searching pdf files?

You can use pdfgrep (https://pdfgrep.org/) for that. It is
available as a package in Fedora and Debian as well.
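
If you want to drive pdfgrep from a script, a small Python wrapper might look like the following. It is a sketch, not a reference: it assumes pdfgrep is on PATH and uses its --recursive, --page-number and --ignore-case options; the function names pdfgrep_argv and run_pdfgrep are invented here.

```python
import shutil
import subprocess


def pdfgrep_argv(pattern: str, root: str = ".", ignore_case: bool = False) -> list:
    """Build the pdfgrep command line for a recursive search under root."""
    argv = ["pdfgrep", "--recursive", "--page-number"]
    if ignore_case:
        argv.append("--ignore-case")
    return argv + [pattern, root]


def run_pdfgrep(pattern: str, root: str = ".", ignore_case: bool = False) -> list:
    """Run pdfgrep and return its output lines; fail early if it is missing."""
    if shutil.which("pdfgrep") is None:
        raise FileNotFoundError("pdfgrep is not installed or not on PATH")
    result = subprocess.run(
        pdfgrep_argv(pattern, root, ignore_case),
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

The shutil.which check gives a clear error when the program is not installed, instead of a failure from deep inside subprocess.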

Tim

From Peter Flynn@21:1/5 to All on Thu Apr 4 16:57:49 2024

On 04/04/2024 10:50, db wrote:
[...]

> I installed pdfgrep in my Kubuntu system, but it is
> not happy. Although the man file is there, even help
> doesn't work:

I just installed pdfgrep_2.1.2-1build1_amd64.deb in my Mint 20.1 and it
seems to work OK. What version is the Kubuntu one?

Peter