• extracting data from pdf

    From zeneca@21:1/5 to All on Thu Jan 6 15:55:52 2022
    Hello,
    I would like to extract data (account number, name, date ....) from a
    pdf file. Any idea how to do this??
    Many thanks in advance
    André

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Joe Beanfish@21:1/5 to zeneca on Fri Jan 7 17:37:39 2022
    On Thu, 06 Jan 2022 15:55:52 +0100, zeneca wrote:

    Hello,
    I would like to extract data (account number, name, date ....) from a
    pdf file. Any idea how to do this??
    Many thanks in advance
    André

    Since "account number" isn't a standard PDF metadata field, I assume you
    want to extract meaningful data from the content of the page(s) stored in
    the PDF? If you're lucky, it's not just a picture of a page and has
    actual text behind it. Try "pdftotext" or "pdftohtml" (part of "poppler")
    to extract whatever text there is. Then use your favorite text processing
    language/utility to extract the desired portions of the text.
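
    As a rough sketch of that two-step approach in Python: run "pdftotext"
    and pick the fields out of its output with regular expressions. The
    labels "Account number:", "Name:" and "Date:" and the helper names are
    assumptions for illustration only; the real patterns depend on how the
    text is laid out in your PDF.

```python
import re
import subprocess

# Hypothetical field labels; adjust the patterns to the actual document text.
FIELD_PATTERNS = {
    "account_number": re.compile(r"\bAccount[ \t]+number[ \t]*:[ \t]*(\S+)", re.IGNORECASE),
    "name": re.compile(r"\bName[ \t]*:[ \t]*(.+)", re.IGNORECASE),
    "date": re.compile(
        r"\bDate[ \t]*:[ \t]*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})", re.IGNORECASE
    ),
}


def pdf_to_text(pdf_path):
    """Run poppler's pdftotext and return the extracted text.

    "-layout" tries to preserve the page layout; "-" as the output
    file name sends the text to stdout instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def extract_fields(text):
    """Return a dict with the first match of each pattern found in text."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[key] = match.group(1).strip()
    return fields
```

    Usage would be something like extract_fields(pdf_to_text("statement.pdf")),
    where "statement.pdf" is a placeholder file name. If "pdftotext" returns
    nothing, the PDF is probably a scanned image and this will not help.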

    If you happen to want metadata from the PDF, try "pdfinfo", also
    part of the "poppler" package.
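
    For the metadata case, "pdfinfo" prints one "Key: value" pair per line,
    so a small Python sketch like the following could turn it into a
    dictionary. The function names are made up for illustration, and which
    keys appear depends on the PDF.

```python
import subprocess


def parse_pdfinfo(output):
    """Turn pdfinfo's "Key:   value" lines into a dict of strings."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only; values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdf_metadata(pdf_path):
    """Run poppler's pdfinfo on pdf_path and return the parsed metadata."""
    result = subprocess.run(
        ["pdfinfo", pdf_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_pdfinfo(result.stdout)
```

    Calling pdf_metadata("statement.pdf") (placeholder file name) would then
    give access to entries such as "Pages" or "CreationDate", if present.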
