• Parsing an email message

    From Bernie Cosell@21:1/5 to All on Mon Jan 10 18:37:26 2022
    I need to parse an email message and pull its various parts apart. Is
    there some not-so-difficult way to do it? Courriel looks like it would be
    just the thing; unfortunately, it won't run on Windows. The Mail:: and
    Email:: modules seem very complicated when all I want to do is feed it a
    complete message and get at the various pieces [body, attachments, etc]
    and the headers [from, date, etc]. Is there a _simple_ package that'll do
    that? If not, are there tutorials or the like for Mail:: and/or Email::?
    They seem to be much more focused on managing actual mailboxes {Mail::}
    and *composing* emails [Email::] and give pretty short shrift [to my
    struggling with the man pages] to just *parsing* an email. Thanks!

    /Bernie\
    --
    Bernie Cosell Fantasy Farm Fibers
    bernie@fantasyfarm.com Pearisburg, VA
    --> Too many people, too few sheep <--

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rainer Weikusat@21:1/5 to Bernie Cosell on Tue Jan 11 17:31:45 2022
    Bernie Cosell <bernie@fantasyfarm.com> writes:
    > I need to parse an email message and pull its various parts apart. Is
    > there some not-so-difficult way to do it? Courriel looks like it would be
    > just the thing; unfortunately, it won't run on Windows. The Mail:: and
    > Email:: modules seem very complicated when all I want to do is feed it a
    > complete message and get at the various pieces [body, attachments, etc]
    > and the headers [from, date, etc]. Is there a _simple_ package that'll do
    > that? If not, are there tutorials or the like for Mail:: and/or Email::?
    > They seem to be much more focused on managing actual mailboxes {Mail::}
    > and *composing* emails [Email::] and give pretty short shrift [to my
    > struggling with the man pages] to just *parsing* an email. Thanks!

    There is no simple way to parse an e-mail message: That's literally the
    most complicated grammar I ever wrote a parser for.

  • From Henry Law@21:1/5 to Bernie Cosell on Tue Jan 11 16:58:29 2022
    On Mon, 10 Jan 2022 18:37:26 -0500, Bernie Cosell wrote:

    > Is there a _simple_ package that'll do that? If not, are there
    > tutorials or the like for Mail:: and/or Email::?

    I use Email::MIME. How "simple" it is depends on your point of view but,
    as someone else has already observed, MIME email has a complicated
    structure (e.g. separate parts within one message are themselves
    Email::MIME structures), and you're not going to get a /simple/ piece of
    code that understands that.

    However, if you pass the text of a single message to Email::MIME, the
    object will then give you a "header_pairs" method, which will give you a
    great deal of what you need. And there's a "body" method which will give
    you the body, surprisingly.
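
    A minimal sketch of that usage, assuming Email::MIME is installed from
    CPAN. The message text here is made up for illustration; "header_pairs"
    returns a flat list of name/value pairs, "header" fetches one header by
    name, and "body" returns the body of a single-part message.

```perl
use strict;
use warnings;
use Email::MIME;

# A made-up single-part message; real input would be the full text of a mail.
my $message = <<'END';
From: alice@example.com
To: bob@example.com
Subject: Hello
Date: Mon, 10 Jan 2022 18:37:26 -0500

Just a one-line body.
END

my $email = Email::MIME->new($message);

# header_pairs returns (name, value, name, value, ...) in message order.
my @pairs = $email->header_pairs;
while (my ($name, $value) = splice @pairs, 0, 2) {
    print "$name: $value\n";
}

# A single header can also be fetched by name.
print "Subject is: ", $email->header('Subject'), "\n";

# For a message without subparts, body returns the text after the headers.
print "Body is: ", $email->body;
```

    For a multipart message the interesting content lives in the subparts
    rather than in the top-level body.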

    If you want to send me a mail (address is valid) I can let you have
    great wodges of code that does this stuff; maybe reading through it and
    taking out the bits you don't need might help you. It's object-oriented
    so you might even be able to use the packages.

    --
    Henry Law n e w s @ l a w s h o u s e . o r g
    Manchester, England

  • From Andreas Karrer@21:1/5 to All on Wed Jan 12 00:18:35 2022
    * Bernie Cosell <bernie@fantasyfarm.com>:
    > I need to parse an email message and pull its various parts apart. Is
    > there some not-so-difficult way to do it? Courriel looks like it would be

    There is no really simple way because mail headers and MIME are not
    simple. A MIME message may be an arbitrarily complex tree of parts, and
    parts can be of many media types, such as text, HTML, images, video,
    PDF etc. Then there is the further complexity of
    "multipart/alternative", where you will have to decide by some
    heuristic which of the alternatives you want to extract or display.

    I'd recommend Email::MIME, maybe that qualifies as "not-so-difficult".

    "arbitrarily complex tree" is a hint that a recursive approach should
    be used.

    This skeleton passes the mail message in $message to Email::MIME for
    parsing. The "showparts" subroutine then displays a summary of each
    direct subpart and calls itself recursively for that subpart. It uses
    Email::MIME::ContentType to parse the "Content-Type" headers, which may
    be quite complex, too.

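    use Email::MIME;
    use Email::MIME::ContentType;

    # $message must already hold the complete text of the message.
    my $email = Email::MIME->new($message);

    sub showparts;    # forward declaration, so the recursive call needs no parentheses
    sub showparts {
        my $item   = shift;    # an Email::MIME object (whole message or one part)
        my $indent = shift;    # numbering prefix built up from the parent parts
        my $i      = 1;
        for my $part ($item->subparts) {
            my $ct  = parse_content_type($part->content_type);
            my $len = length $part->body;
            print "part$indent $i: $ct->{type}/$ct->{subtype}, $len bytes\n";
            showparts $part, "$indent $i";    # descend into nested parts
            $i++;
        }
    }
    showparts $email, "";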

    If you are, for example, just interested in all PDF attachments, it
    might be enough to filter out the parts with a Content-Type of
    application/pdf or application/x-pdf.
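
    A rough sketch of such a filter, built on the same recursive walk. The
    sample message and the helper name "pdf_parts" are made up for
    illustration, and the "type"/"subtype" keys assume a reasonably recent
    Email::MIME::ContentType.

```perl
use strict;
use warnings;
use Email::MIME;
use Email::MIME::ContentType;

# A made-up multipart message with one text part and one tiny "PDF" part.
my $message = <<'END';
From: alice@example.com
To: bob@example.com
Subject: Report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

See attached.
--XYZ
Content-Type: application/pdf; name="report.pdf"
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--XYZ--
END

# Hypothetical helper: collect every subpart, at any depth, whose
# Content-Type is application/pdf or application/x-pdf.
sub pdf_parts {
    my ($item) = @_;
    my @found;
    for my $part ($item->subparts) {
        my $ct   = parse_content_type($part->content_type);
        my $type = lc "$ct->{type}/$ct->{subtype}";
        if ($type eq 'application/pdf' or $type eq 'application/x-pdf') {
            push @found, $part;
        }
        push @found, pdf_parts($part);    # look inside nested multiparts
    }
    return @found;
}

my $email = Email::MIME->new($message);
my @pdfs  = pdf_parts($email);

for my $pdf (@pdfs) {
    # body returns the decoded content, ready to be written to a file.
    printf "%s: %d bytes\n", $pdf->filename, length $pdf->body;
}
```

    This only looks at subparts, so a message that is itself a bare PDF
    with no multipart wrapper would need a check on the top-level object
    as well.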



    - Andi

  • From Bernie Cosell@21:1/5 to All on Wed Jan 19 13:39:52 2022
    Bernie Cosell <bernie@fantasyfarm.com> wrote:

    } I need to parse an email message and pull its various parts apart. Is
    } there some not-so-difficult way to do it?

    Wow -- thanks for all the info. I knew MIME messages were messy but I
    didn't really realize just *how* messy. I think I'll need to pin down
    more exactly what I want from the message and then focus on
    finding/extracting just that.

    Thanks! /Bernie\
    --
    Bernie Cosell Fantasy Farm Fibers
    bernie@fantasyfarm.com Pearisburg, VA
    --> Too many people, too few sheep <--

  • From Bernie Cosell@21:1/5 to All on Wed Jan 26 12:45:15 2022
    Bernie Cosell <bernie@fantasyfarm.com> wrote:

    } Bernie Cosell <bernie@fantasyfarm.com> wrote:
    }
    } } I need to parse an email message and pull its various parts apart. Is
    } } there some not-so-difficult way to do it?
    }
    } I'm still struggling with this ...

    Please ignore. When I looked again I realized the idiot mistake I had
    made. DUH. It wants the *text* of the message, not a stupid file-name.
    When I did the open()... $msg=<..> it all magically worked. What a dolt
    I am... Sorry to bother y'all.
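
    For anyone hitting the same thing, a small sketch of the difference,
    assuming Email::Simple is installed. The helper name "slurp_file" and
    the temporary file standing in for a saved message are made up for
    illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Email::Simple;

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # no record separator: the next read returns the entire file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Stand-in for a saved message, so the sketch runs on its own.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "From: alice\@example.com\nSubject: Test\n\nBody text.\n";
close $tmp_fh;

# Passing the file name parses the name itself as if it were a message.
my $from_name       = Email::Simple->new($tmp_path);
my @names_from_name = $from_name->header_names;

# Passing the file's contents parses the actual message.
my $from_text       = Email::Simple->new(slurp_file($tmp_path));
my @names_from_text = $from_text->header_names;

print "headers found from file name: ", scalar(@names_from_name), "\n";
print "headers found from file text: ", scalar(@names_from_text), "\n";
print "  $_\n" for @names_from_text;
```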

    /Bernie\
    --
    Bernie Cosell Fantasy Farm Fibers
    bernie@fantasyfarm.com Pearisburg, VA
    --> Too many people, too few sheep <--

  • From Bernie Cosell@21:1/5 to All on Wed Jan 26 12:18:26 2022
    Bernie Cosell <bernie@fantasyfarm.com> wrote:

    } I need to parse an email message and pull its various parts apart. Is
    } there some not-so-difficult way to do it?

    I'm still struggling with this and I can't figure out what I'm doing
    wrong. I've been trying to start simple and ease my way into the morass
    [and thanks for all the sample code and advice... alas, I'm still kinda
    lost]. I tried a very very simple program:
    -------------------------------------------------------
    #!/usr/bin/perl
    use v5.10 ;
    use strict;
    use warnings ;
    use Email::Simple ;
    use Email::MIME ;
    use Email::MIME::ContentType ;
    use Email::Simple::Header ;

    foreach my $msg (@ARGV)
      { checkmsg($msg) ; }
    exit ;

    sub checkmsg
      { my $email = Email::Simple->new($_[0]) ;
        my @header_names = $email->header_names ;
        say scalar(@header_names) ;
        foreach my $header (@header_names)
          { say "$header" ; }
        exit ;
      }
    ---------------------------------------------------------

    I tried it with a simple message [headers in part]
    ---------------------
    [...]
    Content-Type: multipart/alternative;
        boundary="Apple-Mail=_AB70B143-E35C-42EB-86E0-84730EB5E4A7"
    Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
    Date: Sun, 9 Jan 2022 13:27:13 -0500
    Subject: Getting involved on state level
    Message-Id: <840F9FC5-346F-4D62-AF73-3CE38E1959E6@swva.net>
    X-Mailer: Apple Mail (2.3654.120.0.1.13)
    X-PMFLAGS: 570966400 0 65537 PT49NPRZ.CNM
    [...]

    I don't care about sorting out the MIME section, I just want to see if I
    can get the headers parsed.. but when I try it:

    D:\Desktop\>showparts Mailbox\multipart
    0

    What am I doing wrong? THANKS!! /bernie\
    --
    Bernie Cosell Fantasy Farm Fibers
    bernie@fantasyfarm.com Pearisburg, VA
    --> Too many people, too few sheep <--
