• De-duplicating a Maildir directory

    From dpb@brannerchinese.com@21:1/5 to All on Fri Dec 17 01:29:43 2021
    Does Alpine contain functionality for de-duplicating a Maildir directory?

    It sometimes happens that a single message gets saved more than once to an archiving directory, and I'd like to know if there is already functionality for removing such duplicates.

    Thanks!

    - dpb

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dpb@brannerchinese.com@21:1/5 to All on Fri Dec 17 01:37:14 2021
    I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate

    But I'm wondering if there is anything comparable built into Alpine itself.

    - dpb

  • From J.O. Aho@21:1/5 to d...@brannerchinese.com on Fri Dec 17 13:58:06 2021
    On 17/12/2021 10.37, d...@brannerchinese.com wrote:
    > I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate
    >
    > But I'm wondering if there is anything comparable built into Alpine itself.

    I think de-duplication is a file system feature; ZFS has such
    functionality, where it stores only one block for identical data and
    then just points to that block. When you delete the last file pointing
    to that block, the block content is deleted too.

    No, Alpine doesn't have a function for deleting duplicate mails; you
    should look at tools made for this, for example https://github.com/kdeldycke/mail-deduplicate

    --
    //Aho

  • From dpb@brannerchinese.com@21:1/5 to All on Sat Dec 18 03:40:52 2021
    I find mail-deduplicate inadequately documented, and some of the functionality doesn't work as expected. Output, for instance, always seems to be in mbox format, even when I specify Maildir input.

    However, I find fdupes (available through many package managers) helpful.
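
    The idea behind this kind of byte-level comparison can be sketched in Python. This is a minimal illustration, not fdupes' actual implementation; the function name find_identical_files is made up for the example, and it only reports groups of files without deleting anything. Since Maildir stores one message per file, exact copies show up as files with the same content hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_files(root):
    """Group regular files under `root` whose contents are byte-for-byte equal."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Hash the whole file; equal digests are treated as equal content.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python find_dupes.py ~/Maildir/archive
    for group in find_identical_files(sys.argv[1]):
        print("\n".join(str(p) for p in group), end="\n\n")
```

    Note that this only catches exact copies; two deliveries of the same message with different headers would not match.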

    - dpb

  • From Eduardo Chappa@21:1/5 to d...@brannerchinese.com on Sat Dec 18 10:41:54 2021
    On Fri, 17 Dec 2021, d...@brannerchinese.com wrote:

    > Does Alpine contain functionality for de-duplicating a Maildir directory?
    >
    > It sometimes happens that a single message gets saved more than once to
    > an archiving directory, and I'd like to know if there is already
    > functionality for removing such duplicates.

    Dear dpb,

    If you build Alpine with Maildir support, then the mailutil program
    bundled with Alpine will be able to read a Maildir folder and remove
    duplicates. You would run it as

    mailutil dedup MAILBOX_NAME

    If you omit MAILBOX_NAME, mailutil will remove duplicates from your
    INBOX. For this purpose, a duplicate is defined as two messages that
    have the same Message-ID.
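
    To preview what a Message-ID-based rule would treat as duplicates before running anything destructive, a small Python sketch using the standard mailbox module may help. It is not how mailutil is implemented; the function name duplicates_by_message_id is invented for this example, and it only lists candidates.

```python
import mailbox
from collections import defaultdict


def duplicates_by_message_id(maildir_path):
    """Map each Message-ID seen more than once to the Maildir keys carrying it."""
    box = mailbox.Maildir(maildir_path, create=False)
    seen = defaultdict(list)
    for key in sorted(box.keys()):
        message_id = str(box[key].get("Message-ID", "")).strip()
        if message_id:  # messages without the header are never reported
            seen[message_id].append(key)
    return {mid: keys for mid, keys in seen.items() if len(keys) > 1}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python list_dupes.py ~/Maildir/archive
    for mid, keys in duplicates_by_message_id(sys.argv[1]).items():
        print(mid, len(keys), "copies")
```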

    I hope this helps.

    --
    Eduardo
    https://tinyurl.com/yc377wlh (web)
    http://repo.or.cz/alpine.git (Git)

  • From Carlos E.R.@21:1/5 to J.O. Aho on Tue Dec 21 13:09:36 2021
    On 17/12/2021 13.58, J.O. Aho wrote:
    > On 17/12/2021 10.37, d...@brannerchinese.com wrote:
    >> I'm aware of this free-standing application:
    >> https://github.com/kdeldycke/mail-deduplicate
    >>
    >> But I'm wondering if there is anything comparable built into Alpine
    >> itself.
    >
    > I think de-duplication is a file system feature; ZFS has such
    > functionality, where it stores only one block for identical data and
    > then just points to that block. When you delete the last file pointing
    > to that block, the block content is deleted too.
    >
    > No, Alpine doesn't have a function for deleting duplicate mails; you
    > should look at tools made for this, for example https://github.com/kdeldycke/mail-deduplicate

    Thunderbird has an addon to do this. It searches a folder, and produces
    a window listing duplicates (it displays several fields), offering to
    delete them. I find it a useful function.

    --
    Cheers, Carlos.

  • From Henning Hucke@21:1/5 to Carlos E.R. on Thu Dec 23 08:07:11 2021
    On 2021-12-21, Carlos E.R. <robin_listas@es.invalid> wrote:

    [...]

    > Thunderbird has an addon to do this. It searches a folder, and produces
    > a window listing duplicates (it displays several fields), offering to
    > delete them. I find it a useful function.

    Strange thing this is! I never had (real) duplicates except intentional
    ones. By the last part of that sentence I mean that I do sometimes save
    one mail to another folder without deleting the "original".
    Aside from this, duplicates show up from sources which obviously don't
    understand the purpose of a Message-ID and the need to avoid duplicates,
    or which don't know how to generate unique identifiers.

    Atlassian and Jira are a bad example of that...

    Nonetheless, they are not real duplicates in the sense of being
    identical in Message-ID as well as mail body.

    Best regards,
    Henning
    --
    In the first place, God made idiots;
    this was for practice; then he made school boards.
    -- Mark Twain

  • From Carlos E.R.@21:1/5 to Henning Hucke on Thu Dec 23 13:32:36 2021
    On 23/12/2021 09.07, Henning Hucke wrote:
    > On 2021-12-21, Carlos E.R. <robin_listas@es.invalid> wrote:
    >
    > [...]
    >
    >> Thunderbird has an addon to do this. It searches a folder, and
    >> produces a window listing duplicates (it displays several fields),
    >> offering to delete them. I find it a useful function.
    >
    > Strange thing this is! I never had (real) duplicates except intentional
    > ones. By the last part of that sentence I mean that I do sometimes save
    > one mail to another folder without deleting the "original".
    > Aside from this, duplicates show up from sources which obviously don't
    > understand the purpose of a Message-ID and the need to avoid duplicates,
    > or which don't know how to generate unique identifiers.
    >
    > Atlassian and Jira are a bad example of that...
    >
    > Nonetheless, they are not real duplicates in the sense of being
    > identical in Message-ID as well as mail body.

    They happen easily when having two or more computers with local folders,
    when trying to keep things in sync between them.

    Say, on computer A you save mails about SciFi to folder SciFi, and later
    you do the same on computer B, but at that time there is a different
    selection for whatever reason, and later you try to sync the two SciFi
    folders.

    Or you move some mails to a temporary folder, then a year later you find
    that temporary and forgotten folder, and being afraid of deleting mails
    you move them to a final folder, not remembering they are already there.

    Things like that.

    True duplicates.

    So a pass with a duplicate finder turns them up, and you can remove
    them relatively easily.


    Judging a dupe just by the Message-ID is a mistake. For instance, the
    sent folder and the inbox from a mailing list would have your email in
    both places with the same Message-ID, but if you look carefully you see
    different headers, and sometimes different bodies.

    Gmail makes exactly this mistake.
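
    A stricter test along these lines can be sketched in Python: treat two messages as duplicates only when both the Message-ID and the body bytes match. This is an illustration under stated assumptions, not what any of the tools above do; the function name strict_duplicates is made up, header differences are deliberately ignored, and messages lacking a Message-ID are grouped by body alone.

```python
import hashlib
import mailbox
from collections import defaultdict


def strict_duplicates(maildir_path):
    """Group Maildir keys of messages sharing both Message-ID and body bytes."""
    box = mailbox.Maildir(maildir_path, create=False)
    groups = defaultdict(list)
    for key in sorted(box.keys()):
        raw = box.get_bytes(key).replace(b"\r\n", b"\n")
        # Headers end at the first blank line; everything after is the body.
        _, _, body = raw.partition(b"\n\n")
        message_id = str(box[key].get("Message-ID", "")).strip()
        fingerprint = (message_id, hashlib.sha256(body).hexdigest())
        groups[fingerprint].append(key)
    # Only fingerprints shared by several messages count as duplicates.
    return [keys for keys in groups.values() if len(keys) > 1]
```

    With this rule, a sent-folder copy and a mailing-list copy whose body gained a list footer stay separate even though their Message-ID is the same.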

    --
    Cheers, Carlos.
