• INN DB over IPFS

    From Jason Evans@21:1/5 to All on Fri Jan 7 14:49:32 2022
    I'm still throwing around this idea, but I thought I would share it.

    Right now with INN, one of the storage options is to save articles
    directly in /var/spool/news/articles. What if, in addition to saving
    articles there, the articles were also saved in IPFS so they could be
    retrieved by anyone who wanted a copy?

    There would be two problems with this:

    1. How to get your files into IPFS. I think this could be pretty
    simple, actually. You could either brute-force it with a cronjob that
    uploads a diff of articles every X minutes, or an add-on could be
    written for INN that does it automatically. (A rough sketch follows
    this list.)

    2. How to retrieve articles from IPFS. The easy way to do this is by using
    an IPFS gateway and a browser. A dedicated retrieval app would be better,
    but it would need to be written.

    I think the greatest strength of this technology would be for keeping
    a record of the Usenet in perpetuity. Google isn't doing a good job
    anymore: they censor message headers, and you can no longer download
    articles except by copy/pasting from your browser. Independent
    archivists are the key.

    This is as far as I've gotten in thinking about this. For those who don't
    know what the heck I'm talking about, here is some more info.

    https://ipfs.io/#how

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Jason Evans on Fri Jan 7 10:22:18 2022
    On 1/7/22 7:49 AM, Jason Evans wrote:
    > Right now with INN, one of the storage options is to save articles
    > directly in /var/spool/news/articles.

    It looks like you're talking about what I know as Trad(itional) Spool.

    My understanding is that the file names in newsgroup directories in
    TradSpool are inherently news server specific.

    > What if, in addition to saving articles there, the articles were
    > also saved in IPFS so they could be retrieved by anyone who wanted
    > a copy?

    I suspect that you are going to end up with a metadata problem.

    My concern is that the files aren't enumerated to glean their
    (meta)data. Instead they tend to be read from an index of some sort.
    That separation means that there is an association between the two
    that has to stay in sync. So simply adding new files to the newsgroup
    directory likely won't /just/ work. What's more, the (meta)data index
    tends to be kept loaded in memory (even if it's a file on disk) for
    performance reasons, so you end up with consistency problems.

    Let's not talk about multiple people having write access.

    Aside: I suspect you could do something similar with read-only NFS (et
    al.) as you could with IPFS. (Above problems notwithstanding.)

    > There would be two problems with this:
    >
    > 1. How to get your files into IPFS. I think this could be pretty
    > simple, actually. You could either brute-force it with a cronjob
    > that uploads a diff of articles every X minutes, or an add-on could
    > be written for INN that does it automatically.

    Configure a newsfeed to feed articles to IPFS. That would be the INN
    method. -- I see no /need/ to side-step or hack around INN in this
    context.
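
    Sketched from memory, something like the following newsfeeds entry
    plus a trivial channel script; the flag letters and paths should be
    checked against newsfeeds(5) before anyone trusts them.

        # newsfeeds sketch: hand each accepted article's storage API
        # token to a long-running channel program on its stdin
        ipfs-archive!:*:Tc,Wn:/usr/local/news/bin/ipfs-feed.py

        #!/usr/bin/env python3
        # Hypothetical channel script: innd writes one storage token per
        # line; fetch each article with INN's sm(1) and pipe it into
        # "ipfs add" (kubo reads stdin when given no path).
        import subprocess
        import sys

        for line in sys.stdin:
            token = line.strip()
            if not token:
                continue
            article = subprocess.run(["sm", token],
                                     capture_output=True).stdout
            if article:
                subprocess.run(["ipfs", "add", "-Q"], input=article)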

    > 2. How to retrieve articles from IPFS. The easy way to do this is
    > by using an IPFS gateway and a browser. A dedicated retrieval app
    > would be better, but it would need to be written.

    This -- what I'll call -- complexity seems to be a negative to using IPFS.

    There is also the fact that this IPFS configuration is inherently
    dependent on the source news server. As such, it is heavily SPOFed
    thereon. Compare this to traditional news servers with NNTP(S)
    connections to multiple peers.

    > I think the greatest strength of this technology would be for
    > keeping a record of the Usenet in perpetuity.

    Given the amount of ... let's say /questionable/ content that I see ...
    I'm not confident that this is a good idea.

    > Google isn't doing a good job anymore. They censor message headers
    > and you also can't download articles anymore outside of copy/paste
    > from your browser. Independent archivists are the key.

    Google wasn't, isn't, and won't be an archive, ever, period, end of
    story. Google is a search engine and an advertising company.

    IMHO, complaining about Google not being a good archive is a bit like
    complaining that a sieve doesn't hold water. It's not supposed to
    hold water.

    > This is as far as I've gotten in thinking about this. For those who
    > don't know what the heck I'm talking about, here is some more info.

    I think the very high level concept has some merit. But I think the
    idea (brute-force copying articles from TradSpool) and the method
    (IPFS) have some limitations.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Jason Evans on Fri Jan 7 10:10:01 2022
    Jason Evans <jsevans@mailfence.com> writes:

    > Right now with INN, one of the storage options is to save articles
    > directly in /var/spool/news/articles. What if, in addition to
    > saving articles there, the articles were also saved in IPFS so they
    > could be retrieved by anyone who wanted a copy?

    I'm not sure I understand what benefit IPFS offers for Usenet.

    Looking over their front page, my impression of the problems that
    IPFS is trying to solve is:

    * Peer-to-peer distributed hash-addressable content archiving.
    * Data versioning and unique identifiers via the hash.
    * Tamper-resistance (again because everything is tied to a hash).

    Usenet does not have the unique identifier problem; that's the
    message ID, which already has to be unique for Usenet to work. Usenet
    doesn't have a data versioning problem; you're not supposed to ever
    modify an article once it has been posted. And everything on Usenet
    retrieves messages by message ID or by newsgroup name, so the
    hash-addressable content storage is kind of useless since no one has
    or cares about those hashes. Meanwhile, you have a potential
    uniqueness problem for IPFS storage because the Path, Xref, and some
    other headers vary by Usenet node, so you need to apply some filter
    to the article before storing it in IPFS anyway, or you just break
    IPFS by storing multiple duplicates of every article.
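
    Concretely, that filter would have to do something like the sketch
    below; the header list is a guess at the node-specific set, and
    wire-format CRLF line endings are assumed.

        # Sketch: drop per-server headers so the same article hashes to
        # the same CID on every node. The header set is illustrative.
        NODE_SPECIFIC = {b"path", b"xref"}

        def canonicalize(raw: bytes) -> bytes:
            head, _, body = raw.partition(b"\r\n\r\n")
            kept, skipping = [], False
            for line in head.split(b"\r\n"):
                if line[:1] in (b" ", b"\t"):
                    # continuation line: follows its header's fate
                    if not skipping:
                        kept.append(line)
                    continue
                name = line.split(b":", 1)[0].strip().lower()
                skipping = name in NODE_SPECIFIC
                if not skipping:
                    kept.append(line)
            return b"\r\n".join(kept) + b"\r\n\r\n" + body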

    The peer-to-peer thing is arguably interesting, except that this is
    essentially what NNTP already does, with roughly equivalent
    efficiency, so mostly what you're gaining here is adding nodes that
    are IPFS nodes but not Usenet nodes to the archive network. Which,
    okay, sure, and admittedly setting up IPFS is probably a lot easier
    than setting up a news server, but then how do you *use* those nodes
    to retrieve articles in any useful way?

    If we were designing Usenet today from scratch, there's a decent
    argument to be made that message IDs should just be message hashes
    over a canonicalized version of the message, which (mostly) avoids
    the problem of how to generate good message IDs and also allows
    verification that the message hasn't been modified (against which
    Usenet currently has no real protection except speed of propagation).
    And indeed there's nothing that would stop Usenet software from
    starting to use hashes as message IDs now, although because it's not
    guaranteed, it would need a lot of adoption before you would get any
    reasonable integrity protection benefits. But the rest of IPFS feels
    like a generic implementation of functionality that Usenet already
    has a more specialized implementation of, but with the wrong storage
    keys, so you would have to store external metadata to have any hope
    of being able to do meaningful message retrieval.
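
    For illustration, with a canonicalize() filter like the one sketched
    above, generating such an ID is a one-liner; the <digest@hash.sha256>
    shape is invented here, not any standard.

        import hashlib

        def hash_message_id(canonical: bytes) -> str:
            # Message IDs only need to be unique and well-formed; using
            # the hex digest as the local part makes them verifiable.
            return "<%s@hash.sha256>" % hashlib.sha256(canonical).hexdigest()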

    I agree that it would be great to have more independent Usenet archives,
    but simplicity says the easiest way to do that would be to find some
    volunteers to run ordinary Usenet servers with infinite retention.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Doc O'Leary@21:1/5 to Jason Evans on Fri Jan 7 18:24:14 2022
    For your reference, records indicate that
    Jason Evans <jsevans@mailfence.com> wrote:

    > I'm still throwing around this idea, but I thought I would share it.

    > Right now with INN, one of the storage options is to save articles
    > directly in /var/spool/news/articles. What if, in addition to
    > saving articles there, the articles were also saved in IPFS so they
    > could be retrieved by anyone who wanted a copy?

    The same could be asked about *any* type of protocol
    shifting/bridging. What if you used HTTP instead of NNTP? What about
    IMAP? Maybe AMQP? Let's not even get started with all the
    decentralized protocols… The fundamental question is always the
    tradeoffs involved in what is still just moving files through a
    network.

    > There would be two problems with this: [1. In, 2. Out]

    Honestly, I’d say the technology is a non-starter if it can’t be
    abstractly used. That is to say, I’d like for it to function as an
    overlay for the spool directory (using FUSE or some similar OS-level
    solution) that would manage the on-demand changes in nearly real time.
    It should not be Usenet-specific, other than making it seem like
    everyone in the world using IPFS is essentially a local Usenet user.
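
    As a very rough shape of that overlay, using the third-party fusepy
    package (everything here is an assumption about how such a tool might
    start, not working software): a read-only passthrough over the spool
    that pushes every file it serves into IPFS as a side effect.

        #!/usr/bin/env python3
        # Hypothetical FUSE overlay sketch (pip install fusepy): serve
        # the spool read-only and add each article read to IPFS.
        import os
        import subprocess
        from fuse import FUSE, Operations

        class SpoolOverlay(Operations):
            def __init__(self, root):
                self.root = root

            def _full(self, path):
                return os.path.join(self.root, path.lstrip("/"))

            def getattr(self, path, fh=None):
                st = os.lstat(self._full(path))
                return {k: getattr(st, k) for k in (
                    "st_mode", "st_nlink", "st_size", "st_uid",
                    "st_gid", "st_atime", "st_mtime", "st_ctime")}

            def readdir(self, path, fh):
                return [".", ".."] + os.listdir(self._full(path))

            def read(self, path, size, offset, fh):
                full = self._full(path)
                # Side effect: ensure the article is in the local IPFS
                # node before handing it back.
                subprocess.run(["ipfs", "add", "-Q", full], check=False)
                with open(full, "rb") as f:
                    f.seek(offset)
                    return f.read(size)

        if __name__ == "__main__":
            FUSE(SpoolOverlay("/var/spool/news/articles"),
                 "/mnt/news-ipfs", foreground=True, ro=True)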

    > I think the greatest strength of this technology would be for
    > keeping a record of the Usenet in perpetuity. Google isn't doing a
    > good job anymore. They censor message headers and you also can't
    > download articles anymore outside of copy/paste from your browser.
    > Independent archivists are the key.

    Well, there’s no reason to think IPFS will have any greater longevity
    than Google or a big commercial Usenet provider. Or the Internet
    Archive. Indeed, I’d say a greater value than a protocol switch going
    forward would be to curate the existing Usenet content at archive.org
    such that it can be made more easily usable.

    --
    "Also . . . I can kill you with my brain."
    River Tam, Trash, Firefly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Doc O'Leary on Fri Jan 7 13:16:17 2022
    On 1/7/22 11:24 AM, Doc O'Leary wrote:
    > Indeed, I’d say a greater value than a protocol switch going
    > forward would be to curate the existing Usenet content at
    > archive.org such that it can be made more easily usable.

    I wonder if it would be possible to create a batched feed that exports
    articles into an archive of some sort that could be periodically
    uploaded to the Internet Archive.
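
    Something along these lines with the internetarchive Python package,
    for instance; the item identifier and metadata below are placeholders
    I made up, not anything archive.org has agreed to.

        #!/usr/bin/env python3
        # Hypothetical batch uploader: tar up the spool and push it to
        # archive.org (pip install internetarchive; needs credentials
        # configured via "ia configure").
        import tarfile
        import time

        from internetarchive import upload

        batch = time.strftime("usenet-batch-%Y%m%d")
        tarball = "/var/tmp/%s.tar.gz" % batch

        with tarfile.open(tarball, "w:gz") as tar:
            tar.add("/var/spool/news/articles", arcname=batch)

        upload(batch,  # archive.org item identifier (placeholder)
               files=[tarball],
               metadata={"mediatype": "data",
                         "title": "Usenet article batch " + batch})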

    I see value in something like that. Though we would only need one or
    two entities doing that, lest we end up with significant duplication on
    the Internet Archive.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From meff@21:1/5 to Jason Evans on Sat Jan 8 06:22:08 2022
    On 2022-01-07, Jason Evans <jsevans@mailfence.com> wrote:
    > I think the greatest strength of this technology would be for
    > keeping a record of the Usenet in perpetuity. Google isn't doing a
    > good job anymore. They censor message headers and you also can't
    > download articles anymore outside of copy/paste from your browser.
    > Independent archivists are the key.

    The problem I see is that whoever stores an article in IPFS would
    still need to pin it. If other folks are interested, they could host
    certain articles, but given the nature of messages on Usenet, I'm not
    sure a critical mass of people would be interested in pinning any
    given article. Like others in this thread have opined, I think it
    makes more sense to either donate messages to the Internet Archive,
    if they would be willing to hold onto them, or to make easy-to-run
    archiving software so that others can host archives.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jason Evans@21:1/5 to Grant Taylor on Sat Jan 8 17:18:36 2022
    On Fri, 7 Jan 2022 13:16:17 -0700, Grant Taylor wrote:

    > I see value in something like that. Though we would only need one
    > or two entities doing that, lest we end up with significant
    > duplication on the Internet Archive.

    I like this idea more than the IPFS idea. I'm in the process of trying to
    get in touch with them to see how we can do this.

    Jason

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Jason Evans on Sat Jan 8 12:42:31 2022
    On 1/8/22 10:18 AM, Jason Evans wrote:
    > I like this idea more than the IPFS idea. I'm in the process of
    > trying to get in touch with them to see how we can do this.

    Cool.

    Please share your findings.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)