• INN DB over IPFS

    From Jason Evans@21:1/5 to All on Fri Jan 7 14:49:32 2022
    I'm still throwing around this idea, but I thought I would share it.

    Right now with INN, one of the storage options is to save articles
    directly in /var/spool/news/articles. What if, in addition to saving
    articles there, the articles were also saved in IPFS so they could be
    retrieved by anyone who wanted a copy?

    There would be two problems with this:

    1. How to get your files into IPFS. I think this could be pretty
    simple, actually. You could either brute-force it with a cronjob that
    uploads a diff of articles every X minutes, or an add-on could be
    written for INN that does it automatically. (A rough sketch follows
    this list.)

    2. How to retrieve articles from IPFS. The easy way to do this is by using
    an IPFS gateway and a browser. A dedicated retrieval app would be better,
    but it would need to be written.

    I think the greatest strength of this technology would be for keeping
    a record of the Usenet in perpetuity. Google isn't doing a good job
    anymore: they censor message headers, and you can no longer download
    articles except by copy/pasting from your browser. Independent
    archivists are the key.

    This is as far as I've gotten in thinking about this. For those who don't
    know what the heck I'm talking about, here is some more info.

    https://ipfs.io/#how

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Jason Evans on Fri Jan 7 10:22:18 2022
    On 1/7/22 7:49 AM, Jason Evans wrote:
    > Right now with INN, one of the storage options is to save articles
    > directly in /var/spool/news/articles.

    It looks like you're talking about what I know as Trad(itional) Spool.

    My understanding is that the file names in newsgroup directories in
    TradSpool are inherently news server specific.

    > What if, in addition to saving articles there, the articles were
    > also saved in IPFS so they could be retrieved by anyone who wanted
    > a copy?

    I suspect that you are going to end up with a metadata problem.

    My concern is that the files aren't enumerated to glean their
    (meta)data. Instead they tend to be read from an index of some sort.
    That separation means that there is an association between the two
    that has to stay in sync. So simply adding new files to the newsgroup
    directory likely won't /just/ work. What's more, the (meta)data index
    tends to be kept loaded in memory (even if it's a file on disk) for
    performance reasons, so you end up with consistency problems.

    Let's not talk about multiple people having write access.

    Aside: I suspect you could do something similar with read-only NFS (et
    al.) as you could with IPFS. (Above problems notwithstanding.)

    > There would be two problems with this:
    >
    > 1. How to get your files into IPFS. I think this could be pretty
    > simple, actually. You could either brute-force it with a cronjob
    > that uploads a diff of articles every X minutes, or an add-on could
    > be written for INN that does it automatically.

    Configure a newsfeed to feed articles to IPFS. That would be the INN
    method. -- I see no /need/ to side-step or hack around INN in this
    context.
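
    Sketched from memory, something like the following newsfeeds entry
    plus a trivial channel script; the flag letters and paths should be
    checked against newsfeeds(5) before anyone trusts them.

        # newsfeeds sketch: hand each accepted article's storage API
        # token to a long-running channel program on its stdin
        ipfs-archive!:*:Tc,Wn:/usr/local/news/bin/ipfs-feed.py

        #!/usr/bin/env python3
        # Hypothetical channel script: innd writes one storage token per
        # line; fetch each article with INN's sm(1) and pipe it into
        # "ipfs add" (kubo reads stdin when given no path).
        import subprocess
        import sys

        for line in sys.stdin:
            token = line.strip()
            if not token:
                continue
            article = subprocess.run(["sm", token],
                                     capture_output=True).stdout
            if article:
                subprocess.run(["ipfs", "add", "-Q"], input=article)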

    > 2. How to retrieve articles from IPFS. The easy way to do this is
    > by using an IPFS gateway and a browser. A dedicated retrieval app
    > would be better, but it would need to be written.

    This -- what I'll call -- complexity seems to be a negative to using IPFS.

    There is also the fact that this IPFS configuration is inherently
    dependent on the source news server. As such, it is heavily SPOFed
    thereon. Compare this to traditional news servers with NNTP(S)
    connections to multiple peers.

    > I think the greatest strength of this technology would be for
    > keeping a record of the Usenet in perpetuity.

    Given the amount of ... let's say /questionable/ content that I see ...
    I'm not confident that this is a good idea.

    > Google isn't doing a good job anymore. They censor message headers
    > and you also can't download articles anymore outside of copy/paste
    > from your browser. Independent archivists are the key.

    Google wasn't, isn't, and won't be an archive, ever, period, end of
    story. Google is a search engine and an advertising company.

    IMHO, complaining about Google not being a good archive is a bit like
    complaining that a sieve doesn't hold water. It's not supposed to
    hold water.

    > This is as far as I've gotten in thinking about this. For those who
    > don't know what the heck I'm talking about, here is some more info.

    I think the very high level concept has some merit. But I think the
    idea (brute-force copying articles from TradSpool) and the method
    (IPFS) have some limitations.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Jason Evans on Fri Jan 7 10:10:01 2022
    Jason Evans <jsevans@mailfence.com> writes:

    > Right now with INN, one of the storage options is to save articles
    > directly in /var/spool/news/articles. What if, in addition to
    > saving articles there, the articles were also saved in IPFS so they
    > could be retrieved by anyone who wanted a copy?

    I'm not sure I understand what benefit IPFS offers for Usenet.

    Looking over their front page, my impression of the problems that
    IPFS is trying to solve is:

    * Peer-to-peer distributed hash-addressable content archiving.
    * Data versioning and unique identifiers via the hash.
    * Tamper-resistance (again because everything is tied to a hash).

    Usenet does not have the unique identifier problem; that's the
    message ID, which already has to be unique for Usenet to work. Usenet
    doesn't have a data versioning problem; you're not supposed to ever
    modify an article once it has been posted. And everything on Usenet
    retrieves messages by message ID or by newsgroup name, so the
    hash-addressable content storage is kind of useless since no one has
    or cares about those hashes. Meanwhile, you have a potential
    uniqueness problem for IPFS storage because the Path, Xref, and some
    other headers vary by Usenet node, so you need to apply some filter
    to the article before storing it in IPFS anyway, or you just break
    IPFS by storing multiple duplicates of every article.
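
    Concretely, that filter would have to do something like the sketch
    below; the header list is a guess at the node-specific set, and
    wire-format CRLF line endings are assumed.

        # Sketch: drop per-server headers so the same article hashes to
        # the same CID on every node. The header set is illustrative.
        NODE_SPECIFIC = {b"path", b"xref"}

        def canonicalize(raw: bytes) -> bytes:
            head, _, body = raw.partition(b"\r\n\r\n")
            kept, skipping = [], False
            for line in head.split(b"\r\n"):
                if line[:1] in (b" ", b"\t"):
                    # continuation line: follows its header's fate
                    if not skipping:
                        kept.append(line)
                    continue
                name = line.split(b":", 1)[0].strip().lower()
                skipping = name in NODE_SPECIFIC
                if not skipping:
                    kept.append(line)
            return b"\r\n".join(kept) + b"\r\n\r\n" + body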

    The peer-to-peer thing is arguably interesting, except that this is
    essentially what NNTP already does, with roughly equivalent
    efficiency, so mostly what you're gaining here is adding nodes that
    are IPFS nodes but not Usenet nodes to the archive network. Which,
    okay, sure, and admittedly setting up IPFS is probably a lot easier
    than setting up a news server, but then how do you *use* those nodes
    to retrieve articles in any useful way?

    If we were designing Usenet today from scratch, there's a decent
    argument to be made that message IDs should just be message hashes
    over a canonicalized version of the message, which (mostly) avoids
    the problem of how to generate good message IDs and also allows
    verification that the message hasn't been modified (against which
    Usenet currently has no real protection except speed of propagation).
    And indeed there's nothing that would stop Usenet software from
    starting to use hashes as message IDs now, although because it's not
    guaranteed, it would need a lot of adoption before you would get any
    reasonable integrity protection benefits. But the rest of IPFS feels
    like a generic implementation of functionality that Usenet already
    has a more specialized implementation of, but with the wrong storage
    keys, so you would have to store external metadata to have any hope
    of being able to do meaningful message retrieval.
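
    For illustration, with a canonicalize() filter like the one sketched
    above, generating such an ID is a one-liner; the <digest@hash.sha256>
    shape is invented here, not any standard.

        import hashlib

        def hash_message_id(canonical: bytes) -> str:
            # Message IDs only need to be unique and well-formed; using
            # the hex digest as the local part makes them verifiable.
            return "<%s@hash.sha256>" % hashlib.sha256(canonical).hexdigest()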

    I agree that it would be great to have more independent Usenet archives,
    but simplicity says the easiest way to do that would be to find some
    volunteers to run ordinary Usenet servers with infinite retention.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Doc O'Leary@21:1/5 to Jason Evans on Fri Jan 7 18:24:14 2022
    For your reference, records indicate that
    Jason Evans <jsevans@mailfence.com> wrote:

    > I'm still throwing around this idea, but I thought I would share it.

    > Right now with INN, one of the storage options is to save articles
    > directly in /var/spool/news/articles. What if, in addition to
    > saving articles there, the articles were also saved in IPFS so they
    > could be retrieved by anyone who wanted a copy?

    The same could be asked about *any* type of protocol
    shifting/bridging. What if you used HTTP instead of NNTP? What about
    IMAP? Maybe AMQP? Let's not even get started with all the
    decentralized protocols… The fundamental question is always the
    tradeoffs involved in what is still just moving files through a
    network.

    > There would be two problems with this: [1. In, 2. Out]

    Honestly, I’d say the technology is a non-starter if it can’t be
    abstractly used. That is to say, I’d like for it to function as an
    overlay for the spool directory (using FUSE or some similar OS-level
    solution) that would manage the on-demand changes in nearly real time.
    It should not be Usenet-specific, other than making it seem like
    everyone in the world using IPFS is essentially a local Usenet user.
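
    As a very rough shape of that overlay, using the third-party fusepy
    package (everything here is an assumption about how such a tool might
    start, not working software): a read-only passthrough over the spool
    that pushes every file it serves into IPFS as a side effect.

        #!/usr/bin/env python3
        # Hypothetical FUSE overlay sketch (pip install fusepy): serve
        # the spool read-only and add each article read to IPFS.
        import os
        import subprocess
        from fuse import FUSE, Operations

        class SpoolOverlay(Operations):
            def __init__(self, root):
                self.root = root

            def _full(self, path):
                return os.path.join(self.root, path.lstrip("/"))

            def getattr(self, path, fh=None):
                st = os.lstat(self._full(path))
                return {k: getattr(st, k) for k in (
                    "st_mode", "st_nlink", "st_size", "st_uid",
                    "st_gid", "st_atime", "st_mtime", "st_ctime")}

            def readdir(self, path, fh):
                return [".", ".."] + os.listdir(self._full(path))

            def read(self, path, size, offset, fh):
                full = self._full(path)
                # Side effect: ensure the article is in the local IPFS
                # node before handing it back.
                subprocess.run(["ipfs", "add", "-Q", full], check=False)
                with open(full, "rb") as f:
                    f.seek(offset)
                    return f.read(size)

        if __name__ == "__main__":
            FUSE(SpoolOverlay("/var/spool/news/articles"),
                 "/mnt/news-ipfs", foreground=True, ro=True)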

    > I think the greatest strength of this technology would be for
    > keeping a record of the Usenet in perpetuity. Google isn't doing a
    > good job anymore. They censor message headers and you also can't
    > download articles anymore outside of copy/paste from your browser.
    > Independent archivists are the key.

    Well, there’s no reason to think IPFS will have any greater longevity
    than Google or a big commercial Usenet provider. Or the Internet
    Archive. Indeed, I’d say a greater value than a protocol switch going
    forward would be to curate the existing Usenet content at archive.org
    such that it can be made more easily usable.

    --
    "Also . . . I can kill you with my brain."
    River Tam, Trash, Firefly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Doc O'Leary on Fri Jan 7 13:16:17 2022
    On 1/7/22 11:24 AM, Doc O'Leary wrote:
    > Indeed, I’d say a greater value than a protocol switch going
    > forward would be to curate the existing Usenet content at
    > archive.org such that it can be made more easily usable.

    I wonder if it would be possible to create a batched feed that exports
    articles into an archive of some sort that could be periodically
    uploaded to the Internet Archive.
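
    Something along these lines with the internetarchive Python package,
    for instance; the item identifier and metadata below are placeholders
    I made up, not anything archive.org has agreed to.

        #!/usr/bin/env python3
        # Hypothetical batch uploader: tar up the spool and push it to
        # archive.org (pip install internetarchive; needs credentials
        # configured via "ia configure").
        import tarfile
        import time

        from internetarchive import upload

        batch = time.strftime("usenet-batch-%Y%m%d")
        tarball = "/var/tmp/%s.tar.gz" % batch

        with tarfile.open(tarball, "w:gz") as tar:
            tar.add("/var/spool/news/articles", arcname=batch)

        upload(batch,  # archive.org item identifier (placeholder)
               files=[tarball],
               metadata={"mediatype": "data",
                         "title": "Usenet article batch " + batch})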

    I see value in something like that. Though we would only need one or
    two entities doing that, lest we end up with significant duplication on
    the Internet Archive.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From meff@21:1/5 to Jason Evans on Sat Jan 8 06:22:08 2022
    On 2022-01-07, Jason Evans <jsevans@mailfence.com> wrote:
    > I think the greatest strength of this technology would be for
    > keeping a record of the Usenet in perpetuity. Google isn't doing a
    > good job anymore. They censor message headers and you also can't
    > download articles anymore outside of copy/paste from your browser.
    > Independent archivists are the key.

    The problem I see is that whoever stores an article in IPFS would
    still need to pin it. If other folks are interested, they could host
    certain articles, but given the nature of messages on Usenet, I'm not
    sure a critical mass of people would be interested in pinning any
    given article. Like others in this thread have opined, I think it
    makes more sense to either donate messages to the Internet Archive,
    if they would be willing to hold onto them, or to make easy-to-run
    archiving software so that others can host archives.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jason Evans@21:1/5 to Grant Taylor on Sat Jan 8 17:18:36 2022
    On Fri, 7 Jan 2022 13:16:17 -0700, Grant Taylor wrote:

    > I see value in something like that. Though we would only need one
    > or two entities doing that, lest we end up with significant
    > duplication on the Internet Archive.

    I like this idea more than the IPFS idea. I'm in the process of trying to
    get in touch with them to see how we can do this.

    Jason

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Jason Evans on Sat Jan 8 12:42:31 2022
    On 1/8/22 10:18 AM, Jason Evans wrote:
    > I like this idea more than the IPFS idea. I'm in the process of
    > trying to get in touch with them to see how we can do this.

    Cool.

    Please share your findings.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)