For the past month, I have been downloading and sorting Usenet archives from a news server (with their permission) of everything from 2003 until today.
My next step is to decide how to upload them to archive.org.
. . .
One final note. In case you're wondering, I am not archiving any binary groups or any group that I think could get deleted because of the extremely distasteful subject matter. I think you can get my gist about what I mean. Everything else is here. Even the stupid spammy revenge froops.
So you'd be relying upon their indexing and its likely inability to tell
the difference between the article body, the .sig, and headers?
We've already got that. Google indexed Usenet articles as if they were
posted on the Web in the first place as the lousy Google Groups Web
interface was treated like a real Web page. Within Google Groups itself, searching became seriously hideous because Google stopped devoting staff resources to making sure the indexes were being maintained. The indexing services weren't great but they were better than what they became.
An extremely serious problem with Google Groups' indexing of the article
body, when it was working, was that it didn't do a great job distinguishing between the author's own text and the quoted text if it was a followup.
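To make the distinction concrete, here is a minimal Python sketch of separating headers, new text, quoted text, and the .sig before indexing. It assumes a single-part plain-text article, the conventional "-- " signature delimiter, and ">" as the quote prefix; the function name `split_article` is hypothetical, and real archives will contain articles that follow none of these conventions.

```python
from email import message_from_string

SIG_DELIMITER = "-- "  # conventional signature separator: dash, dash, space


def split_article(raw: str) -> dict:
    """Split a raw article into headers, new text, quoted text, and signature.

    Assumes a single-part text article; multipart articles are not handled.
    """
    msg = message_from_string(raw)
    body_lines = msg.get_payload().splitlines()

    # Everything after the last "-- " line is treated as the signature.
    signature = []
    for i in range(len(body_lines) - 1, -1, -1):
        if body_lines[i] == SIG_DELIMITER:
            signature = body_lines[i + 1:]
            body_lines = body_lines[:i]
            break

    # Lines starting with ">" are treated as quoted material.
    quoted = [line for line in body_lines if line.startswith(">")]
    new_text = [line for line in body_lines if not line.startswith(">")]
    return {
        "headers": dict(msg.items()),
        "new_text": "\n".join(new_text).strip(),
        "quoted": "\n".join(quoted),
        "signature": "\n".join(signature),
    }
```

An indexer could then weight `new_text` above `quoted` and skip `signature` entirely. Note that attribution lines ("Someone wrote:") still land in `new_text` with this approach.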
Usenet archives lack decent indexes. Is there a way for you to upload a
very small archive, then work on the indexing and presentation of the articles so it in some way resembles walking the thread tree? Can the
index be developed along with the archive, and then tested tested tested
to avoid another Google Groups?
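As a rough illustration of what "walking the thread tree" could mean for an index, the Python sketch below links articles by their Message-ID and References headers. The function names and the `(message_id, references)` input shape are assumptions made for this example, not anything the archive provides; it also assumes the References header is present and well formed, which is not always true of old articles.

```python
from collections import defaultdict


def build_thread_tree(articles):
    """Map each parent Message-ID to the list of its direct replies.

    `articles` is a list of (message_id, references) pairs, where
    `references` is the raw References header value ("" for a thread root).
    """
    known = {message_id for message_id, _ in articles}
    children = defaultdict(list)
    for message_id, references in articles:
        # The last ID in References is the direct parent; walk backwards to
        # find the nearest ancestor that is actually present in the archive.
        parent = None
        for ref in reversed(references.split()):
            if ref in known:
                parent = ref
                break
        children[parent].append(message_id)
    return children


def walk(children, parent=None, depth=0):
    """Yield (depth, message_id) pairs in depth-first thread order."""
    for message_id in children.get(parent, []):
        yield depth, message_id
        yield from walk(children, message_id, depth + 1)
```

Falling back to the nearest surviving ancestor keeps a reply attached to its thread even when the article it answered is missing from the archive.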
Thomas Hochstein wrote:
Adam H. Kerman wrote:
So you'd be relying upon their indexing and its likely inability to tell the difference between the article body, the .sig, and headers?
AFAIS, <https://archive.org/details/usenethistorical> has just zipped mbox archives, one per group, with no way to browse, search or index anything.
That is exactly what I have. My question is, is it better to have them on archive.org with one entry per hierarchy or to group them like I suggested?
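Since both collections are described as mbox files, one per group, a first step toward any index could be as small as the Python sketch below, which uses the standard library `mailbox` module to list the Message-ID and Subject of every article in one unzipped mbox. The function name `list_articles` and the path argument are hypothetical, and the sketch assumes the file is a well-formed mbox.

```python
import mailbox


def list_articles(mbox_path):
    """Return (message_id, subject) pairs for every article in an mbox file."""
    # create=False: raise an error instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg.get("Message-ID"), msg.get("Subject")) for msg in box]
    finally:
        box.close()
```

The resulting pairs are enough to build a per-group table of contents, whichever way the uploads end up being grouped on archive.org.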
Here is the current archive that runs from the 80's and 90's until around 2003: https://archive.org/details/usenethistorical
Hi Jason,
As noted by another person (who spoke about that archive in a French newsgroup), the encoding of bodies is wrong. All non-ASCII characters
are mangled :-/
Seen in fr.* and de.*, and I bet it is the same for all hierarchies.
The problem is that when you go back far enough, articles use either plain ASCII or some non-standard encoding, and then the non-English characters are
munged. My colleague, Tristan, has been working on this issue as it
affects Esperanto on the early Usenet.
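One common way to limit this kind of damage when converting an archive is to decode each body from its raw bytes with a chain of candidate charsets rather than forcing a single one. The Python sketch below is only an illustration under stated assumptions: the function name `decode_body` is hypothetical, and the fallback order (declared charset, then UTF-8, then cp1252, then latin-1) is a guess suited to Western European text. It would still mis-decode undeclared Esperanto posts written in ISO-8859-3 or an ad hoc convention.

```python
def decode_body(raw: bytes, declared_charset=None) -> str:
    """Decode article bytes, trying the declared charset before fallbacks."""
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    # Assumed fallback order: UTF-8 is strict, so it fails cleanly on legacy
    # 8-bit text; cp1252 is a guess that suits many Western European posts.
    candidates += ["utf-8", "cp1252"]
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers charset names Python does not recognize.
            continue
    # latin-1 maps every byte to a character, so this never raises.
    return raw.decode("latin-1")
```

Keeping the original bytes alongside the decoded text would let a later, better guess be applied without re-downloading anything.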
This doesn't really answer the question that I asked in my original article about organizing Usenet hierarchies for archive.org.
I don't have a strong opinion about that. I would tend to prefer a