Forum: >>> Magnum BBS <<<

Usenet map

From U.ee@21:1/5 to All on Tue Apr 7 20:57:04 2020

XPost: news.software.nntp

Bonjour!

Over the last few days I have tried to make a visual representation of Usenet peering relations.
The data is collected from my point of view, from Usenet.ee, so there are
missing links and nodes. The source articles were collected over a period of a
few months, so some peerings have probably changed in that time.
The first trouble was with source quality: my filtering was a little too inclusive. Next time I can filter some bogus data out at the beginning, so hopefully the next maps will be better.
Another issue is complicated sites showing their inner workings. That
issue is twofold:
1) There are a lot of point-to-point relations, complicating filtering by
peer count.
2) A high node count, making the map noisy.

To lower the node count I tried a few filters; one measure was to show
only nodes with at least 2 peers.

So, a map with some uninteresting leaves removed: http://usenet.ee/maps/graph3-n-20200402.png

A slightly different cleanup: http://usenet.ee/maps/graph-n-cleaned-20200402.png

A different representation for better readability: http://usenet.ee/maps/graph3-d-20200402.png

I hope that readers here find this interesting!

Best regards,
U.ee

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Grant Taylor@21:1/5 to U.ee on Tue Apr 7 17:49:37 2020

XPost: news.software.nntp

On 4/7/20 11:57 AM, U.ee wrote:

Bonjour!

Hi,

In last few days I have tried to make visual representation about Usenet peering relations.

I did something quite similar last week.

Data is collected from my point of view, from Usenet.ee. There is
missing links and nodes. Source articles are collected over few month
period, so some peerings are probably changed in that time.

Ya. Every post is inherently a tree that connects to you. Frequently
those trees coalesce into a graph from your system's point of view.

First trouble was with source quality: my filtering was little bit too inclusive. Next time I can filter some bogus data out in beginning, so hopefully next maps are better.

I'm curious what you used as your source, how you filtered, and if you modified anything.

I took the Path: headers from my data source (10,000 articles from a newsgroup), derived connections between the upstream and downstream
hosts, shift, and repeat.

I added these, and labels, to a dot (Graphviz) file by way of a sort
uniq filter to remove duplication.

I ended up using mutated node names in the dot file, with labels of
their actual node name. This way I didn't have to worry about dot node
naming conventions.
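
The mutated-name-plus-label approach can be sketched in Python roughly as follows. This is only an illustration, not the script used for the map: the helper names `node_id` and `to_dot` and the example hostnames are made up, and the `n_` prefix is just one arbitrary way to get identifiers that dot accepts unquoted.

```python
import re


def node_id(host: str) -> str:
    """Turn a hostname into an identifier that dot accepts without quoting."""
    return "n_" + re.sub(r"[^A-Za-z0-9_]", "_", host)


def to_dot(edges):
    """Build a dot digraph from (upstream, downstream) hostname pairs."""
    hosts = sorted({h for edge in edges for h in edge})
    lines = ["digraph usenet {"]
    for host in hosts:
        # The real hostname survives as the visible label.
        label = host.replace('"', '\\"')
        lines.append(f'  {node_id(host)} [label="{label}"];')
    # sorted(set(...)) plays the role of the sort | uniq step.
    for up, down in sorted(set(edges)):
        lines.append(f"  {node_id(up)} -> {node_id(down)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot([("origin.example", "peer.example"),
                  ("peer.example", "me.example")]))
```

Note that two different hostnames can mutate to the same identifier under this scheme (for instance `a.b` and `a-b`), so a real script would need to check for collisions.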

I also modified a lot of nodes to be their organization name with a sed
script. This way, all of Google would show up as simply "Google".

Another issue is complicated sites showing their inner workings. That
issue is two fold:

1) There is lot of point to point relations, complicating filtering by
peer count.

Yes, Usenet is inherently point to point.

Aside: I've heard of NNTP via multi-cast, but I've never seen it used,
much less between organizations.

2) High nodecount, making map noisy.

I think that's where normalizing the node name to organization name made
a HUGE difference.

See this tweet:

https://twitter.com/DrScriptt/status/1245529142738575362

In particular the follow up discussion:

https://twitter.com/revprez/status/1245529633409417217

For lowering node count I tried few filters, one measure was only show
nodes with at least 2 peers.

I'm curious how you did your filtering.

So, map with some uninteresting leafs removed: http://usenet.ee/maps/graph3-n-20200402.png

I wanted to avoid removing nodes.

Or more specifically, I wanted to avoid removing organizations. Where
one or more nodes map to an organization.

Little bit different cleanup: http://usenet.ee/maps/graph-n-cleaned-20200402.png

Different representation to better readability: http://usenet.ee/maps/graph3-d-20200402.png

I saw a crazy looping line in that graph, between
newsfeed.CARNet.hr and feeder.erje.net. The crazy loop is below newsreader4.netcologne.de.

I hope that readers here found this interesting!

I found the idea interesting enough to spend some time doing it last week.

I'm considering a new funnel feed that goes into a filter to extract
Path: headers of all incoming articles to collect more data. (I worked
on a subset of just one group.)

Best regards,

Likewise.

--
Grant. . . .
unix || die


From Grant Taylor@21:1/5 to Grant Taylor on Tue Apr 7 18:06:13 2020

XPost: news.software.nntp

On 4/7/20 5:49 PM, Grant Taylor wrote:

I took the Path: headers from my data source (10,000 articles from a newsgroup), derived connections between the upstream and downstream
hosts, shift, and repeat.

I hacked together a shell script that read Path: header lines from
STDIN, looping across each header.

Each path header was split on the bang, and walked posting host to my
receiving host, looping across each host.

# Reverse each bang-separated Path so it reads posting host first, receiving host last.
awk -F'!' '{ for (i = NF; i > 1; i--) printf("%s ", $i); print $1 }'

This allowed me to take action on each host in each path entry.

Since I was starting with the posting / upstream host, I was able to
iterate through the hosts in the path and deduce the upstream to
downstream relation. This directly translated to "$upstream ->
$downstream" entries for dot to work with.
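
For comparison, the same walk can be sketched in Python. This is an assumption-laden illustration rather than the shell script itself: `path_to_edges` is a hypothetical name, the hostnames are invented, and no cleanup of entries such as not-for-mail is attempted here.

```python
def path_to_edges(path_header: str):
    """Return (upstream, downstream) pairs for one Path: header.

    The leftmost Path entry is the receiving host and the rightmost is the
    posting side, so the list is reversed before pairing neighbours.
    """
    value = path_header.strip()
    if value.lower().startswith("path:"):
        value = value[5:].strip()
    hosts = [h for h in value.split("!") if h]
    hosts.reverse()
    # Pair each host with the next one towards the receiving end.
    return list(zip(hosts, hosts[1:]))


if __name__ == "__main__":
    for up, down in path_to_edges("Path: me.example!peer.example!origin.example"):
        print(f"{up} -> {down}")
```

A path with a single entry yields no pairs, which corresponds to the "priming" case where there is not yet both an upstream and a downstream.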

There was a little bit of clean up work, like priming when I don't have
both upstream and downstream, as well as some things like not-for-mail / mail2news, etc.

I did my node to organizational clean up with sed, with things like
the following before running loops across them:

s/fx01.iad.POSTED!/Google!/
s/peer03.ams1!/Google!/
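
A rough Python equivalent of that normalization step, for anyone who prefers it to sed. The table below only repeats the two rules shown above; `ORG_RULES` and `normalize_path` are hypothetical names. Collapsing consecutive duplicates is an extra assumption on my part, added so that two nodes of one organization do not produce a self-referencing edge.

```python
# Path component -> organization name, taken from the two sed rules above.
ORG_RULES = {
    "fx01.iad.POSTED": "Google",
    "peer03.ams1": "Google",
}


def normalize_path(path: str, rules=ORG_RULES) -> str:
    """Replace known components with their organization name."""
    out = []
    for part in path.split("!"):
        name = rules.get(part, part)
        # Skip a repeat of the previous entry (same organization twice in a row).
        if not out or out[-1] != name:
            out.append(name)
    return "!".join(out)


if __name__ == "__main__":
    print(normalize_path("me.example!peer03.ams1!fx01.iad.POSTED!not-for-mail"))
```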

--
Grant. . . .
unix || die


From U.ee@21:1/5 to Grant Taylor on Wed Apr 8 12:29:29 2020

XPost: news.software.nntp

On 08.04.20 02:49, Grant Taylor wrote:

On 4/7/20 11:57 AM, U.ee wrote:

Bonjour!

Hi,

Tervitus! :)

In last few days I have tried to make visual representation about
Usenet peering relations.

I did something quite similar last week.

Nice!

I'm curious what you used as your source, how you filtered, and if you modified.

All articles from all groups on my server since January. I sorted and
counted the unique paths and dropped everything below 10 occurrences. Some data
was simply noise (containing something other than real paths), so I removed
that too.
Because I have played with it for several days (or, looking back, more like weeks)
and have tried several things, it is partly lost what exactly was
done for a specific map (some crappier ones are deleted, some more
amusing specimens are still in my archive).
I tried removing those IP-ADDRESS.POSTED components, and the same with
not-for-mail and similar.
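
The counting and threshold step could look something like this in Python. It is a sketch only: `frequent_paths` is a hypothetical name, and the default of 10 simply mirrors the cutoff mentioned above.

```python
from collections import Counter


def frequent_paths(paths, min_count=10):
    """Count identical Path values and keep those seen at least min_count times."""
    counts = Counter(p.strip() for p in paths if p.strip())
    return {path: n for path, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = ["a!b"] * 10 + ["a!c"] * 9
    print(frequent_paths(sample))
```

This does the same job as `sort | uniq -c` followed by a numeric filter, at the cost of holding every distinct path in memory.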

I took the Path: headers from my data source (10,000 articles from a newsgroup), derived connections between the upstream and downstream
hosts, shift, and repeat.

I added these, and labels, to a dot (Graphviz) file by way of a sort
uniq filter to remove duplication.

I ended up using mutated node names in the dot file, with labels of
their actual node name. This way I didn't have to worry about dot node naming conventions.

I used shell tools (grep, sort, uniq; sed at one point)
and Python with pygraphviz.
For graph3-n-20200402.png I generated the dot file with Python and
"doctored" it manually to remove split clouds: those small clusters representing the inner workings of some Usenet site.
Then I used neato to generate the PNG file.

1) There is lot of point to point relations, complicating filtering by
peer count.

Yes, Usenet is inherently point to point.

Aside: I've heard of NNTP via multi-cast, but I've never seen it used,
much less between organizations.

I probably used bad terminology here, sorry. I felt that PtP describes
it well.
I meant point-to-point in a more general sense: one node having only one upstream, and the upstream node having only one downstream, similar to, for
example, wireless backhauls.

2) High nodecount, making map noisy.

I think that's where normalizing the node name to organization name made
a HUGE difference.

See this tweet:

https://twitter.com/DrScriptt/status/1245529142738575362

In particular the follow up discussion:

https://twitter.com/revprez/status/1245529633409417217

I am not well versed in Twitter. Your map there looks nice!

For lowering node count I tried few filters, one measure was only show
nodes with at least 2 peers.

I'm curious how you did your filtering.

Because I used Python and converted all data to Python data structures (dictionaries and sets), filtering was fairly easy. More "smartness"
is needed though; for example, at this time I simply delete specific
keys, using hardcoded hostnames.
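
To give an idea of what the "at least 2 peers" filter looks like on dictionaries and sets, here is a minimal sketch. The names `build_peers` and `drop_leaves` are invented for this example and the real script may be organized differently.

```python
from collections import defaultdict


def build_peers(edges):
    """Map every host to the set of hosts it exchanges articles with."""
    peers = defaultdict(set)
    for up, down in edges:
        if up != down:
            peers[up].add(down)
            peers[down].add(up)
    return peers


def drop_leaves(peers, min_peers=2):
    """Keep only hosts with at least min_peers peers (a single pass)."""
    keep = {host for host, linked in peers.items() if len(linked) >= min_peers}
    return {host: linked & keep for host, linked in peers.items() if host in keep}


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "leaf")]
    print(drop_leaves(build_peers(edges)))
```

Because this is a single pass, a host can end up with fewer than `min_peers` peers once its leaves are gone; repeating the call until nothing changes would prune those as well.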

So, map with some uninteresting leafs removed:
http://usenet.ee/maps/graph3-n-20200402.png

I wanted to avoid removing nodes.

Or more specifically, I wanted to avoid removing organizations. Where
one or more nodes map to an organization.

Yes, I removed nodes, but generally not sites (organizations).

Different representation to better readability:
http://usenet.ee/maps/graph3-d-20200402.png

I saw a crazy looping line in that graph, between
newsfeed.CARNet.hr and feeder.erje.net. The crazy loop is below newsreader4.netcologne.de.

You mean the one to the left of news.tnib.de? Nice catch!

I hope that readers here found this interesting!

I found the idea interesting enough to spend some time doing it last week.

I'm considering a new funnel feed that goes into a filter to extract
Path: headers of all incoming articles to collect more data. (I worked
on a subset of just one group.)

Some time ago I looked into newsreader (posting agent) statistics, and
there were noticeable differences between groups. I think the same is true
of paths. Some groups are representative of some other network, for
example Fido. So the question now, of course, is whether you want this kind of
host to show up in your map.

Best wishes
U.ee


From Paul Tomblin@21:1/5 to U.ee on Wed Apr 8 12:56:09 2020

XPost: news.software.nntp

In a previous article, "U.ee" <admin@invalid.usenet.ee> said:

In last few days I have tried to make visual representation about Usenet peering relations.
Data is collected from my point of view, from Usenet.ee. There is
missing links and nodes. Source articles are collected over few month

Years ago there was a script that many sites ran on a regular basis which sent their data to a central location that collated all that information. That had obvious advantages over just collecting the info at one node.

--
Paul Tomblin <ptomblin@xcski.com> http://blog.xcski.com/
...I'm not one of those who think Bill Gates is the devil. I simply
suspect that if Microsoft ever met up with the devil, it wouldn't need an interpreter. -- Nick Petreley


From Grant Taylor@21:1/5 to Paul Tomblin on Wed Apr 8 09:53:14 2020

XPost: news.software.nntp

On 4/8/20 6:56 AM, Paul Tomblin wrote:

Years ago there was a script that many sites ran on a regular basis
which sent their data to a central location that collated all that information. That had obvious advantages over just collecting the
info at one node.

Are you referring to Top1000 or something else?

Top1000 is still a thing.

Aside: My main server is in the top quarter. :-)

--
Grant. . . .
unix || die


From Karl Kleinpaste@21:1/5 to Grant Taylor on Wed Apr 8 12:10:38 2020

XPost: news.software.nntp

On 4/8/20 11:53 AM, Grant Taylor wrote:

Are you referring to Top1000 or something else?

It was Brian Reid's (then of DECWRL) Network Measurement Project.
He once described an article flowing through the (empirically observed)
core NNTP servers as a flare fired into a munitions dump.


From Grant Taylor@21:1/5 to U.ee on Wed Apr 8 09:50:19 2020

XPost: news.software.nntp

On 4/8/20 3:29 AM, U.ee wrote:

Nice!

I'm completely new to dot / Graphviz, so I'm still learning.

All articles from all groups since January from my server. Sorted and
counted unique paths. Dropped everything below 10 occurrences.

Hum.

I would be concerned about dropping information. I know that there are
some newsgroups that get very low traffic, as in one post every few
months. But that's group specific and probably not as much of an issue
with servers with additional groups.

Some data was simply noise (containing something other than real
paths), removed those too.

I'm curious to see an example of such.

Because I have played it several days or looking back more like weeks
I have tried several things, so little bit is lost, what was exactly
done with specific map (some crappier ones are deleted, some more
amusing specimens are still in my archive).

I get it.

I think that's more the "art" part than "science".

I tried remove those IP-ADDRESS.POSTED components, same with
not-for-mail and similar.

Did you discard the entire Path? Or just those portions (to the end of
the Path)?

Also, IP-ADDRESS.POSTED is little different than FQDN.POSTED to me.
They are different ways of conveying the same information. The former
didn't have functioning reverse DNS (or it was disabled) and the latter did.

But the posting itself is still a viable article to me.

I used shell tools (grep, sort, uniq; sed in one point)

I think that such tools are underappreciated.

and python with pygraphviz.

ACK

For graph3-n-20200402.png I generated dot file with python and
"doctored" it manually to remove split clouds. Those small clusters representing some inner working for some usenet site.

That's what I used sed to normalize those nodes to org names for.

Then used neato to generate PNG file.

Why neato vs dot itself?

I probably used bad terminology here, sorry. I felt that PtP describes
it well.

Yes, point-to-point is a distinct type of connection. I believe that
the vast majority of NNTP servers are point-to-point connected.

I meant point-to-point in more general sense, one node having only one upstream, and upstream node having only one downstream, so similar for example wireless backhauls.

Ah. I think you're talking about removing things that chain through
each other without branching. E.g. remove n2 & n3 below.

[n1]---[n2]---[n3]---[n4]

You wanted "significant nodes" (which interconnect three or more other
nodes). E.g. remove n2 below.

              [n5]
               |
[n1]---[n2]---[n3]---[n4]
               |
              [n6]

Where remove can mean collapse into the larger organization.
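
One way to express that collapse in code is to contract every node that has exactly two neighbours and link those neighbours directly. The sketch below assumes an undirected graph stored as a dict of sets with no self-loops; `contract_chains` is a made-up name, and this is just one possible reading of "remove".

```python
def contract_chains(peers):
    """Remove nodes with exactly two neighbours, joining the neighbours."""
    graph = {node: set(linked) for node, linked in peers.items()}
    changed = True
    while changed:
        changed = False
        for node in list(graph):
            linked = graph[node]
            if len(linked) == 2:
                a, b = linked
                # Bypass the pass-through node: connect a and b directly.
                graph[a].discard(node)
                graph[b].discard(node)
                graph[a].add(b)
                graph[b].add(a)
                del graph[node]
                changed = True
    return graph


if __name__ == "__main__":
    chain = {"n1": {"n2"}, "n2": {"n1", "n3"},
             "n3": {"n2", "n4"}, "n4": {"n3"}}
    print(contract_chains(chain))
```

On the straight chain this removes n2 and n3 and leaves n1 linked to n4; on the branching example only n2 goes, because n3 still connects four other nodes.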

I am not well versed with twitter. Your map there looks nice!

Thank you.

Because I used python and converted all data to python data structures (dictionaries and sets), filtering was somewhat easy.

Hum.... The old unix admin in me has concerns about loading all of that
data into memory. Conversely, other than sort and dot, much of what I
did was based on streaming data through and using much less memory at
any given time. Though, such optimizations are not necessarily as
important these days.

More "smartness" is needed though, for example in this time I simply
delete specific keys, using hardcoded hostnames.

Please elaborate on what you are deleting. What does it represent in
the Path: header? Why are you deleting it?

Admittedly, the sort / uniq I was doing would remove data. But it was
data that was already represented in my data.

Yes, I removed nodes, but generally not sites (organizations).

ACK

You mean that left from news.tnib.de? Nice catch!

:-)

It was luck. My viewer happened to zoom and show it in the center of the
zoomed view.

Tracking it was more difficult.

Some time ago I looked into newsreaders (posting agent) statistics and between groups there was noticeable differences. I think same is true
with paths. Some groups are representative for some other network, for example fido.

I agree that there is quite likely — what I'm going to call — clustering
of groups & paths to form message flows.

Though remember Usenet's flooding nature.

So, now of course question is, do you want this kind hosts show up
in your map.

I would think so.

I'll counter with why would you not want these hosts to show up?

They are articles that flow across Usenet.

I guess it could be that you're mapping a specific part / subset of Usenet.

Best wishes

Likewise.

--
Grant. . . .
unix || die


From U.ee@21:1/5 to Grant Taylor on Wed Apr 8 20:53:32 2020

XPost: news.software.nntp

Grant,

First, thank you very much for asking those questions and explaining
your process!

On 08.04.20 18:50, Grant Taylor wrote:

Hum.

I would be concerned about dropping information. I know that there are
some newsgroups that get very low traffic, as in one post every few
months. But that's group specific and probably not as much of an issue
with servers with additional groups.

I drop them because there is too much data.
Before dropping, I have 86246 lines in my unique paths file.

Some data was simply noise (containing something other than real
paths), removed those too.

I'm curious to see an example of such.

Well, I didn't limit Path: occurrences per article, so if the body contained a matching line (^Path: ), it got included.
For example:
Path: /Applications/HouseCall.app/Contents/MacOS/HouseCall
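
A simple guard against this kind of noise is to read only the header block, which ends at the first empty line. A minimal sketch, assuming the raw article text is available as a string; `header_path` is a hypothetical helper and folded (continued) header lines are not handled.

```python
def header_path(article_text: str):
    """Return the Path: header value, ignoring anything in the body."""
    for line in article_text.splitlines():
        if line == "":
            # The first empty line ends the headers; the body starts here.
            return None
        if line.lower().startswith("path:"):
            return line[5:].strip()
    return None


if __name__ == "__main__":
    article = (
        "Path: me.example!peer.example!origin.example\n"
        "From: someone@example.invalid\n"
        "\n"
        "Path: /Applications/HouseCall.app/Contents/MacOS/HouseCall\n"
    )
    print(header_path(article))
```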

I tried remove those IP-ADDRESS.POSTED components, same with
not-for-mail and similar.

Did you discard the entire Path? Or just those portions (to the end of
the Path)?

No, only that specific node. The path itself remains, in shorter form.

For graph3-n-20200402.png I generated dot file with python and
"doctored" it manually to remove split clouds. Those small clusters
representing some inner working for some usenet site.

That's what I used sed to normalize those nodes to org names for.

Then used neato to generate PNG file.

Why neato vs dot itself?

I have used both, in different map files.
graph3-d-20200402.png was generated by pygraphviz using the dot backend.
I prefer neato though. This isn't very scientific reasoning, but it looks
more compact and gives a feel for which paths are more traveled. Those middle
nodes look like stars in the middle of a galaxy.

Ah. I think you're talking about removing things that chain through
each other without branching. E.g. remove n2 & n3 below.

[n1]---[n2]---[n3]---[n4]

You wanted "significant nodes" (which interconnect three or more other nodes). E.g. remove n2 below.

              [n5]
               |
[n1]---[n2]---[n3]---[n4]
               |
              [n6]

Where remove can mean collapse into the larger organization.

Indeed, that is a good description.

Because I used python and converted all data to python data structures
(dictionaries and sets), filtering was somewhat easy.

Hum.... The old unix admin in me has concerns about loading all of that data into memory. Conversely, other than sort and dot, much of what I
did was based on streaming data through and using much less memory at
any given time. Though, such optimizations are not necessarily as
important these days.

Actually dot/neato at the end are the heavy part; everything before that is fast and doesn't
take many resources.
Several times I had neato crash because of memory starvation.
Dot seemed more stable, but extremely slow with a bigger .dot file.

More "smartness" is needed though, for example in this time I simply
delete specific keys, using hardcoded hostnames.

Please elaborate on what you are deleting. What does it represent in
the Path: header? Why are you deleting it?

A single server/node. The end result is sometimes removing something generic and uninformative from the end (like not-for-mail), or something from the
middle (again those "same organization, different load balancer" deals).
The paths themselves are still there, but shorter.
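
As an illustration of this kind of per-node removal, here is a small Python sketch. The names `DROP_EXACT` and `strip_components` are hypothetical, the hostnames are invented, and treating every component that ends in `.POSTED` as removable is an assumption made for the example.

```python
# Generic entries that say nothing about peering (assumed list).
DROP_EXACT = {"not-for-mail", "mail2news"}


def strip_components(path: str, drop_hosts=frozenset()) -> str:
    """Remove uninformative components; the rest of the path is kept in order."""
    kept = []
    for part in path.split("!"):
        if not part:
            continue
        if part in DROP_EXACT or part in drop_hosts:
            continue
        if part.endswith(".POSTED"):
            continue
        kept.append(part)
    return "!".join(kept)


if __name__ == "__main__":
    print(strip_components(
        "me.example!lb1.example!192.0.2.7.POSTED!not-for-mail",
        drop_hosts={"lb1.example"},
    ))
```

The hardcoded hostnames go in `drop_hosts`, so the list of deleted nodes stays visible in one place instead of being scattered through the script.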

Some time ago I looked into newsreaders (posting agent) statistics and
between groups there was noticeable differences. I think same is true
with paths. Some groups are representative for some other network, for
example fido.

I agree that there is quite likely — what I'm going to call — clustering of groups & paths to form message flows.

Though remember Usenet's flooding nature.

Those articles are flooded to every site, but found only in specific groups.
So, if you use a limited set of groups to collect path info, you lose some of
the more exotic hosts.

So, now of course question is, do you want this kind hosts show up in
your map.

I would think so.

I'll counter with why would you not want these hosts to show up?

Because they don't represent NNTP peering relations and sometimes
(listservers behind mail-to-Usenet gateways) aren't even part of Usenet.
Then again, when articles flow in both directions, showing them has some merit.
I suppose that when you map something, you need to think critically about what
exactly you are trying to represent. The same goes for data collection: what and
where you are collecting, what is missing, and what is excessive.

They are articles that flow across Usenet.

I guess it could be that you're mapping a specific part / subset of Usenet.

Exactly, if you want to map how the articles themselves move, then you can
include them.

Best regards,
U.ee


From Grant Taylor@21:1/5 to Karl Kleinpaste on Wed Apr 8 14:40:00 2020

XPost: news.software.nntp

On 4/8/20 10:10 AM, Karl Kleinpaste wrote:

It was Brian Reid's (then of DECWRL) Network Measurement Project.

Hum. I'm not familiar. I'm guessing that's because "was" implies past
tense and I'm mostly current tense.

He once described an article flowing through the (empirically observed)
core NNTP servers as a flare fired into a munitions dump.

LOL

That seems accurate.

--
Grant. . . .
unix || die


From Julien ÉLIE@21:1/5 to All on Wed Apr 8 23:33:41 2020

XPost: news.software.nntp

Hi,

Tervitus! :)

Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
Estonia is such a very nice country! I really enjoyed it!

In last few days I have tried to make visual representation about
Usenet peering relations.

I did something quite similar last week.

In case you had not already taken a look at inpath2dot or inflow:
https://cord.de/news-stuff
https://ftp.isc.org/isc/inn/unoff-contrib/inflow
you might find in these two scripts useful parsing tricks or enhancements.

--
Julien ÉLIE

« Les soucis d'aujourd'hui sont les plaisanteries de demain. Rions-en
donc tout de suite. » (Henri Béraud)


From Karl Kleinpaste@21:1/5 to Grant Taylor on Wed Apr 8 17:24:26 2020

XPost: news.software.nntp

On 4/8/20 4:40 PM, Grant Taylor wrote:

that's because was implies past tense

1987-1989 or thereabouts. Not sure if it continued into the '90s.


From U.ee@21:1/5 to All on Thu Apr 9 15:15:04 2020

XPost: news.software.nntp

On 09.04.20 00:33, Julien ÉLIE wrote:

Hi,

Tervitus! :)

Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
Estonia is such a very nice country! I really enjoyed it!

That is nice to hear! I hope that you visited Viru bog in Lahemaa.

In last few days I have tried to make visual representation about
Usenet peering relations.

I did something quite similar last week.

In case you had not already taken a look at inpath2dot or inflow:
https://cord.de/news-stuff
https://ftp.isc.org/isc/inn/unoff-contrib/inflow
you might find in these two scripts useful parsing tricks or enhancements.

Thanks for those links!

Best regards
U.ee


From Paul Tomblin@21:1/5 to Grant Taylor on Thu Apr 9 13:30:20 2020

XPost: news.software.nntp

In a previous article, Grant Taylor <gtaylor@tnetconsulting.net> said:

On 4/8/20 6:56 AM, Paul Tomblin wrote:

Years ago there was a script that many sites ran on a regular basis
which sent their data to a central location that collated all that
information. That had obvious advantages over just collecting the
info at one node.

Are you referring to Top1000 or something else?

Top1000 is still a thing.

Except the "how to participate" section still is just a bunch of "XXX add a link to this".

Aside: My main server is in the top quarter. :-)

I'm 153rd.

--
Paul Tomblin <ptomblin@xcski.com> http://blog.xcski.com/
"Always try to do things in chronological order; it's less confusing
that way."


From Julien ÉLIE@21:1/5 to All on Sun Apr 19 09:28:21 2020

XPost: news.software.nntp

Hi,

Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
Estonia is such a very nice country! I really enjoyed it!

That is nice to hear! I hope that you visited Viru bog in Lahemaa.

Yes! I visited Viru bog. So marvellous! I remember well that 3,5 km
hike following a very little path surrounded by bogs. I had the chance
to see the sunset when reaching the little tower near the middle of the
hike. Furthermore, it snowed in the morning (while driving to Lahemaa) and
the weather was perfect during the afternoon. The hike in such snowy landscapes was fantastic for the eyes.

This was a guided tour to Viru bog with traveller.ee; on that day, we
also went to an old manor house and Jägala waterfall.

--
Julien ÉLIE

« C'est la goutte qui fait déborder l'amphore ! »
(Assurancetourix)

