Bonjour!
Over the last few days I have tried to make a visual representation of Usenet peering relations.
The data is collected from my point of view, from Usenet.ee, so there are
missing links and nodes. The source articles were collected over a period of a few
months, so some peerings have probably changed in that time.
The first trouble was source quality: my filtering was a little bit too inclusive. Next time I can filter some bogus data out at the beginning, so hopefully the next maps will be better.
Another issue is complicated sites that expose their inner workings. That
issue is twofold:
1) There are a lot of point-to-point relations, which complicates filtering
by peer count.
2) The node count is high, which makes the map noisy.
To lower the node count I tried a few filters; one measure was to show only
nodes with at least 2 peers.
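A minimal Python sketch of an "at least 2 peers" filter, assuming the peerings have already been collected as (host, host) pairs and are treated as undirected. The function names are hypothetical; this is an illustration, not the code used for the maps. It is a single pass: a host can fall below two peers once its neighbours are removed, so call it again if you want a stable result.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn (host, host) pairs into an undirected map: host -> set of peers."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def keep_min_peers(adj, min_peers=2):
    """Keep only hosts with at least min_peers peers (one pass)."""
    keep = {host for host, peers in adj.items() if len(peers) >= min_peers}
    # Also drop references to removed hosts from the surviving peer sets.
    return {host: adj[host] & keep for host in keep}


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "leaf")]
filtered = keep_min_peers(build_adjacency(edges))
print(sorted(filtered))  # ['a', 'b', 'c']
```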
So, here is a map with some uninteresting leaves removed: http://usenet.ee/maps/graph3-n-20200402.png
A slightly different cleanup: http://usenet.ee/maps/graph-n-cleaned-20200402.png
A different representation for better readability: http://usenet.ee/maps/graph3-d-20200402.png
I hope that readers here find this interesting!
Best regards,
On 4/7/20 11:57 AM, U.ee wrote:
Bonjour!
Hi,
Over the last few days I have tried to make a visual representation of
Usenet peering relations.
I did something quite similar last week.
I'm curious what you used as your source, how you filtered, and whether you modified anything.
I took the Path: headers from my data source (10,000 articles from a newsgroup), derived the connection between each adjacent pair of upstream and downstream
hosts, shifted by one host, and repeated.
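The "derive a connection, shift, and repeat" step can be sketched in Python as pairing each host in a Path: header with its right-hand neighbour. This assumes the usual Path convention that the leftmost entry is the server that handled the article most recently; the hostnames in the example other than feeder.erje.net are placeholders.

```python
def path_edges(path_value):
    """Return (upstream, downstream) pairs from the body of a Path: header.

    The leftmost entry is the server that handled the article most
    recently, so each host received it from the host to its right.
    """
    # Empty components (e.g. from a "!!" separator) are skipped.
    hosts = [h for h in path_value.strip().split("!") if h]
    # Pair each host with its right-hand neighbour: shift by one, repeat.
    return [(hosts[i + 1], hosts[i]) for i in range(len(hosts) - 1)]


print(path_edges("news.example.ee!feeder.erje.net!news.example.org"))
# [('feeder.erje.net', 'news.example.ee'), ('news.example.org', 'feeder.erje.net')]
```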
I added these, along with labels, to a dot (Graphviz) file by way of a
sort | uniq filter to remove duplicates.
I ended up using mutated node names in the dot file, with labels carrying
their actual node names. This way I didn't have to worry about dot's node naming conventions.
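A Python sketch of the same idea: deduplicate the edges (the equivalent of sort | uniq), give every host a mutated, dot-safe identifier, and keep the real hostname as the label. The mutation scheme and the function names are hypothetical; note that this particular scheme can map two different hosts (such as "a-b" and "a.b") to the same identifier.

```python
import re


def node_id(host):
    """Mutate a hostname into a dot-safe identifier (hypothetical scheme).

    Caveat: distinct hosts such as "a-b" and "a.b" collide under this scheme.
    """
    return "n_" + re.sub(r"[^A-Za-z0-9_]", "_", host)


def to_dot(edges):
    """Render (upstream, downstream) pairs as a Graphviz digraph."""
    edges = sorted(set(edges))  # same effect as piping through sort | uniq
    hosts = sorted({host for edge in edges for host in edge})
    lines = ["digraph usenet {"]
    for host in hosts:
        # The mutated id keeps dot happy; the label shows the real name.
        lines.append(f'  {node_id(host)} [label="{host}"];')
    for up, down in edges:
        lines.append(f"  {node_id(up)} -> {node_id(down)};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("feeder.erje.net", "news.example.ee")] * 2))
```

The output can be rendered with dot, or with neato for a force-directed layout.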
1) There are a lot of point-to-point relations, which complicates filtering
by peer count.
Yes, Usenet is inherently point to point.
Aside: I've heard of NNTP via multicast, but I've never seen it used,
much less between organizations.
2) The node count is high, which makes the map noisy.
I think that's where normalizing node names to organization names made
a HUGE difference.
See this tweet:
https://twitter.com/DrScriptt/status/1245529142738575362
In particular, the follow-up discussion:
https://twitter.com/revprez/status/1245529633409417217
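One way to approximate that normalization in Python (a stand-in for a sed-based approach, not a reproduction of it) is to collapse each hostname to its last two DNS labels, with a hypothetical override table for hosts whose organization cannot be read off the name. Edges inside a single organization then disappear. The two-label heuristic is an assumption and is wrong for suffixes such as .co.uk.

```python
# Hypothetical overrides for hosts whose organization can't be read off the name.
ORG_OVERRIDES = {}


def org_name(host, overrides=ORG_OVERRIDES):
    """Map a hostname to an organization name (naive heuristic)."""
    if host in overrides:
        return overrides[host]
    labels = host.lower().split(".")
    # Keep the last two labels; this is wrong for suffixes such as .co.uk.
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def normalize_edges(edges, overrides=ORG_OVERRIDES):
    """Collapse host-level edges to organization-level edges."""
    result = set()
    for up, down in edges:
        a, b = org_name(up, overrides), org_name(down, overrides)
        if a != b:  # edges inside one organization disappear
            result.add((a, b))
    return result


print(org_name("newsreader4.netcologne.de"))  # netcologne.de
```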
To lower the node count I tried a few filters; one measure was to show only
nodes with at least 2 peers.
I'm curious how you did your filtering.
So, here is a map with some uninteresting leaves removed:
http://usenet.ee/maps/graph3-n-20200402.png
I wanted to avoid removing nodes.
Or, more specifically, I wanted to avoid removing organizations, where
one or more nodes map to an organization.
A different representation for better readability:
http://usenet.ee/maps/graph3-d-20200402.png
I saw a crazy looping line in that graph, between
newsfeed.CARNet.hr and feeder.erje.net. The crazy loop is below newsreader4.netcologne.de.
I hope that readers here find this interesting!
I found the idea interesting enough to spend some time doing it last week.
I'm considering a new funnel feed that goes into a filter to extract
Path: headers of all incoming articles to collect more data. (I worked
on a subset of just one group.)
Over the last few days I have tried to make a visual representation of Usenet peering relations.
The data is collected from my point of view, from Usenet.ee, so there are
missing links and nodes. The source articles were collected over a period of a few months.
Years ago there was a script that many sites ran on a regular basis
which sent their data to a central location that collated all that information. That had obvious advantages over just collecting the
info at one node.
Are you referring to Top1000 or something else?
Nice!
I used all articles from all groups on my server since January. I sorted
and counted unique paths, and dropped everything with fewer than 10 occurrences.
Some data was simply noise (containing something other than real
paths), so I removed that too.
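The count-and-threshold step can be sketched with Python's collections.Counter, assuming the Path: values have already been extracted one per string. The threshold of 10 comes from the post; the function name and sample data are illustrative.

```python
from collections import Counter


def frequent_paths(paths, min_count=10):
    """Count identical Path: values and keep those seen at least min_count times."""
    counts = Counter(p.strip() for p in paths if p.strip())
    return {path: n for path, n in counts.items() if n >= min_count}


sample = ["a!b!not-for-mail"] * 10 + ["a!c!not-for-mail"] * 3
print(frequent_paths(sample))  # {'a!b!not-for-mail': 10}
```

The shell equivalent is roughly sort | uniq -c followed by a filter on the count column.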
Because I have played with this for several days, or, looking back, more like weeks,
I have tried several things, so I have partly lost track of what exactly was
done for a specific map (some crappier ones are deleted, some more
amusing specimens are still in my archive).
I tried to remove those IP-ADDRESS.POSTED components, and the same with
not-for-mail and similar.
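A Python sketch of that cleanup, with both options for what to do when a bogus component is found: drop just that component, or stop at it. Dropping a component from the middle of a path joins its two neighbours, which can invent a peering that does not exist; truncating avoids that. The check treats any component with a "POSTED" label as bogus, which is an assumption meant to cover both the IP-ADDRESS.POSTED form mentioned above and a ".POSTED.address" form; the NON_HOSTS set is hypothetical.

```python
# Hypothetical list of non-host tail entries; extend as needed.
NON_HOSTS = {"not-for-mail"}


def is_bogus(component):
    """True for not-for-mail and for components with a POSTED label,
    e.g. "192.0.2.1.POSTED" or ".POSTED.192.0.2.1"."""
    return component in NON_HOSTS or "POSTED" in component.split(".")


def clean_path(path_value, truncate=False):
    """Return the host list of a Path: value without bogus components.

    truncate=False drops only the bogus entries, which splices their
    neighbours together; truncate=True stops at the first bogus entry.
    """
    hosts = []
    for component in path_value.strip().split("!"):
        if not component:
            continue
        if is_bogus(component):
            if truncate:
                break
            continue
        hosts.append(component)
    return hosts


p = "a.example!192.0.2.1.POSTED!b.example!not-for-mail"
print(clean_path(p))                 # ['a.example', 'b.example']
print(clean_path(p, truncate=True))  # ['a.example']
```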
I used shell tools (grep, sort, uniq; sed at one point)
and Python with pygraphviz.
For graph3-n-20200402.png I generated the dot file with Python and
"doctored" it manually to remove split clouds, those small clusters representing the inner workings of some Usenet site.
Then I used neato to generate the PNG file.
I probably used bad terminology here, sorry. I felt that PtP describes
it well.
I meant point-to-point in a more general sense: one node having only one upstream, and that upstream node having only one downstream, similar to, for example, wireless backhauls.
I am not well versed in Twitter. Your map there looks nice!
Because I used Python and converted all the data to Python data structures (dictionaries and sets), filtering was fairly easy.
More "smartness" is needed, though; for example, this time I simply
deleted specific keys, using hardcoded hostnames.
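With a dict-of-sets graph, deleting a host means removing its key and also removing it from every peer set; otherwise it reappears as a dangling peer when the edges are written out. A minimal sketch, where the hostnames in DROP_HOSTS are placeholders, not the actual hardcoded list:

```python
# Placeholder names only; the real hardcoded list is not given in the thread.
DROP_HOSTS = {"internal1.example.net", "internal2.example.net"}


def drop_hosts(adj, drop=DROP_HOSTS):
    """Delete hosts from a dict-of-sets graph, including references to them.

    Deleting only the dict key would leave the host in its peers' sets,
    so it would reappear when edges are written out.
    """
    return {host: peers - drop for host, peers in adj.items() if host not in drop}


graph = {"a": {"b", "internal1.example.net"}, "b": {"a"},
         "internal1.example.net": {"a"}}
print(drop_hosts(graph))  # {'a': {'b'}, 'b': {'a'}}
```

Returning a new dict leaves the original graph untouched, which makes it easy to compare maps with and without a given filter.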
Yes, I removed nodes, but generally not sites (organizations).
You mean the one to the left of news.tnib.de? Nice catch!
Some time ago I looked into newsreader (posting agent) statistics, and there were noticeable differences between groups. I think the same is true
of paths. Some groups are representative of some other network, for example fido.
So, now of course the question is: do you want these kinds of hosts to show up
in your map?
Best wishes
Hum.
I would be concerned about dropping information. I know that there are
some newsgroups that get very low traffic, as in one post every few
months. But that's group-specific and probably not as much of an issue
for servers that carry more groups.
Some data was simply noise (containing something other than real
paths), so I removed that too.
I'm curious to see an example of such.
I tried to remove those IP-ADDRESS.POSTED components, and the same with
not-for-mail and similar.
Did you discard the entire Path? Or just those portions (to the end of
the Path)?
For graph3-n-20200402.png I generated the dot file with Python and
"doctored" it manually to remove split clouds, those small clusters
representing the inner workings of some Usenet site.
That's what I used sed for: normalizing those nodes to org names.
Then I used neato to generate the PNG file.
Why neato vs dot itself?
Ah. I think you're talking about removing things that chain through
each other without branching. E.g. remove n2 & n3 below.
[n1]---[n2]---[n3]---[n4]
You wanted "significant nodes" (which interconnect three or more other nodes). E.g. remove n2 below.
              [n5]
               |
[n1]---[n2]---[n3]---[n4]
               |
              [n6]
Where remove can mean collapse into the larger organization.
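A Python sketch of that chain collapsing on an undirected dict-of-sets graph (the representation is an assumption). Any host with exactly two peers is treated as a pass-through link and spliced out, so its two neighbours are joined directly; this repeats until no such host remains. It does not do the "collapse into the organization" variant, which would need names normalized to organizations first.

```python
def collapse_chains(adj):
    """Repeatedly splice out hosts with exactly two peers.

    A host with exactly two neighbours is treated as a pass-through
    link: it is removed and its two neighbours are joined directly.
    """
    adj = {host: set(peers) for host, peers in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for host in list(adj):
            peers = adj.get(host)
            if peers is None or len(peers) != 2:
                continue
            a, b = peers
            # Splice: a -- host -- b becomes a -- b.
            adj[a].discard(host)
            adj[b].discard(host)
            adj[a].add(b)
            adj[b].add(a)
            del adj[host]
            changed = True
    return adj


chain = {"n1": {"n2"}, "n2": {"n1", "n3"}, "n3": {"n2", "n4"}, "n4": {"n3"}}
print(collapse_chains(chain))  # {'n1': {'n4'}, 'n4': {'n1'}}
```

On the first diagram this removes n2 and n3; on the second it removes only n2, because n3 connects four other nodes.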
Because I used Python and converted all the data to Python data structures
(dictionaries and sets), filtering was fairly easy.
Hum... The old Unix admin in me has concerns about loading all of that data into memory. By contrast, other than sort and dot, much of what I
did was based on streaming data through and using much less memory at
any given time. Though such optimizations are not necessarily as
important these days.
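A streaming version of edge extraction in Python, as a contrast to loading everything into dictionaries: a generator that reads one line at a time and yields edges as it goes. It assumes each Path: header sits on a single line (folded header continuation lines are not handled), and the function name is hypothetical.

```python
def stream_edges(lines):
    """Yield (upstream, downstream) pairs one Path: line at a time.

    Only the current line is held in memory, so this can sit in a
    pipeline in front of sort -u.
    """
    for line in lines:
        if not line.lower().startswith("path:"):
            continue
        hosts = [h for h in line[len("path:"):].strip().split("!") if h]
        for i in range(len(hosts) - 1):
            yield hosts[i + 1], hosts[i]
```

Fed from sys.stdin and printed as tab-separated pairs, the output can be deduplicated with sort -u, which can use temporary files instead of holding everything in RAM.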
More "smartness" is needed, though; for example, this time I simply
deleted specific keys, using hardcoded hostnames.
Please elaborate on what you are deleting. What does it represent in
the Path: header? Why are you deleting it?
Some time ago I looked into newsreader (posting agent) statistics, and
there were noticeable differences between groups. I think the same is true
of paths. Some groups are representative of some other network, for
example fido.
I agree that there is quite likely (what I'm going to call) clustering of groups & paths to form message flows.
Though remember Usenet's flooding nature.
So, now of course the question is: do you want these kinds of hosts to show up in
your map?
I would think so.
I'll counter with why would you not want these hosts to show up?
They are articles that flow across Usenet.
I guess it could be that you're mapping a specific part / subset of Usenet.
It was Brian Reid's (then of DECWRL) Network Measurement Project.
He once described an article flowing through the (empirically observed)
core NNTP servers as a flare fired into a munitions dump.
Tervitus! :)
Over the last few days I have tried to make a visual representation of
Usenet peering relations.
I did something quite similar last week.
That's because "was" implies past tense.
Hi,
Tervitus! :)
Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
Estonia is such a very nice country! I really enjoyed it!
Over the last few days I have tried to make a visual representation of
Usenet peering relations.
I did something quite similar last week.
In case you had not already taken a look at inpath2dot or inflow:
https://cord.de/news-stuff
https://ftp.isc.org/isc/inn/unoff-contrib/inflow
you might find useful parsing tricks or enhancements in these two scripts.
On 4/8/20 6:56 AM, Paul Tomblin wrote:
Years ago there was a script that many sites ran on a regular basis
which sent their data to a central location that collated all that
information. That had obvious advantages over just collecting the
info at one node.
Are you referring to Top1000 or something else?
Top1000 is still a thing.
Aside: My main server is in the top quarter. :-)
Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
Estonia is such a very nice country! I really enjoyed it!
That is nice to hear! I hope that you visited Viru bog in Lahemaa.