• Usenet map

    From U.ee@21:1/5 to All on Tue Apr 7 20:57:04 2020
    XPost: news.software.nntp

    Bonjour!

    Over the last few days I have tried to make a visual representation of Usenet peering relations.
    The data is collected from my point of view, from Usenet.ee, so there are
    missing links and nodes. The source articles were collected over a period of a few
    months, so some peerings have probably changed in that time.
    The first trouble was with source quality: my filtering was a little too inclusive. Next time I can filter out some bogus data at the beginning, so hopefully the next maps will be better.
    Another issue is complicated sites showing their inner workings. That
    issue is twofold:
    1) There are a lot of point-to-point relations, which complicates filtering by
    peer count.
    2) A high node count, which makes the map noisy.

    To lower the node count I tried a few filters; one measure was to show only
    nodes with at least 2 peers.
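
    A minimal Python sketch of that peer-count filter, assuming the map is held as a dictionary from hostname to a set of peer hostnames. The function name and the sample hosts are made up for illustration and are not taken from the actual scripts.

```python
def filter_by_peer_count(peers, min_peers=2):
    """Keep only nodes that have at least `min_peers` distinct peers.

    `peers` maps a hostname to the set of hostnames it peers with.
    """
    keep = {node for node, nbrs in peers.items() if len(nbrs) >= min_peers}
    # Drop references to the removed nodes so the result stays self-consistent.
    return {node: peers[node] & keep for node in keep}


# Hypothetical sample: "leaf.example" has a single peer and is dropped.
sample = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example", "c.example"},
    "c.example": {"a.example", "b.example", "leaf.example"},
    "leaf.example": {"c.example"},
}
filtered = filter_by_peer_count(sample)
```

    The filter runs as a single pass, so a node that drops below the threshold only because its neighbours were removed is still kept.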


    So, a map with some uninteresting leaves removed: http://usenet.ee/maps/graph3-n-20200402.png

    A slightly different cleanup: http://usenet.ee/maps/graph-n-cleaned-20200402.png

    A different representation for better readability: http://usenet.ee/maps/graph3-d-20200402.png


    I hope that readers here find this interesting!

    Best regards,
    U.ee

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to U.ee on Tue Apr 7 17:49:37 2020
    XPost: news.software.nntp

    On 4/7/20 11:57 AM, U.ee wrote:
    Bonjour!

    Hi,

    In last few days I have tried to make visual representation about Usenet peering relations.

    I did something quite similar last week.

    Data is collected from my point of view, from Usenet.ee. There is
    missing links and nodes. Source articles are collected over few month
    period, so some peerings are probably changed in that time.

    Ya. Every post is inherently a tree that connects to you. Frequently
    those trees coalesce into a graph from your system's point of view.

    First trouble was with source quality: my filtering was little bit too inclusive. Next time I can filter some bogus data out in beginning, so hopefully next maps are better.

    I'm curious what you used as your source, how you filtered, and if you modified.

    I took the Path: headers from my data source (10,000 articles from a newsgroup), derived connections between the upstream and downstream
    hosts, shift, and repeat.

    I added these, and labels, to a dot (Graphviz) file by way of a sort
    uniq filter to remove duplication.

    I ended up using mutated node names in the dot file, with labels of
    their actual node name. This way I didn't have to worry about dot node
    naming conventions.
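
    A rough Python sketch of the same idea: mutated node identifiers with the real hostname kept as the label, and duplicate edges removed. The `n_` prefix, the replacement rule and the sample hosts are assumptions for illustration, not the naming scheme actually used.

```python
import re


def dot_id(host):
    """Mutate a hostname into a plain identifier that dot accepts unquoted."""
    return "n_" + re.sub(r"[^A-Za-z0-9]", "_", host)


def to_dot(edges):
    """Render (upstream, downstream) pairs as DOT text, one line per unique edge."""
    edges = sorted(set(edges))  # same effect as a sort | uniq pass
    hosts = sorted({host for edge in edges for host in edge})
    lines = ["digraph usenet {"]
    for host in hosts:
        label = host.replace('"', '\\"')  # keep the real name as the label
        lines.append(f'    {dot_id(host)} [label="{label}"];')
    for up, down in edges:
        lines.append(f"    {dot_id(up)} -> {dot_id(down)};")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([
    ("news.example.org", "peer.example.net"),
    ("news.example.org", "peer.example.net"),  # duplicate, emitted once
])
```

    Note that this simple mutation can collide: `a.b` and `a-b` both become `n_a_b`, so a real script would need a collision check or a counter-based identifier.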

    I also modified a lot of nodes to be their organization name with a sed
    script. This way, all of Google would show up as simply "Google".

    Another issue is complicated sites showing their inner workings. That
    issue is two fold:

    1) There is lot of point to point relations, complicating filtering by
    peer count.

    Yes, Usenet is inherently point to point.

    Aside: I've heard of NNTP via multi-cast, but I've never seen it used,
    much less between organizations.

    2) High nodecount, making map noisy.

    I think that's where normalizing the node name to organization name made
    a HUGE difference.

    See this tweet:

    https://twitter.com/DrScriptt/status/1245529142738575362

    In particular the follow up discussion:

    https://twitter.com/revprez/status/1245529633409417217

    For lowering node count I tried few filters, one measure was only show
    nodes with at least 2 peers.

    I'm curious how you did your filtering.

    So, map with some uninteresting leafs removed: http://usenet.ee/maps/graph3-n-20200402.png

    I wanted to avoid removing nodes.

    Or more specifically, I wanted to avoid removing organizations. Where
    one or more nodes map to an organization.

    Little bit different cleanup: http://usenet.ee/maps/graph-n-cleaned-20200402.png

    Different representation to better readability: http://usenet.ee/maps/graph3-d-20200402.png

    I saw a crazy looping line in that graph, between
    newsfeed.CARNet.hr and feeder.erje.net. The crazy loop is below newsreader4.netcologne.de.

    I hope that readers here found this interesting!

    I found the idea interesting enough to spend some time doing it last week.

    I'm considering a new funnel feed that goes into a filter to extract
    Path: headers of all incoming articles to collect more data. (I worked
    on a subset of just one group.)

    Best regards,

    Likewise.




    --
    Grant. . . .
    unix || die

  • From Grant Taylor@21:1/5 to Grant Taylor on Tue Apr 7 18:06:13 2020
    XPost: news.software.nntp

    On 4/7/20 5:49 PM, Grant Taylor wrote:
    I took the Path: headers from my data source (10,000 articles from a newsgroup), derived connections between the upstream and downstream
    hosts, shift, and repeat.

    I hacked together a shell script that read Path: header lines from
    STDIN, looping across each header.

    Each path header was split on the bang, and walked posting host to my
    receiving host, looping across each host.

    awk -F\! '{for (i=NF; i>1; i--)printf("%s ",$i); print $1; }'

    This allowed me to take action on each host in each path entry.

    Since I was starting with the posting / upstream host, I was able to
    iterate through the hosts in the path and deduce the upstream to
    downstream relation. This directly translated to "$upstream ->
    $downstream" entries for dot to work with.
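
    The walk described above can be sketched in Python as follows. This is not the shell script itself; the function name, the `skip` parameter and the sample hosts are hypothetical.

```python
def path_to_edges(path_header, skip=("not-for-mail",)):
    """Return (upstream, downstream) pairs from one Path: header line."""
    value = path_header
    if value.lower().startswith("path:"):
        value = value.split(":", 1)[1]
    hosts = [h.strip() for h in value.split("!") if h.strip()]
    hosts = [h for h in hosts if h not in skip]
    hosts.reverse()  # the posting host is rightmost in the header
    return list(zip(hosts, hosts[1:]))


edges = path_to_edges("Path: me.example!feeder.example!origin.example!not-for-mail")
# One quoted edge statement per hop, ready to be placed inside a digraph block.
dot_lines = [f'"{up}" -> "{down}"' for up, down in edges]
```

    Pairing each host with its successor after the reversal yields one edge per hop, so a path with a single remaining host produces no edges.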

    There was a little bit of clean up work, like priming when I don't have
    both upstream and downstream, as well as some things like not-for-mail / mail2news, etc.

    I did my node-to-organization cleanup with sed, using expressions like
    the following before running the loops across them:

    s/fx01.iad.POSTED!/Google!/
    s/peer03.ams1!/Google!/
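
    For comparison, a small Python sketch of the same normalization applied to an already split host list. The two hostnames and the organization name come from the sed expressions above; the exact-match lookup (instead of sed's substring match) and the merging of adjacent duplicates are assumptions made for this example.

```python
# Hostname -> organization, taken from the two sed expressions above.
ORG_OF_HOST = {
    "fx01.iad.POSTED": "Google",
    "peer03.ams1": "Google",
}


def normalize_hosts(hosts, org_of_host=ORG_OF_HOST):
    """Replace known hosts with their organization name.

    Adjacent duplicates are merged so that two hosts of the same
    organization in a row do not produce an edge from the node to itself.
    """
    out = []
    for host in hosts:
        name = org_of_host.get(host, host)
        if not out or out[-1] != name:
            out.append(name)
    return out


normalized = normalize_hosts(
    ["me.example", "peer03.ams1", "fx01.iad.POSTED", "not-for-mail"]
)
```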



    --
    Grant. . . .
    unix || die

  • From U.ee@21:1/5 to Grant Taylor on Wed Apr 8 12:29:29 2020
    XPost: news.software.nntp

    On 08.04.20 02:49, Grant Taylor wrote:
    On 4/7/20 11:57 AM, U.ee wrote:
    Bonjour!

    Hi,

    Tervitus! :)


    In last few days I have tried to make visual representation about
    Usenet peering relations.

    I did something quite similar last week.

    Nice!

    I'm curious what you used as your source, how you filtered, and if you modified.

    All articles from all groups since January from my server. Sorted and
    counted unique paths. Dropped everything below 10 occurrences. Some data
    was simply noise (containing something other than real paths), removed
    those too.
    Because I have been playing with this for several days (looking back, more like weeks),
    I have tried several things, so some detail is lost about what exactly was
    done for a specific map (some crappier ones are deleted, some more
    amusing specimens are still in my archive).
    I tried to remove those IP-ADDRESS.POSTED components, and the same with
    not-for-mail and similar.
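
    The counting step could look roughly like this in Python. It is a sketch only: the function name and sample paths are invented, and the threshold of 10 is the one mentioned above.

```python
from collections import Counter


def frequent_paths(paths, min_count=10):
    """Count identical path strings and keep those seen at least `min_count` times."""
    counts = Counter(p.strip() for p in paths if p.strip())
    return {path: n for path, n in counts.items() if n >= min_count}


# Hypothetical input: one path seen 10 times, one seen 9 times, plus empty lines.
sample_paths = (
    ["me.example!a.example!not-for-mail"] * 10
    + ["me.example!b.example!not-for-mail"] * 9
    + ["", "\n"]
)
kept = frequent_paths(sample_paths)
```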


    I took the Path: headers from my data source (10,000 articles from a newsgroup), derived connections between the upstream and downstream
    hosts, shift, and repeat.

    I added these, and labels, to a dot (Graphviz) file by way of a sort
    uniq filter to remove duplication.

    I ended up using mutated node names in the dot file, with labels of
    their actual node name.  This way I didn't have to worry about dot node naming conventions.

    I used shell tools (grep, sort, uniq; sed at one point)
    and Python with pygraphviz.
    For graph3-n-20200402.png I generated the dot file with Python and
    "doctored" it manually to remove split clouds, those small clusters that represent the inner workings of some Usenet site.
    Then I used neato to generate the PNG file.



    1) There is lot of point to point relations, complicating filtering by
    peer count.

    Yes, Usenet is inherently point to point.

    Aside:  I've heard of NNTP via multi-cast, but I've never seen it used,
    much less between organizations.

    I probably used bad terminology here, sorry. I felt that PtP describes
    it well.
    I meant point-to-point in a more general sense: one node having only one upstream, and the upstream node having only one downstream, similar to, for
    example, wireless backhauls.


    2) High nodecount, making map noisy.

    I think that's where normalizing the node name to organization name made
    a HUGE difference.

    See this tweet:

    https://twitter.com/DrScriptt/status/1245529142738575362

    In particular the follow up discussion:

    https://twitter.com/revprez/status/1245529633409417217

    I am not well versed with twitter. Your map there looks nice!


    For lowering node count I tried few filters, one measure was only show
    nodes with at least 2 peers.

    I'm curious how you did your filtering.

    Because I used Python and converted all the data to Python data structures (dictionaries and sets), filtering was fairly easy. More "smartness"
    is needed though; for example, at this time I simply delete specific
    keys, using hardcoded hostnames.


    So, map with some uninteresting leafs removed:
    http://usenet.ee/maps/graph3-n-20200402.png

    I wanted to avoid removing nodes.

    Or more specifically, I wanted to avoid removing organizations.  Where
    one or more nodes map to an organization.

    Yes, I removed nodes, but generally not sites (organizations).


    Different representation to better readability:
    http://usenet.ee/maps/graph3-d-20200402.png

    I saw some a crazy looping line in that graph, between
    newsfeed.CARNet.hr and feeder.erje.net.  The crazy loop is below newsreader4.netcolgone.de.

    You mean that left from news.tnib.de? Nice catch!

    I hope that readers here found this interesting!

    I found the idea interesting enough to spend some time doing it last week.

    I'm considering a new funnel feed that goes into a filter to extract
    Path: headers of all incoming articles to collect more data.  (I worked
    on a subset of just one group.)

    Some time ago I looked into newsreader (posting agent) statistics, and
    there were noticeable differences between groups. I think the same is true
    of paths. Some groups are representative of some other network, for
    example fido. So now, of course, the question is whether you want this kind of host
    to show up in your map.

    Best wishes
    U.ee

  • From Paul Tomblin@21:1/5 to U.ee on Wed Apr 8 12:56:09 2020
    XPost: news.software.nntp

    In a previous article, "U.ee" <admin@invalid.usenet.ee> said:
    In last few days I have tried to make visual representation about Usenet peering relations.
    Data is collected from my point of view, from Usenet.ee. There is
    missing links and nodes. Source articles are collected over few month

    Years ago there was a script that many sites ran on a regular basis which sent their data to a central location that collated all that information. That had obvious advantages over just collecting the info at one node.


    --
    Paul Tomblin <ptomblin@xcski.com> http://blog.xcski.com/
    ...I'm not one of those who think Bill Gates is the devil. I simply
    suspect that if Microsoft ever met up with the devil, it wouldn't need an interpreter. -- Nick Petreley

  • From Grant Taylor@21:1/5 to Paul Tomblin on Wed Apr 8 09:53:14 2020
    XPost: news.software.nntp

    On 4/8/20 6:56 AM, Paul Tomblin wrote:
    Years ago there was a script that many sites ran on a regular basis
    which sent their data to a central location that collated all that information. That had obvious advantages over just collecting the
    info at one node.

    Are you referring to Top1000 or something else?

    Top1000 is still a thing.

    Aside: My main server is in the top quarter. :-)



    --
    Grant. . . .
    unix || die

  • From Karl Kleinpaste@21:1/5 to Grant Taylor on Wed Apr 8 12:10:38 2020
    XPost: news.software.nntp

    On 4/8/20 11:53 AM, Grant Taylor wrote:
    Are you referring to Top1000 or something else?

    It was Brian Reid's (then of DECWRL) Network Measurement Project.
    He once described an article flowing through the (empirically observed)
    core NNTP servers as a flare fired into a munitions dump.

  • From Grant Taylor@21:1/5 to U.ee on Wed Apr 8 09:50:19 2020
    XPost: news.software.nntp

    On 4/8/20 3:29 AM, U.ee wrote:
    Nice!

    I'm completely new to dot / Graphviz, so I'm still learning.

    All articles from all groups since January from my server. Sorted and
    counted unique paths. Dropped everything below 10 occurrences.

    Hum.

    I would be concerned about dropping information. I know that there are
    some newsgroups that get very low traffic, as in one post every few
    months. But that's group specific and probably not as much of an issue
    with servers with additional groups.

    Some data was simply noise (containing something other than real
    paths), removed those too.

    I'm curious to see an example of such.

    Because I have played it several days or looking back more like weeks
    I have tried several things, so little bit is lost, what was exactly
    done with specific map (some crappier ones are deleted, some more
    amusing specimens are still in my archive).

    I get it.

    I think that's more the "art" part than "science".

    I tried remove those IP-ADDRESS.POSTED components, same with
    not-for-mail and similar.

    Did you discard the entire Path? Or just those portions (to the end of
    the Path)?

    Also, IP-ADDRESS.POSTED is little different than FQDN.POSTED to me.
    They are different ways of conveying the same information. The former
    didn't have functioning reverse DNS (or it was disabled) and the latter did.

    But the posting itself is still a viable article to me.

    I used shell tools (grep, sort, uniq; sed in one point)

    I think that such tools are under appreciated.

    and python with pygraphviz.

    ACK

    For graph3-n-20200402.png I generated dot file with python and
    "doctored" it manually to remove split clouds. Those small clusters representing some inner working for some usenet site.

    That's what I used sed to normalize those nodes to org names for.

    Then used neato to generate PNG file.

    Why neato vs dot itself?

    I probably used bad terminology here, sorry. I felt that PtP describes
    it well.

    yes, point-to-point is a distinct type of connection. I believe that
    the vast majority of NNTP servers are point-to-point connected.

    I meant point-to-point in more general sense, one node having only one upstream, and upstream node having only one downstream, so similar for example wireless backhauls.

    Ah. I think you're talking about removing things that chain through
    each other without branching. E.g. remove n2 & n3 below.

    [n1]---[n2]---[n3]---[n4]

    You wanted "significant nodes" (which interconnect three or more other
    nodes). E.g. remove n2 below.

                  [n5]
                   |
    [n1]---[n2]---[n3]---[n4]
                   |
                  [n6]

    Where remove can mean collapse into the larger organization.
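
    One way to express this collapse in Python, as a sketch: treat the map as an undirected adjacency dictionary and contract every node that has exactly two neighbours. The function name is hypothetical, the adjacency is assumed to be symmetric, and the two samples mirror the cases above: a plain chain n1 to n4, and the same chain where n3 also connects n5 and n6.

```python
def collapse_chains(adj):
    """Contract nodes with exactly two neighbours, joining those neighbours.

    `adj` maps each node to the set of its neighbours and must be symmetric.
    """
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if len(adj[node]) != 2:
                continue
            a, b = adj[node]
            # Bypass the pass-through node: connect its two neighbours directly.
            adj[a].discard(node)
            adj[b].discard(node)
            adj[a].add(b)
            adj[b].add(a)
            del adj[node]
            changed = True
    return adj


chain = {"n1": {"n2"}, "n2": {"n1", "n3"}, "n3": {"n2", "n4"}, "n4": {"n3"}}
branch = {
    "n1": {"n2"}, "n2": {"n1", "n3"}, "n3": {"n2", "n4", "n5", "n6"},
    "n4": {"n3"}, "n5": {"n3"}, "n6": {"n3"},
}
chain_result = collapse_chains(chain)    # n2 and n3 are removed
branch_result = collapse_chains(branch)  # only n2 is removed
```

    On graphs that contain cycles the outcome depends on the order in which nodes are visited, so this is only a starting point.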

    I am not well versed with twitter. Your map there looks nice!

    Thank you.

    Because I used python and converted all data to python data structures (dictionaries and sets), filtering was somewhat easy.

    Hum.... The old unix admin in me has concerns about loading all of that
    data into memory. Conversely, other than sort and dot, much of what I
    did was based on streaming data through and using much less memory at
    any given time. Though, such optimizations are not necessarily as
    important these days.

    More "smartness" is needed though, for example in this time I simply
    delete specific keys, using hardcoded hostnames.

    Please elaborate on what you are deleting. What does it represent in
    the Path: header? Why are you deleting it?

    Admittedly, the sort / uniq I was doing would remove data. But it was
    data that was already represented in my data.

    Yes, I removed nodes, but generally not sites (organizations).

    ACK

    You mean that left from news.tnib.de? Nice catch!

    :-)

    It was luck. My viewer happened to zoom and show it in the center of the
    zoomed view.

    Tracking it was more difficult.

    Some time ago I looked into newsreaders (posting agent) statistics and between groups there was noticeable differences. I think same is true
    with paths. Some groups are representative for some other network, for example fido.

    I agree that there is quite likely — what I'm going to call — clustering
    of groups & paths to form message flows.

    Though remember Usenet's flooding nature.

    So, now of course question is, do you want this kind hosts show up
    in your map.

    I would think so.

    I'll counter with why would you not want these hosts to show up?

    They are articles that flow across Usenet.

    I guess it could be that you're mapping a specific part / subset of Usenet.

    Best wishes

    Likewise.



    --
    Grant. . . .
    unix || die

  • From U.ee@21:1/5 to Grant Taylor on Wed Apr 8 20:53:32 2020
    XPost: news.software.nntp

    Grant,

    First, thank you very much for asking those questions and explaining
    your process!

    On 08.04.20 18:50, Grant Taylor wrote:
    Hum.

    I would be concerned about dropping information.  I know that there are
    some newsgroups that get very low traffic, as in one post every few
    months.  But that's group specific and probably not as much of an issue
    with servers with additional groups.

    I drop them, because there is too much data.
    Before dropping I have 86246 lines in my unique paths file.


    Some data was simply noise (containing something other than real
    paths), removed those too.

    I'm curious to see an example of such.

    Well, I didn't limit Path: occurrences to one per article, so if the body contained a line matching (^Path: ), it got included.
    For example:
    Path: /Applications/HouseCall.app/Contents/MacOS/HouseCall
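
    A sketch of one way to avoid this kind of noise: read only the header block, which ends at the first empty line, and take the Path: header from there. The function name and sample article are made up for illustration; continuation lines of a folded header are joined in a simplified way.

```python
def header_path(article_text):
    """Return the Path: header value, looking only at the header block."""
    value = None
    for line in article_text.splitlines():
        if line == "":
            break  # first empty line: headers are over, the body starts
        if value is not None:
            if line[:1] in (" ", "\t"):
                value += line.strip()  # continuation of a folded Path: header
                continue
            break  # the next header begins, Path: is complete
        if line.lower().startswith("path:"):
            value = line.split(":", 1)[1].strip()
    return value


article = (
    "Path: me.example!origin.example!not-for-mail\n"
    "From: poster@example.invalid\n"
    "\n"
    "Path: /Applications/HouseCall.app/Contents/MacOS/HouseCall\n"
)
real_path = header_path(article)
```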

    I tried remove those IP-ADDRESS.POSTED components, same with
    not-for-mail and similar.

    Did you discard the entire Path?  Or just those portions (to the end of
    the Path)?

    No, only that specific node. The path itself remains, in a shorter form.



    For graph3-n-20200402.png I generated dot file with python and
    "doctored" it manually to remove split clouds. Those small clusters
    representing some inner working for some usenet site.

    That's what I used sed to normalize those nodes to org names for.

    Then used neato to generate PNG file.

    Why neato vs dot itself?

    I have used both, in different map files.
    graph3-d-20200402.png was generated by pygraphviz using the dot backend.
    I prefer neato though. This isn't very scientific reasoning, but it looks
    more compact and gives a feel for which paths are more traveled. Those middle
    nodes look like stars in the middle of a galaxy.



    Ah.  I think you're talking about removing things that chain through
    each other without branching.  E.g. remove n2 & n3 below.

    [n1]---[n2]---[n3]---[n4]

    You wanted "significant nodes" (which interconnect three or more other nodes).  E.g. remove n2 below.

                  [n5]
                   |
    [n1]---[n2]---[n3]---[n4]
                   |
                  [n6]

    Where remove can mean collapse into the larger organization.


    Indeed, that is good description.



    Because I used python and converted all data to python data structures
    (dictionaries and sets), filtering was somewhat easy.

    Hum....  The old unix admin in me has concerns about loading all of that data into memory.  Conversely, other than sort and dot, much of what I
    did was based on streaming data through and using much less memory at
    any given time.  Though, such optimizations are not necessarily as
    important these days.

    Actually, dot/neato at the end are the heavy part; everything before that is fast and doesn't
    take many resources.
    Several times I had neato crash because of memory starvation.
    Dot seemed more stable, but it was extremely slow with bigger .dot files.


    More "smartness" is needed though, for example in this time I simply
    delete specific keys, using hardcoded hostnames.

    Please elaborate on what you are deleting.  What does it represent in
    the Path: header?  Why are you deleting it?

    A single server/node. The end result is sometimes removing something generic and noninformative from the end (like not-for-mail), or something from
    the middle (again those "same organization, different load balancer" deals).
    The paths themselves are still there, but shorter.
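
    In Python this kind of per-node removal might be sketched as below. The function name, the default lists and the sample path are assumptions; the `.POSTED` suffix test follows the "IP-ADDRESS.POSTED" form used in this thread.

```python
def strip_hosts(path, unwanted=("not-for-mail",), suffixes=(".POSTED",)):
    """Remove uninformative components from a path, keeping the rest in order."""
    hosts = [h for h in path.split("!") if h]
    kept = [
        h for h in hosts
        if h not in unwanted and not h.endswith(suffixes)
    ]
    return "!".join(kept)


shorter = strip_hosts("me.example!feeder.example!192.0.2.1.POSTED!not-for-mail")
```

    Hardcoded hostnames for "same organization" nodes can be passed in through `unwanted` in the same way.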



    Some time ago I looked into newsreaders (posting agent) statistics and
    between groups there was noticeable differences. I think same is true
    with paths. Some groups are representative for some other network, for
    example fido.

    I agree that there is quite likely — what I'm going to call — clustering of groups & paths to form message flows.

    Though remember Usenet's flooding nature.

    Those articles are flooded to every site, but are found in specific groups.
    So, if you use a limited set of groups to collect path info, you lose some of the more
    exotic hosts.

    So, now of course question is, do you want this kind hosts show up in
    your map.

    I would think so.

    I'll counter with why would you not want these hosts to show up?

    Because they don't represent NNTP peering relations and sometimes
    (listservers behind mail-to-Usenet gateways) aren't even part of Usenet.
    Then again, when articles flow in both directions, showing them has some merit.
    I suppose that when you map something, you need to think critically about what
    exactly you are trying to represent. The same goes for data collection: what and
    where you are collecting, what is missing, and what is excessive.


    They are articles that flow across Usenet.

    I guess it could be that you're mapping a specific part / subset of Usenet.

    Exactly; if you want to map how the articles themselves move, then you can
    include them.


    Best regards,
    U.ee

  • From Grant Taylor@21:1/5 to Karl Kleinpaste on Wed Apr 8 14:40:00 2020
    XPost: news.software.nntp

    On 4/8/20 10:10 AM, Karl Kleinpaste wrote:
    It was Brian Reid's (then of DECWRL) Network Measurement Project.

    Hum. I'm not familiar. I'm guessing that's because "was" implies past
    tense and I'm mostly current tense.

    He once described an article flowing through the (empirically observed)
    core NNTP servers as a flare fired into a munitions dump.

    LOL

    That seems accurate.



    --
    Grant. . . .
    unix || die

  • From Julien ÉLIE@21:1/5 to All on Wed Apr 8 23:33:41 2020
    XPost: news.software.nntp

    Hi,

    Tervitus! :)

    Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
    Estonia is such a very nice country! I really enjoyed it!


    In last few days I have tried to make visual representation about
    Usenet peering relations.

    I did something quite similar last week.

    In case you had not already taken a look at inpath2dot or inflow:
    https://cord.de/news-stuff
    https://ftp.isc.org/isc/inn/unoff-contrib/inflow
    you might find in these two scripts useful parsing tricks or enhancements.

    --
    Julien ÉLIE

    « Les soucis d'aujourd'hui sont les plaisanteries de demain. Rions-en
    donc tout de suite. » (Henri Béraud)

  • From Karl Kleinpaste@21:1/5 to Grant Taylor on Wed Apr 8 17:24:26 2020
    XPost: news.software.nntp

    On 4/8/20 4:40 PM, Grant Taylor wrote:
    that's because was implies past tense

    1987-1989 or thereabouts. Not sure if it continued into the '90s.

  • From U.ee@21:1/5 to All on Thu Apr 9 15:15:04 2020
    XPost: news.software.nntp

    On 09.04.20 00:33, Julien ÉLIE wrote:
    Hi,

    Tervitus! :)

    Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
    Estonia is such a very nice country!  I really enjoyed it!

    That is nice to hear! I hope that you visited Viru bog in Lahemaa.


    In last few days I have tried to make visual representation about
    Usenet peering relations.

    I did something quite similar last week.

    In case you had not already taken a look at inpath2dot or inflow:
      https://cord.de/news-stuff
      https://ftp.isc.org/isc/inn/unoff-contrib/inflow
    you might find in these two scripts useful parsing tricks or enhancements.


    Thanks for those links!

    Best regards
    U.ee

  • From Paul Tomblin@21:1/5 to Grant Taylor on Thu Apr 9 13:30:20 2020
    XPost: news.software.nntp

    In a previous article, Grant Taylor <gtaylor@tnetconsulting.net> said:
    On 4/8/20 6:56 AM, Paul Tomblin wrote:
    Years ago there was a script that many sites ran on a regular basis
    which sent their data to a central location that collated all that
    information. That had obvious advantages over just collecting the
    info at one node.

    Are you referring to Top1000 or something else?

    Top1000 is still a thing.

    Except the "how to participate" section still is just a bunch of "XXX add a link to this".

    Aside: My main server is in the top quarter. :-)

    I'm 153rd.


    --
    Paul Tomblin <ptomblin@xcski.com> http://blog.xcski.com/
    "Always try to do things in chronological order; it's less confusing
    that way."

  • From Julien ÉLIE@21:1/5 to All on Sun Apr 19 09:28:21 2020
    XPost: news.software.nntp

    Hi,

    Oh, it reminds me of my trip to Tallinn/Lahemaa/Tartu last December.
    Estonia is such a very nice country!  I really enjoyed it!

    That is nice to hear! I hope that you visited Viru bog in Lahemaa.

    Yes! I visited Viru bog. So marvellous! I remember well that 3.5 km
    hike following a very little path surrounded by bogs. I had the chance
    to see the sunset when reaching the little tower near the middle of the
    hike. Furthermore, it snowed in the morning (while driving to Lahemaa) and
    the weather was perfect during the afternoon. The hike in such snowy landscapes was fantastic for the eyes.

    This was a guided tour to Viru bog with traveller.ee; on that day, we
    also went to an old manor house and Jägala waterfall.

    --
    Julien ÉLIE

    « C'est la goutte qui fait déborder l'amphore ! »
    (Assurancetourix)
