• SunFire v120, RX MAC fifo overflow smac on Integrated Ethernet

    From Jacopo Cassinis@21:1/5 to All on Tue Nov 16 21:00:01 2021
    Dear SPARC users and developers,

    I apologise in advance if I have posted something wrong; this is my
    first time posting to a Linux mailing list.

    I've been a long-time user of Debian SPARC on my beloved SunFire v120,
    and so far no huge hiccups have appeared.
    However, I recently noticed that download speeds from Apache
    (configured as a reverse proxy) were really appalling, and while
    troubleshooting I saw a lot of these messages in dmesg:

    [269078.153988] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00010400]
    [269078.473210] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00010400]
    [269078.687644] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00045822]
    [269078.900453] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00045822]
    [269079.167987] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00845822]
    [269079.370054] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00045822]
    [269079.577275] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00010400]
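
    To quantify how often the overflow fires and which smac values recur,
    the messages above can be tallied with a short script. This is only a
    minimal sketch: the function name summarize_overflows and the regular
    expression are illustrative assumptions based on the log format shown
    here, and the script does not attempt to decode what the smac bits
    mean.

```python
import re
from collections import Counter

# Pattern for the gem driver message as it appears in the log excerpt above.
# Whitespace is flattened first, so entries wrapped across two lines still match.
OVERFLOW_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+gem\s+\S+\s+(?P<iface>\S+):\s+"
    r"RX MAC fifo overflow\s+smac\[(?P<smac>[0-9a-fA-F]{8})\]"
)

# The seven messages quoted in the post.
SAMPLE = """
[269078.153988] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00010400]
[269078.473210] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00010400]
[269078.687644] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00045822]
[269078.900453] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00045822]
[269079.167987] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00845822]
[269079.370054] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00045822]
[269079.577275] gem 0000:01:0c.1 enp1s12f1: RX MAC fifo overflow smac[00010400]
"""


def summarize_overflows(log_text):
    """Return (Counter of smac values, seconds between first and last event)."""
    flat = " ".join(log_text.split())  # undo any line wrapping
    events = [
        (float(m["ts"]), m["iface"], m["smac"].lower())
        for m in OVERFLOW_RE.finditer(flat)
    ]
    counts = Counter(smac for _, _, smac in events)
    span = events[-1][0] - events[0][0] if len(events) > 1 else 0.0
    return counts, span


if __name__ == "__main__":
    counts, span = summarize_overflows(SAMPLE)
    total = sum(counts.values())
    print(f"{total} overflow messages in {span:.3f} s")
    for smac, n in counts.most_common():
        print(f"  smac[{smac}]: {n}")
```

    Feeding it the full output of dmesg instead of SAMPLE gives a rough
    overflow rate that can be compared between vhosts or kernel versions.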

    Sometimes the network cards even hang completely, and the only way to
    fix it is either to run systemctl restart networking (which doesn't
    always work) or to rmmod gem and then modprobe gem. The latter
    sometimes results in a CPU#0 soft lockup, and then I have to connect
    to the LOM and restart the whole machine.

    I haven't really figured out what causes the crashes. The kernel
    version doesn't seem to make a difference (it crashed on 4.19, and it
    still crashes now that I am running 5.14).

    It also seems related to a specific virtual host in Apache, since on
    other vhosts the download speed is fine (although I usually still get
    the messages, just less often). A speed test also works fine.

    It appears that changing the default traffic shaping algorithm from
    fifo to fq_codel alleviates the issue (it doesn't crash as often), but
    the messages are still there.
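
    One way to compare the two settings more objectively is to sample the
    interface's receive error counters before and after a test download.
    The sketch below reads the standard per-interface statistics files
    under /sys/class/net. Whether the gem driver increments these
    particular counters on every overflow is an assumption here, and the
    helper names read_rx_counters and counter_delta are purely
    illustrative.

```python
from pathlib import Path

# Counters read from /sys/class/net/<iface>/statistics/.
# Assumption: the driver accounts fifo overflows in at least one of these.
COUNTER_NAMES = ("rx_fifo_errors", "rx_over_errors", "rx_dropped")


def read_rx_counters(iface, sysfs_root="/sys/class/net"):
    """Return {counter name: value} for the counters that exist for iface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    result = {}
    for name in COUNTER_NAMES:
        path = stats_dir / name
        if path.exists():
            result[name] = int(path.read_text().strip())
    return result


def counter_delta(before, after):
    """Return how much each counter present in both samples increased."""
    return {name: after[name] - before[name] for name in after if name in before}


if __name__ == "__main__":
    import time

    iface = "enp1s12f1"  # interface name taken from the dmesg output
    before = read_rx_counters(iface)
    time.sleep(60)  # run the slow download during this window
    after = read_rx_counters(iface)
    print(counter_delta(before, after))
```

    Running it once with each queueing setting, over the same download,
    would show whether fq_codel actually reduces the number of overflows
    or only makes the hangs less likely.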

    Anyway, I don't think a messed-up vhost should crash the network card,
    so I don't really know where to head next. Googling the issue reveals
    problems with gem cards, but no real solution.

    If I can help with anything, just ask; I don't really know where to
    bang my head.

    Thanks
    Jacopo
