• Re: Problem: USB Ethernet PCI-E card does not work with kernel 5.10.0-1

    From Diederik de Haas@21:1/5 to All on Wed Feb 23 17:45:49 2022
    Copy: flacusbigotis@gmail.com (Flacusbigotis)

    On Wednesday, 23 February 2022 07:14:03 CET Flacusbigotis wrote:
    > Issue: The ethernet port on the Syba SD-PEX50100 PCI-E card does not work properly in Debian Bullseye (kernel 5.10.0-11-amd64).

    > In Bullseye, with kernel 5.10.0-11-amd64, the ethernet card starts randomizing the MAC address, which causes issues with ISPs whose DHCP servers lock on to MAC addresses.

    NetworkManager has a MAC-address-randomization feature which you can turn off. I don't *know*, but it could be that Bullseye has that feature enabled by default while Buster does not have it (or does not enable it by default).
    If that's the case, then your issue would just be a SW (configuration) issue and this list would not be the right place ...
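
As a rough illustration of what "turning it off" could look like, the sketch below renders a NetworkManager drop-in that asks for the permanent hardware address on wired connections. The helper name `render_dropin` is made up for this example. `ethernet.cloned-mac-address` and `wifi.scan-rand-mac-address` are documented NetworkManager settings, but whether they are the relevant ones here is an assumption, since the thread never confirms that NetworkManager is managing this interface.

```python
import configparser
import io


def render_dropin() -> str:
    """Return the text of a hypothetical NetworkManager drop-in file."""
    cfg = configparser.ConfigParser()
    # Ask for the permanent (burned-in) address on wired connections.
    cfg["connection"] = {"ethernet.cloned-mac-address": "permanent"}
    # Disable MAC randomization during Wi-Fi scans (only matters for Wi-Fi).
    cfg["device"] = {"wifi.scan-rand-mac-address": "no"}
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # The output would be saved under /etc/NetworkManager/conf.d/ by hand.
    print(render_dropin())
```

The script only prints the text; nothing is written to the system, so it is safe to run before deciding whether the settings apply.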

    > I also think the ethernet port does not work in general in Bullseye but I am not sure how to prove that bigger claim.

    ... If the above does not apply in your case or disabling that feature does not resolve your issue, then it could be that it's indeed a kernel issue and then this list would be the right place ... but indirectly, by reporting a bug against the kernel package (with 'reportbug').

    > In Bullseye, the /var/log/messages file shows kernel logs that indicate
    > that there are issues during boot up with the PCI-E card. Those logs do
    > not occur at all in Debian Buster. This is why I think the issue is in the kernel.

    This and the message you posted do make it likely that it's a kernel issue.
    So here's my recommendation:
    1) Try disabling the MAC randomization and see whether that improves things.
    2) If not, or if it doesn't properly/fully resolve the issues, use the reportbug tool/program to file a bug against the linux-image-5.10.0-11-amd64 package.

    That will itself generate various info about your system, but if there's info that you posted to this list that isn't automatically included in the bug report, then do add it manually (e.g. by copy-paste).
    That way all the relevant information is present in that (one) bug report and other/outside people don't have to be aware that you also sent a separate msg.

    > This is my first time reporting an issue to this email list (or Debian), so
    > I am not sure what other information to provide or if I need to open a bug report somewhere. So, if this is not the correct way to report the issue and/or not sufficient information to investigate it, I would appreciate
    > your guidance.

    You did quite well, but the preferred way to report kernel issues (or issues
    in Debian in general) is through the Bug Tracking System (with reportbug).

    HTH,
    Diederik

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Flacusbigotis@21:1/5 to All on Wed Feb 23 19:00:01 2022
    On Wed, Feb 23, 2022 at 10:45 AM Diederik de Haas <didi.debian@cknow.org> wrote:

    > This and the message you posted do make it likely that it's a kernel issue.
    > So here's my recommendation:
    > 1) Try disabling the MAC randomization and see whether that improves things.
    > 2) If not or it doesn't properly/fully resolve issues, use the reportbug tool/program to file a bug against the linux-image-5.10.0-11-amd64.
    >
    > That will itself generate various info about your system, but if there's info that you posted to this list that isn't automatically included in the bug report, then do add it manually (by copy-paste f.e.).
    > That way all the relevant information is present in that (one) bug report and other/outside people don't have to be aware that you also send a separate msg.


    Thanks Diederik for your guidance. I will do as you suggested.

  • From Bjørn Mork@21:1/5 to Flacusbigotis on Tue Mar 8 11:40:01 2022
    Flacusbigotis <flacusbigotis@gmail.com> writes:

    > The kernel logs indicating issues in Bullseye include a warning of a "host failure" by xhci_hcd, and several write/read errors by the ax88179 ethernet driver/module for the card, as follows:
    >
    > Feb 22 17:22:53 server1 kernel: [ 1.380198] xhci_hcd 0000:1c:00.0: xHCI Host Controller
    > Feb 22 17:22:53 server1 kernel: [ 1.380205] xhci_hcd 0000:1c:00.0: new USB bus registered, assigned bus number 5
    > Feb 22 17:22:53 server1 kernel: [ 1.380209] xhci_hcd 0000:1c:00.0: Host supports USB 3.0 SuperSpeed
    > Feb 22 17:22:53 server1 kernel: [ 1.380260] usb usb5: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.10
    > Feb 22 17:22:53 server1 kernel: [ 1.380261] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
    > Feb 22 17:22:53 server1 kernel: [ 1.380263] usb usb5: Product: xHCI Host Controller
    > Feb 22 17:22:53 server1 kernel: [ 1.380264] usb usb5: Manufacturer: Linux 5.10.0-11-amd64 xhci-hcd
    > Feb 22 17:22:53 server1 kernel: [ 1.380265] usb usb5: SerialNumber: 0000:1c:00.0
    > Feb 22 17:22:53 server1 kernel: [ 1.380396] hub 5-0:1.0: USB hub found
    > Feb 22 17:22:53 server1 kernel: [ 1.380411] hub 5-0:1.0: 4 ports detected
    > Feb 22 17:22:53 server1 kernel: [ 5.508457] ax88179_178a 5-1:1.0 eth0: register 'ax88179_178a' at usb-0000:1c:00.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet, 00:11:22:33:44:55
    > Feb 22 17:23:25 server1 kernel: [ 39.576966] xhci_hcd 0000:1c:00.0: WARNING: Host System Error
    > Feb 22 17:26:00 server1 kernel: [ 194.596335] ax88179_178a 5-1:1.0 enx001122334455: Failed to read reg index 0x0002: -22

    I am guessing that the random MAC address is a symptom caused by a
    failure to read the permanent MAC from the USB ethernet
    controller, which in turn is probably caused by one or more of these
    read errors.
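
One way to test that guess is to look at the address itself. A randomly generated MAC normally has the locally-administered bit (0x02 in the first octet) set, whereas a vendor-assigned permanent address normally does not. The helper name `is_locally_administered` is invented for this sketch, and the sample addresses are illustrative rather than taken from the affected machine (the address in the log above may well be a placeholder).

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the locally-administered bit of a MAC address is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


if __name__ == "__main__":
    # Both addresses are examples only, not real hardware addresses.
    for mac in ("00:11:22:33:44:55", "d6:3a:9c:01:02:03"):
        if is_locally_administered(mac):
            kind = "locally administered (possibly random)"
        else:
            kind = "globally unique (vendor-assigned)"
        print(mac, "->", kind)
```

If the interface shows a locally administered address that changes between boots, that would be consistent with the driver falling back to a random address, although it does not prove why the read failed.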

    But I believe those are only symptoms, and that the real error is that unspecified "Host System Error".
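
To see how the symptoms line up in time, a small script can pull the relevant entries out of the log. This is a minimal sketch: the regular expression assumes the syslog layout of the lines quoted above, the keyword list is only a guess at what counts as a problem, and `find_problems` is a hypothetical helper name.

```python
import re

# Matches "kernel: [ <uptime>] <driver> <rest of message>" as in the quoted log.
LINE_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s+(\S+)\s+(.*)")

# Assumed problem markers; extend as needed.
KEYWORDS = ("Host System Error", "Failed to read", "Failed to write")


def find_problems(lines):
    """Yield (uptime_seconds, driver, message) for lines containing a keyword."""
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        uptime, driver, message = match.groups()
        if any(keyword in message for keyword in KEYWORDS):
            yield float(uptime), driver, message


SAMPLE = [
    "Feb 22 17:22:53 server1 kernel: [ 1.380396] hub 5-0:1.0: USB hub found",
    "Feb 22 17:23:25 server1 kernel: [ 39.576966] xhci_hcd 0000:1c:00.0: WARNING: Host System Error",
    "Feb 22 17:26:00 server1 kernel: [ 194.596335] ax88179_178a 5-1:1.0 enx001122334455: Failed to read reg index 0x0002: -22",
]

if __name__ == "__main__":
    for uptime, driver, message in find_problems(SAMPLE):
        print(f"{uptime:10.3f}s  {driver:<14} {message}")
```

On the sample lines this lists the xhci_hcd warning before the ax88179_178a read failure, which matches the ordering in the quoted log.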

    I wonder if this could be related to some of the quirks that have been
    added for this xhci controller since v4.19? There have been a few since
    the VL805 is used in the RPi4. Some of these might very well be
    misunderstood and RPi related only. There is also an odd code path in drivers/usb/host/pci-quirks.c where we select a different path on RPi
    than on other systems because "things are taken care of by the board's co-processor". I find that very suspicious.

    And I must admit that my interest in this bug is because I'm worried
    that the quirk I recently pushed could have unexpected side effects...
    I have no clue.

    But the most likely cause is some power management issue. Test
    disabling ASPM, e.g. by adding pcie_aspm=off to the kernel command line.

    Or disabling USB autosuspend, e.g. by adding usbcore.autosuspend=-1 to the kernel command line.

    I do NOT suggest that you run with those settings by default. They are
    only for testing, to try to narrow down the problem.
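
The two experiments can be kept apart by building one test command line per parameter. The sketch below only manipulates strings and does not touch the bootloader. `with_param` is a hypothetical helper and the base command line is invented. The resulting string would be used for a single boot, for example by editing the `linux` line from the GRUB menu, which is an assumption about the bootloader in use.

```python
def with_param(cmdline: str, param: str) -> str:
    """Return cmdline with param appended, replacing any earlier setting of the same key."""
    key = param.split("=", 1)[0]
    kept = [p for p in cmdline.split() if p.split("=", 1)[0] != key]
    return " ".join(kept + [param])


# Invented example; the real value can be read from /proc/cmdline.
BASE = "BOOT_IMAGE=/vmlinuz-5.10.0-11-amd64 root=/dev/sda1 ro quiet"

# The two parameters suggested above, tested one at a time.
TESTS = ("pcie_aspm=off", "usbcore.autosuspend=-1")

if __name__ == "__main__":
    for param in TESTS:
        print(with_param(BASE, param))
```

Testing one parameter per boot makes it possible to tell which of the two, if either, makes the "Host System Error" go away.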

    It would also be interesting to know if removing the XHCI_LPM_SUPPORT
    quirk would make a difference, since this was added to the VL805 between
    v4.19 and v5.10 without anyone really knowing if it works. But I can't
    figure out how to disable a device-specific quirk like that without
    patching the kernel. Anyone?



    Bjørn

  • From Flacusbigotis@21:1/5 to bjorn@mork.no on Tue Mar 8 20:40:02 2022
    Yes, I also believe it's all caused by the xhci_hcd issue.

    I did open a proper kernel issue report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006477

    You can see in that report that the card worked OK with the newest vanilla kernel. But I don't know what fixed it yet, and I have not had time to do any further investigation.

    FYI, I won't update this thread anymore, but if anyone is interested in contributing ideas, information, etc, please would you do it on the issue report itself?

    Thanks for all the help!



