• Debian/Xen on ARM: How to identify source of an unhandled SMC call duri

    From Paul Leiber@21:1/5 to All on Thu Jan 25 22:50:01 2024
    Dear Debian user list members,

    I am trying to run network related stuff (Samba, Zabbix) on a Raspberry
    Pi 4B in a virtualized environment using Debian Bookworm and Xen. I am
    running into reproducible complete system crashes/reboots due to a Xen
    watchdog triggering under certain, seemingly strange conditions (the
    number of VLANs involved seems to play a role, running tcpdump on
    certain interfaces prevents this issue, ...). If you are interested in
    the long version, you can find it here [1].

    Some people on xen-devel pointed out to me two unhandled SMC calls in
    the boot logs which could be the root of the problem. I am now trying to
    find out where these calls come from to get closer to the root cause.
    The suspected calls are the following ones:

    (XEN) d0v0 Unhandled SMC/HVC: 0x84000050
    (XEN) d0v0 Unhandled SMC/HVC: 0x8600ff01

    These calls happen during the Dom0 boot process, so it's something from
    inside Linux and nothing Xen related, I've been told. The current
    working hypothesis is that the calls are trying to find some module not
    emulated by Xen and are therefore failing, leading to Linux waiting for
    the reply, and subsequently to the Xen watchdog triggering and rebooting.

    From what I could find out in ARM documentation, the unhandled SMC
    calls probably have the following purpose:

    0x84000050 = TRNG_VERSION, returns the implemented TRNG (True Random
    Number Generator) ABI version [2]
    0x8600ff01 = Call UID Query for Vendor Specific Hypervisor Service,
    Returns a unique identifier of the service provider [3]

    The more likely cause is the second call to the address 0x8600ff01.

    Now I simply have no idea how to find out where in the Linux boot
    process these calls are made. I tried poking into the Linux sources a
    bit, and I couldn't find an exact match for these call addresses, so I
    assume these addresses are assembled from different parts. There are
    some matches for "0x8600" and for "ff01", but I couldn't identify if
    these matches are relevant.

    I tried to find out if strace could help, but from what I understand,
    this is related to commands coming from userspace, so I am not sure that
    strace helps during the boot process.

    I'd appreciate it if somebody more knowledgeable would point me in the
    right direction. If more information is needed, I can provide it.

    Thanks,

    Paul

    [1]
    https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00796.html
    [2] https://developer.arm.com/documentation/den0098/latest/
    [3] https://documentation-service.arm.com/static/628b755ce3c4322a76af56de?token=

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Leiber@21:1/5 to All on Wed Jan 31 08:10:01 2024
    On 25.01.2024 at 22:28, Paul Leiber wrote:
    [...]

    Hm, no reply so far. Is this maybe the wrong list? Should I post this
    rather somewhere else?

    Paul

  • From Andrew M.A. Cater@21:1/5 to Paul Leiber on Wed Jan 31 19:10:01 2024
    On Wed, Jan 31, 2024 at 08:02:47AM +0100, Paul Leiber wrote:
    > [...]
    > Hm, no reply so far. Is this maybe the wrong list? Should I post this
    > rather somewhere else?

    debian-arm / OFTC IRC #debian-arm or #debian-raspberrypi

    Xen in Debian is team maintained, I think, but many people have moved
    away from Xen in favour of other virtualisation/paravirtualisation
    solutions and containers.

    All the very best, as ever,

    Andy
    (amacater@debian.org)

  • From hw@21:1/5 to Paul Leiber on Wed Jan 31 22:10:01 2024
    On Wed, 2024-01-31 at 08:02 +0100, Paul Leiber wrote:
    > [...]
    > I am trying to run network related stuff (Samba, Zabbix) on a Raspberry
    > Pi 4B in a virtualized environment using Debian Bookworm and Xen. I am
    > running into reproducible complete system crashes/reboots due to a Xen
    > watchdog triggering under certain, seemingly strange conditions (the

    Raspberry Pis have a watchdog?

    Maybe disable the watchdog and see what happens?

    > number of VLANs involved seems to play a role, running tcpdump on
    > certain interfaces prevents this issue, ...). If you are interested in
    > the long version, you can find it here [1].
    > [...]
    > Now I simply have no idea how to find out where in the Linux boot
    > process these calls are made. I tried poking into the Linux sources a
    > bit, and I couldn't find an exact match for these call addresses, so I
    > assume these addresses are assembled from different parts. There are
    > some matches for "0x8600" and for "ff01", but I couldn't identify if
    > these matches are relevant.

    I would search for the message 'Unhandled SMC/HVC' itself, or even for
    'Unhandled', not for the address. The address is probably determined
    at runtime and not hardcoded.

    Do you get better results with qemu/kvm? Xen is more like 'hmm' than
    anything else.

  • From Paul Leiber@21:1/5 to All on Wed Jan 31 23:10:01 2024
    On 31.01.2024 at 19:07, Andrew M.A. Cater wrote:
    > [...]
    > debian-arm / OFTC IRC #debian-arm or #debian-raspberrypi
    >
    > Xen in Debian is team maintained, I think, but many people have moved
    > away from Xen in favour of other virtualisation/paravirtualisation
    > solutions and containers.

    Thank you for the reply, Andy. I'll try my luck there.

    Paul

  • From Tixy@21:1/5 to All on Wed Jan 31 23:20:02 2024
    On Wed, 2024-01-31 at 21:59 +0100, hw wrote:
    > On Wed, 2024-01-31 at 08:02 +0100, Paul Leiber wrote:
    > > [...]
    > > (XEN) d0v0 Unhandled SMC/HVC: 0x84000050
    > > (XEN) d0v0 Unhandled SMC/HVC: 0x8600ff01
    > > [...]
    > I would search for the message 'Unhandled SMC/HVC' itself, or even for
    > 'Unhandled', not for the address. The address is probably determined
    > at runtime and not hardcoded.

    I'm sure those hex values aren't 'addresses' but the IDs for the secure
    monitor calls Paul already identified.
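
    A minimal sketch of how such an ID breaks down into fields, assuming the
    bit layout of the Arm SMC Calling Convention (bit 31 call type, bit 30
    SMC32/SMC64, bits 29:24 owning service, bits 15:0 function number). The
    function name decode_smccc_id and the OWNERS table are made up for this
    illustration and are not taken from Linux or Xen.

```python
# Owner numbers as assumed from the Arm SMC Calling Convention; ranges for
# trusted applications and trusted OSes are deliberately left out.
OWNERS = {
    0: "Arm architecture",
    1: "CPU service",
    2: "SiP service",
    3: "OEM service",
    4: "Standard secure service",
    5: "Standard hypervisor service",
    6: "Vendor specific hypervisor service",
}


def decode_smccc_id(function_id: int) -> dict:
    """Split a 32-bit SMCCC function ID into its fields (hypothetical helper)."""
    owner = (function_id >> 24) & 0x3F
    return {
        "fast_call": bool((function_id >> 31) & 1),  # bit 31: fast vs. yielding call
        "smc64": bool((function_id >> 30) & 1),      # bit 30: SMC64 vs. SMC32
        "owner": owner,                              # bits 29:24: owning service
        "owner_name": OWNERS.get(owner, "other/unlisted"),
        "function": function_id & 0xFFFF,            # bits 15:0: function number
    }


# The two IDs from the Xen boot log quoted in this thread.
for fid in (0x84000050, 0x8600FF01):
    print(hex(fid), decode_smccc_id(fid))
```

    Under these assumptions, both values come out as fast SMC32 calls: the
    first with owner 4 and function 0x50, the second with owner 6 and
    function 0xff01, which matches the descriptions Paul found.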

    Looking at the Linux sources I found the header for constructing these
    monitor calls: include/linux/arm-smccc.h

    So it might be worth looking at the files that include that. There are
    various drivers for firmware, and a watchdog driver amongst other
    things... drivers/watchdog/arm_smc_wdt.c
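
    One way to list the files that include that header is a small script run
    against an unpacked kernel tree. This is only a sketch: the function name
    find_smccc_users is invented here, and the two constant names searched
    for by default are the ones reported later in this thread. Matching is
    by plain substring, so longer identifiers that contain those names are
    reported as well.

```python
import os
import re

# Matches '#include <linux/arm-smccc.h>' with optional spacing.
INCLUDE_RE = re.compile(r"#\s*include\s*<linux/arm-smccc\.h>")

# Constant names reported later in this thread; adjust as needed.
NAMES = ("ARM_SMCCC_TRNG_VERSION", "ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID")


def find_smccc_users(tree_root, names=NAMES):
    """Return sorted (relative_path, constants_found) pairs for C/asm sources
    that include linux/arm-smccc.h or mention one of the given names."""
    hits = []
    for dirpath, _dirs, files in os.walk(tree_root):
        for name in files:
            if not name.endswith((".c", ".h", ".S")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            used = [n for n in names if n in text]
            if INCLUDE_RE.search(text) or used:
                hits.append((os.path.relpath(path, tree_root), used))
    return sorted(hits)
```

    Calling find_smccc_users() with the path of the kernel source tree
    returns the candidate files; entries with a non-empty second element
    mention one of the constants directly.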

    --
    Tixy

  • From Paul Leiber@21:1/5 to All on Sat Feb 17 18:20:01 2024
    On 31.01.2024 at 23:12, Tixy wrote:
    > [...]
    > I'm sure those hex values aren't 'addresses' but the IDs for the secure
    > monitor calls Paul already identified.
    >
    > Looking at the Linux sources I found the header for constructing these
    > monitor calls: include/linux/arm-smccc.h
    >
    > So it might be worth looking at the files that include that. There are
    > various drivers for firmware, and a watchdog driver amongst other
    > things... drivers/watchdog/arm_smc_wdt.c


    That was spot on, I think. In include/linux/arm-smccc.h, the SMC call IDs
    are assembled from separate fields, so the complete hex values do not
    appear literally in the source code and a simple search cannot find them.

    (For completeness' sake: I also found out that Tianocore code is using
    the TRNG SMC call, but although Tianocore is being used for the boot
    process, I think that the linux code is more likely to be the cause of
    the above errors. [1])

    The first ID, 0x84000050, is the value of the constant
    ARM_SMCCC_TRNG_VERSION; the second ID, 0x8600ff01, is the value of the
    constant ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID.
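
    The arithmetic behind those two constants can be sketched as follows.
    This is a Python rendering of the bit composition, not the kernel's C
    code; the names and numeric values are written from memory of
    include/linux/arm-smccc.h and should be checked against the kernel
    version actually in use.

```python
# Field values as recalled from include/linux/arm-smccc.h (assumption:
# verify against your kernel tree before relying on them).
ARM_SMCCC_FAST_CALL = 1            # call type, placed in bit 31
ARM_SMCCC_SMC_32 = 0               # calling convention, placed in bit 30
ARM_SMCCC_OWNER_STANDARD = 4       # owner, placed in bits 29:24
ARM_SMCCC_OWNER_VENDOR_HYP = 6
ARM_SMCCC_FUNC_QUERY_CALL_UID = 0xFF01


def arm_smccc_call_val(call_type, calling_convention, owner, func_num):
    """Compose a 32-bit function ID from its fields, mirroring the idea of
    the header's ARM_SMCCC_CALL_VAL macro."""
    return (
        (call_type << 31)
        | ((calling_convention & 1) << 30)
        | ((owner & 0x3F) << 24)
        | (func_num & 0xFFFF)
    )


ARM_SMCCC_TRNG_VERSION = arm_smccc_call_val(
    ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_32, ARM_SMCCC_OWNER_STANDARD, 0x50
)
ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID = arm_smccc_call_val(
    ARM_SMCCC_FAST_CALL,
    ARM_SMCCC_SMC_32,
    ARM_SMCCC_OWNER_VENDOR_HYP,
    ARM_SMCCC_FUNC_QUERY_CALL_UID,
)

print(hex(ARM_SMCCC_TRNG_VERSION))                 # 0x84000050
print(hex(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID))  # 0x8600ff01
```

    Because the values are built this way, searching the sources for the
    constant names or for the individual fields (owner, function number) is
    more productive than searching for the complete hex values.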

    As suggested, I'll try to find relevant source code that includes
    linux/arm-smccc.h.

    What puzzles me is that the whole problem only appears when there are
    two VLANs and traffic on a VLAN. I assume this means that some code
    related to VLANs is relying on information from one of those calls and
    therefore fails when the call is not answered. Could that be plausible?

    Anyway, thank you for this information!


    [1] https://github.com/tianocore/edk2/blob/master/ArmPkg/Include/IndustryStandard/ArmStdSmc.h#L165
