• Re: rx2660 + debian

    From John Paul Adrian Glaubitz@21:1/5 to Anton Borisov on Sat Apr 9 09:20:01 2022
    Hello Anton!

    On 4/8/22 22:18, Anton Borisov wrote:
    I'm trying to kickstart Debian 11 (the port that you build) on my rx2660. However, I've had no luck at all because of the memcpy bug in the 5.x kernel. Could you
    please clarify what FW/BMC versions your server has?

    Here are the firmware versions of my RX2660:

    [mp0017a499dd1c] MP:CM> sr


    SYSREV

    Current firmware revisions

    MP FW : F.02.17
    BMC FW : 05.23
    EFI FW : ROM A 05.63, ROM B 07.12
    System FW : ROM A 01.00, ROM B 04.04, Boot ROM B
    UCIO FW : 03.0b
    PRS FW : 00.08 UpSeqRev: 02, DownSeqRev: 01




    [mp0017a499dd1c] MP:CM>

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Frank Scheiner@21:1/5 to John Paul Adrian Glaubitz on Sat Apr 9 10:10:01 2022
    Hi Adrian,

    Could you please also mention what processors you have installed,
    Montecito (90xy) or Montvale (91xy[z])? I'm not sure if this makes a difference, but suspect it could.
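    The naming rule quoted here (90xy for Montecito, 91xy[z] for Montvale) can be applied mechanically to the "model name" line of /proc/cpuinfo. A minimal POSIX sh sketch, assuming the model name ends in "Processor" followed by the series number as in the outputs posted in this thread; classify_itanium is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: map the series number in an Itanium "model name"
# string to its core name, using the rule from this thread:
# 90xy = Montecito, 91xy[z] = Montvale.
classify_itanium() {
    # Extract e.g. "9040" or "9140M" following the word "Processor".
    num=$(printf '%s\n' "$1" |
        sed -n 's/.*Processor \(9[0-9][0-9][0-9][A-Z]*\).*/\1/p')
    case "$num" in
        90??*) echo "Montecito ($num)" ;;
        91??*) echo "Montvale ($num)" ;;
        *)     echo "unknown" ;;
    esac
}

# Classify the first CPU of the running system. Old kernels such as the
# 2.6.18 one in RHEL 5.4 print no "model name" line, which gives "unknown".
model=$(grep -m1 '^model name' /proc/cpuinfo 2>/dev/null)
classify_itanium "$model"
```

    For example, classify_itanium 'Dual-Core Intel(R) Itanium(R) Processor 9050' prints "Montecito (9050)".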

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to John Paul Adrian Glaubitz on Sat Apr 9 10:50:01 2022
    Hi,

    On 09.04.22 10:25, John Paul Adrian Glaubitz wrote:
    model name : Dual-Core Intel(R) Itanium(R) Processor 9050

    Thanks! I have two of those in one of my rx2660s and the other one has a
    single 9140M IIRC. But both are in (cold) storage right now, so no way
    to get the firmware levels in the near future - unless the spring
    returns shortly. ;-)

    But I will check my rx2620 for the firmware information and whether it runs
    with the latest kernel. The processors there are Montecitos (9020s IIRC).

    Cheers,
    Frank

  • From John Paul Adrian Glaubitz@21:1/5 to Frank Scheiner on Sat Apr 9 10:30:01 2022
    Hello!

    On 4/9/22 10:06, Frank Scheiner wrote:
    Could you please also mention what processors you have installed,
    Montecito (90xy) or Montvale (91xy[z])? I'm not sure if this makes a difference, but suspect it could.

    glaubitz@electron:~$ cat /proc/cpuinfo |head -n18
    processor : 0
    vendor : GenuineIntel
    arch : IA-64
    family : 32
    model : 0
    model name : Dual-Core Intel(R) Itanium(R) Processor 9050
    revision : 5
    archrev : 0
    features : branchlong, 16-byte atomic ops
    cpu number : 0
    cpu regs : 4
    cpu MHz : 1594.671
    itc MHz : 399.166326
    BogoMIPS : 3182.59
    siblings : 4
    physical id: 0
    core id : 0
    thread id : 0
    glaubitz@electron:~$

    Adrian

  • From Anton Borisov@21:1/5 to frank.scheiner@web.de on Sat Apr 9 14:10:01 2022
    my info so far:

    [root@rx2660 ~]# cat /proc/cpuinfo |less

    vendor : GenuineIntel
    arch : IA-64
    family : Itanium 2
    model : 0
    revision : 7
    archrev : 0
    features : branchlong, 16-byte atomic ops
    cpu number : 0
    cpu regs : 4
    cpu MHz : 1594.000670
    itc MHz : 399.165948
    BogoMIPS : 3186.68
    siblings : 4
    physical id: 0
    core id : 0
    thread id : 0

    [root@rx2660 ~]# lspci
    00:01.0 Class ff00: Hewlett-Packard Company RMP-3 (Remote Management Processor)
    00:01.1 Communication controller: Hewlett-Packard Company RMP-3 Shared Memory Driver
    00:01.2 Serial controller: Hewlett-Packard Company Diva Serial [GSP] Multiport UART
    00:02.0 USB Controller: NEC Corporation USB (rev 43)
    00:02.1 USB Controller: NEC Corporation USB (rev 43)
    00:02.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
    00:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
    01:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
    01:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
    01:02.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
    02:00.0 PCI bridge: Hewlett-Packard Company PCIe Root Port
    04:00.0 PCI bridge: Hewlett-Packard Company PCIe Root Port
    05:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 03)
    07:00.0 PCI bridge: Hewlett-Packard Company PCIe Root Port
    08:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
    08:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)

    SYSREV

    Current firmware revisions

    MP FW : F.02.26
    BMC FW : 05.26
    EFI FW : ROM A 05.65, ROM B 07.14
    System FW : ROM A 01.05, ROM B 04.30, Boot ROM B
    PDH FW : 50.07
    UCIO FW : 03.0b
    PRS FW : 00.08 UpSeqRev: 02, DownSeqRev: 01

    P/n for this CPU: AH238-2100A (18 MB L3 cache)

  • From Frank Scheiner@21:1/5 to Anton Borisov on Sat Apr 9 18:30:01 2022
    On 09.04.22 13:45, Anton Borisov wrote:
    my info so far:

    [root@rx2660 ~]# cat /proc/cpuinfo |less

    vendor     : GenuineIntel
    arch       : IA-64
    family     : Itanium 2
    model      : 0
    revision   : 7
    archrev    : 0
    features   : branchlong, 16-byte atomic ops
    cpu number : 0
    cpu regs   : 4
    cpu MHz    : 1594.000670
    itc MHz    : 399.165948
    BogoMIPS   : 3186.68
    siblings   : 4
    physical id: 0
    core id    : 0
    thread id  : 0

    Hm, this doesn't contain the model name for some reason.

    What kernel are you running there currently?

    And what 5.x kernel specifically doesn't work correctly for you?

    P/n for this CPU: AH238-2100A (18 MB L3 cache)

    According to its frequency readout it could be a 9040 (Montecito) or a
    9140N (Montvale) (see [1] for details).

    [1]: https://www.cpu-world.com/CPUs/Itanium_2/index.html

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Anton Borisov on Sat Apr 9 21:00:02 2022
    On 09.04.22 20:43, Anton Borisov wrote:
    cpuinfo was from a RHEL 5.4 environment. IIRC it had a 2.6.18 kernel, hence
    the terse model name. I'll give more details on my next boot with
    a fresh snapshot...

    That explains it.

    None of the kernels from the 5.x tree worked for me. Every time I hit that
    memcpy BUG. I've booted from https://cdimage.debian.org/cdimage/ports/snapshots/2019-07-16/ - it
    was perfect in terms of initial boot and launching the install shell.
    But then it got stuck at "detect and mount CD" (step 3 of the launcher).

    IIRC the kernel on the installer ISOs is a generic one. Maybe that makes
    a difference. In any case, if you can get the processor info that would
    be good to know.

    UPDATE: Just noticed, you seem to use an old snapshot. Maybe better use
    a current one.

    For reference, I just upgraded the FS of my rx2620 with two Montecitos
    using a 5.10.0-8 kernel and am now doing some benchmarking with a
    5.16.0-6 kernel. So far, no problems or instabilities.

    Firmware levels of this machine:

    ```
    [rx2620-mp-ilo] MP:CM> sr


    SYSREV

    Current firmware revisions

    MP FW : E.03.32
    BMC FW : 04.04
    EFI FW : 05.48
    System FW : 04.29
    ```

    Processors are 2 x Itanium 2 9020 (Montecito).

    But I have to say, this machine seems to be very well supported, because
    it has run everything I've thrown at it so far.

    Cheers,
    Frank

  • From Anton Borisov@21:1/5 to frank.scheiner@web.de on Sat Apr 9 21:10:01 2022
    cpuinfo was from a RHEL 5.4 environment. IIRC it had a 2.6.18 kernel, hence
    the terse model name. I'll give more details on my next boot with a fresh snapshot...

    None of the kernels from the 5.x tree worked for me. Every time I hit that
    memcpy BUG. I've booted from https://cdimage.debian.org/cdimage/ports/snapshots/2019-07-16/ - it was
    perfect in terms of initial boot and launching the install shell. But then
    it got stuck at "detect and mount CD" (step 3 of the launcher).

  • From Anton Borisov@21:1/5 to All on Sat Apr 9 22:10:01 2022
    It is 9040 (from installer's shell):

    arch : IA-64
    family : 32
    model : 0
    model name : Dual-Core Intel(R) Itanium(R) Processor 9040
    revision : 7
    archrev : 0
    features : branchlong, 16-byte atomic ops
    cpu number : 0
    cpu regs : 4
    cpu MHz : 1594.666
    itc MHz : 399.164976
    BogoMIPS : 3182.59
    siblings : 4
    physical id: 0
    core id : 0
    thread id : 0


    UPDATE: Just noticed, you seem to use an old snapshot. Maybe better use
    a current one.


    Frank, I tried almost every snapshot from 2021. The only usable one is from the Debian 10 branch (that's 2019-07-16 :).

    The current one, dated 2022-03-28, triggers the BUG and a kernel stack error.

  • From Frank Scheiner@21:1/5 to Frank Scheiner on Sat Apr 9 22:10:01 2022
    On 09.04.22 20:53, Frank Scheiner wrote:
    On 09.04.22 20:43, Anton Borisov wrote:
    None of the kernels from the 5.x tree worked for me. Every time I hit that
    memcpy BUG. I've booted from
    https://cdimage.debian.org/cdimage/ports/snapshots/2019-07-16/ - it
    was perfect in terms of initial boot and launching the install shell.
    But then it got stuck at "detect and mount CD" (step 3 of the launcher).

    IIRC the kernel on the installer ISOs is a generic one. Maybe that makes
    a difference.

    Just installed the generic one (`-itanium` instead of `-mckinley`) and
    did some benchmarking and didn't notice any issues. So the kernel type
    seems to make no difference - at least not for my rx2620.
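    The flavour shows up as the suffix of the kernel release string (the crash log later in this thread, for instance, reports 5.16.0-5-itanium), so which one is booted can be read from uname -r. A trivial POSIX sh sketch; kernel_flavour is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: report which Debian ia64 kernel flavour a release
# string belongs to. With no argument it uses the running kernel (uname -r).
kernel_flavour() {
    rel=${1:-$(uname -r)}
    case "$rel" in
        *-mckinley) echo "mckinley" ;;
        *-itanium)  echo "itanium" ;;
        *)          echo "other ($rel)" ;;
    esac
}

kernel_flavour
```

    For example, kernel_flavour 5.16.0-5-itanium prints "itanium".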

    ****

    On 09.04.22 21:52, Anton Borisov wrote:
    It is 9040 (from installer's shell):

    model name : Dual-Core Intel(R) Itanium(R) Processor 9040

    Ok, so a Montecito, too.



    UPDATE: Just noticed, you seem to use an old snapshot. Maybe better use
    a current one.

    Frank, I tried almost every snapshot from 2021. The only usable one is
    from the Debian 10 branch (that's 2019-07-16 :).

    The current one, dated 2022-03-28, triggers the BUG and a
    kernel stack error.

    Ok, I understand now, thanks for clarification. Strange situation. I'll
    have a look at my rx2660s as soon as time and temperature allow.

    If installing from the 2019-07-16 snapshot doesn't go through, you
    could try netbooting your rx2660 with the netboot installer (the current
    version is at [1]; it's a tarball containing the files for netbooting the installer). However, I couldn't find a version from the snapshot date above
    on snapshot.debian.org, so no luck with that. :-/

    [1]: http://ftp.ports.debian.org/debian-ports/pool-ia64/main/d/debian-installer/debian-installer-images_20210731_ia64.tar.gz

    Cheers,
    Frank

  • From John Paul Adrian Glaubitz@21:1/5 to Anton Borisov on Sat Apr 9 22:10:01 2022
    Hello Anton!

    On 4/9/22 20:43, Anton Borisov wrote:
    cpuinfo was from a RHEL 5.4 environment. IIRC it had a 2.6.18 kernel, hence
    the terse model name. I'll give more details on my next boot with
    a fresh snapshot...

    None of the kernels from the 5.x tree worked for me. Every time I hit that memcpy BUG. I've booted from https://cdimage.debian.org/cdimage/ports/snapshots/2019-07-16/
    - it was perfect in terms of initial boot and launching the install shell. But then it
    got stuck at "detect and mount CD" (step 3 of the launcher).

    Please make sure you are installing the latest possible firmware versions for both the
    system and the RAID controller. The Itanium machines can show various issues with Linux
    when the firmware is too old.

    My RX2660 wouldn't boot with SMP with newer kernels, for example, until I upgraded the
    firmware of the RAID controller to the latest version. Before that, I always had to
    pass "nosmp" or the RAID controller wouldn't get detected.
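    When testing a workaround like "nosmp" (or "hardened_usercopy=off", which comes up later in this thread), it helps to confirm that the parameter actually reached the kernel. A minimal POSIX sh sketch that checks /proc/cmdline; has_kparam is a hypothetical name, and it only does whole-word matching on whitespace-separated parameters:

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel parameter $1 appears as a whole
# word in a command line. $2 defaults to the running kernel's /proc/cmdline.
has_kparam() {
    cmdline=${2:-$(cat /proc/cmdline 2>/dev/null)}
    # Pad with spaces so the first and last parameters match too.
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if has_kparam nosmp; then
    echo "booted with nosmp"
else
    echo "nosmp not passed"
fi
```

    Matching on surrounding spaces keeps "smp" from matching inside "nosmp".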

    You should also make sure you don't have any bad memory by running a program for testing
    the main memory such as memtest86.

    Adrian

  • From John Paul Adrian Glaubitz@21:1/5 to Pedro Miguel Justo on Wed Apr 20 08:50:01 2022
    Hello Pedro!

    On 4/20/22 08:39, Pedro Miguel Justo wrote:
    One question (a rather basic one):

    What is the right configuration for ‘apt’ on a machine with ia64 Debian Ports?

    When I run ‘apt update’ I get the following error:

    Get:1 http://ftp.ports.debian.org/debian-ports sid InRelease [24.2 kB]
    Err:1 http://ftp.ports.debian.org/debian-ports sid InRelease
    The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 5A88D659DCB811BB
    Reading package lists... Done
    W: GPG error: http://ftp.ports.debian.org/debian-ports sid InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 5A88D659DCB811BB
    E: The repository 'http://ftp.ports.debian.org/debian-ports sid InRelease' is not signed.
    N: Updating from such a repository can't be done securely, and is therefore disabled by default.
    N: See apt-secure(8) manpage for repository creation and user configuration details.

    You are just missing the package debian-ports-archive-keyring:

    # wget http://ftp.ports.debian.org/debian-ports/pool/main/d/debian-ports-archive-keyring/debian-ports-archive-keyring_2022.02.15_all.deb
    # dpkg -i debian-ports-archive-keyring_2022.02.15_all.deb
    # apt update
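    To check whether the keyring package resolved the problem, the key IDs apt reports as missing can be extracted from its output; once the right keyring is installed, the list should be empty. A small POSIX sh sketch, assuming the NO_PUBKEY wording shown in the error above; missing_keys is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: read apt output on stdin and print each key ID
# reported as NO_PUBKEY, once.
missing_keys() {
    grep -o 'NO_PUBKEY [0-9A-F][0-9A-F]*' | awk '{ print $2 }' | sort -u
}

# Usage: apt update 2>&1 | missing_keys
```

    With the output quoted above, this would print the single ID 5A88D659DCB811BB.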

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From John Paul Adrian Glaubitz@21:1/5 to Pedro Miguel Justo on Wed Apr 20 09:40:01 2022
    Hello Pedro!

    On 4/20/22 09:07, Pedro Miguel Justo wrote:
    Thanks Adrian.

    That worked quite well. I was able to refresh the package list with no error after that.

    However, when running the “upgrade” command, things quickly derailed. I am now in a state where not even “perl” runs.


    # perl
    perl: error while loading shared libraries: libcrypt.so.1: cannot open shared object file: No such file or directory


    I guess it had just been too long since my last upgrade, and things are fragile as it is without letting them
    lapse this long. I guess the best next step for me is a clean install of the OS from the most recent ISO.

    This is a bug in the glibc library [1] that was fixed long ago.

    It's possible in principle to repair your setup manually, but it's easier to reinstall
    using the latest installation ISO.
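    The shared libraries a binary can no longer resolve after such a broken upgrade (libcrypt.so.1 for perl above) can be listed from ldd's "not found" entries. A POSIX sh sketch with hypothetical helper names not_found_libs and missing_libs:

```shell
#!/bin/sh
# Hypothetical helpers for spotting unresolved shared libraries.

# Read ldd output on stdin and print the libraries marked "not found".
not_found_libs() {
    awk '/=> not found/ { print $1 }'
}

# Run ldd on binary $1 and list what it cannot resolve (empty if all is well).
missing_libs() {
    ldd "$1" 2>/dev/null | not_found_libs
}

# Example: missing_libs /usr/bin/perl
```

    In the situation above, missing_libs /usr/bin/perl would be expected to list libcrypt.so.1.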

    Adrian

    [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974552

  • From John Paul Adrian Glaubitz@21:1/5 to Pedro Miguel Justo on Mon Apr 25 07:50:01 2022
    Hi!

    On 4/25/22 00:01, Pedro Miguel Justo wrote:
    So, I went ahead and tried the ISO from your last email: the 2022-03-18 one (non-free).
    Things didn’t go too well. I am back to having usercopy problems. Did we have a workaround for that?

    [ 1.478621] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 15466496, size 3)!
    [ 1.480383] kernel BUG at mm/usercopy.c:99!
    [ 1.480383] cryptomgr_test[76]: bugcheck! 0 [1]
    [ 1.484383] Modules linked in:
    [ 1.484383]
    [ 1.484383] CPU: 3 PID: 76 Comm: cryptomgr_test Not tainted 5.16.0-5-itanium #1 Debian 5.16.14-1
    [ 1.484383] Hardware name: hp server rx2660 , BIOS 04.30 03/05/2012
    [ 1.484383] psr : 00001010084a6010 ifs : 8000000000000410 ip : [<a0000001013389b0>] Not tainted (5.16.0-5-itanium Debian 5.16.14-1)
    [ 1.484383] ip is at usercopy_abort+0x120/0x130
    [ 1.484383] unat: 0000000000000000 pfs : 0000000000000410 rsc : 0000000000000003
    [ 1.484383] rnat: a000000101929380 bsps: 00000000000000ff pr : 00000005666a9655
    [ 1.484383] ldrs: 0000000000000000 ccv : 00000000fffff13f fpsr: 0009804c8a70433f
    [ 1.484383] csd : 0000000000000000 ssd : 0000000000000000
    [ 1.484383] b0 : a0000001013389b0 b6 : a000000100cbd7c0 b7 : a000000100813460
    [ 1.484383] f6 : 1003e00000000002c1e6e f7 : 1003e0044b82fa09b5a53
    [ 1.484383] f8 : 1003e0000000000000bd7 f9 : 1003e000000000394424f
    [ 1.484383] f10 : 1003e20c49ba5e353f7cf f11 : 1003e00000000007547f9
    [ 1.484383] r1 : a000000101c1cd70 r2 : a0000001019aa680 r3 : a0000001019aa688
    [ 1.484383] r8 : 000000000000001f r9 : a000000101992628 r10 : c0000000ffffefff
    [ 1.484383] r11 : 0000000000000003 r12 : e000000101027c70 r13 : e000000101020000
    [ 1.484383] r14 : ffffffffffd8d910 r15 : a0000001019aa688 r16 : 00000000ffffefff
    [ 1.484383] r17 : 0000000000000001 r18 : e000000101027ba0 r19 : 0000000000000140
    [ 1.484383] r20 : 000000000000000f r21 : 0000000000000003 r22 : 0000000000000000
    [ 1.484383] r23 : 0000000000000003 r24 : 0000000000000000 r25 : ffffffffffd0c6d1
    [ 1.484383] r26 : 000000000000000c r27 : a000000101992680 r28 : 0000000000001000
    [ 1.484383] r29 : 0000000000000fff r30 : 0000000000000fff r31 : 0000000000001ffe
    [ 1.484383]
    [ 1.484383] Call Trace:
    [ 1.484383] [<a000000100014c50>] show_stack+0x90/0xc0
    [ 1.484383] sp=e0000001010278b0 bsp=e000000101021628
    [ 1.484383] [<a000000100015360>] show_regs+0x6e0/0xa40
    [ 1.484383] sp=e000000101027a80 bsp=e0000001010215b0
    [ 1.484383] [<a000000100026bb0>] die+0x150/0x4c0
    [ 1.484383] sp=e000000101027aa0 bsp=e000000101021568
    [ 1.484383] [<a000000101366d40>] ia64_bad_break+0x740/0x760
    [ 1.484383] sp=e000000101027aa0 bsp=e000000101021538
    [ 1.484383] [<a00000010000ca80>] ia64_leave_kernel+0x0/0x270
    [ 1.484383] sp=e000000101027aa0 bsp=e000000101021538
    [ 1.484383] [<a0000001013389b0>] usercopy_abort+0x120/0x130
    [ 1.484383] sp=e000000101027c70 bsp=e0000001010214b8
    [ 1.484383] [<a0000001004b83f0>] __check_object_size+0x3f0/0x460
    [ 1.484383] sp=e000000101027c80 bsp=e000000101021480
    [ 1.484383] [<a00000010081f3e0>] build_test_sglist+0x540/0x8c0
    [ 1.484383] sp=e000000101027c80 bsp=e0000001010213b8
    [ 1.484383] [<a00000010081fac0>] test_shash_vec_cfg+0x1e0/0xc80
    [ 1.484383] sp=e000000101027d00 bsp=e000000101021308
    [ 1.484383] [<a000000100829810>] __alg_test_hash.constprop.0+0x2f0/0x760
    [ 1.484383] sp=e000000101027da0 bsp=e000000101021260
    [ 1.484383] [<a000000100829d90>] alg_test_hash+0x110/0x2e0
    [ 1.484383] sp=e000000101027db0 bsp=e000000101021208
    [ 1.484383] [<a000000100825a10>] alg_test+0xc50/0xec0
    [ 1.484383] sp=e000000101027db0 bsp=e000000101021180
    [ 1.484383] [<a00000010081d240>] cryptomgr_test+0x80/0xc0
    [ 1.484383] sp=e000000101027e30 bsp=e000000101021160
    [ 1.484383] [<a0000001000c08e0>] kthread+0x2e0/0x300
    [ 1.484383] sp=e000000101027e30 bsp=e000000101021118
    [ 1.484383] [<a00000010000c870>] call_payload+0x50/0x80
    [ 1.484383] sp=e000000101027e30 bsp=e000000101021100
    [ 1.484383] Disabling lock debugging due to kernel taint
    [ 2.127275] Freeing initrd memory: 21920kB freed
    [ 6.655281] random: crng init done

    I also see there are a couple more recent ISOs. Should I try those first?

    Same exact failure using the 2022-03-28 ISO. And it happens even with “hardened_usercopy=off”.

    I think Sergei Trofimovich had plans to fix this bug but I'm not sure how far that has progressed.

    It might also make sense trying to update the system firmware to the latest version you can get.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From John Paul Adrian Glaubitz@21:1/5 to Pedro Miguel Justo on Mon Apr 25 10:10:01 2022
    Hello!

    On 4/25/22 08:01, Pedro Miguel Justo wrote:
    It might also make sense trying to update the system firmware to the latest version you can get

    If I am not mistaken, last time we checked, my rx2660 FW version was actually more recent than yours…

    I think I posted my firmware versions earlier in this thread. Did you compare them?

    From what I can understand by the information in the bugcheck, this is somewhat related to a violation
    in parameter copy from user to kernel during some boot-time, crypto, self-test. Does that sound right?
    If that is the case, how would this be related to FW?

    I'm not claiming that it must be related to the firmware, I'm just saying that I don't see this problem
    on my RX2660 at all and I have even reinstalled it recently with one of the latest firmware images
    without having to pass any parameter to the command line.

    Maybe Sergei can comment on the usercopy issue, I forgot the details.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Frank Scheiner@21:1/5 to John Paul Adrian Glaubitz on Mon Apr 25 10:20:01 2022
    Hi guys,

    On 25.04.22 10:09, John Paul Adrian Glaubitz wrote:
    From what I can understand by the information in the bugcheck, this is somewhat related to a violation
    in parameter copy from user to kernel during some boot-time, crypto, self-test. Does that sound right?
    If that is the case, how would this be related to FW?

    I'm not claiming that it must be related to the firmware, I'm just saying that I don't see this problem
    on my RX2660 at all and I have even reinstalled it recently with one of the latest firmware images
    without having to pass any parameter to the command line.

    A difference between Adrian's rx2660 and Pedro's rx2660 is the
    processor: Montecito in Adrian's, Montvale in Pedro's.

    But there could still be several other reasons we haven't looked at
    in detail yet:

    * amount of memory installed
    * SMT enabled or not
    * number of processor modules installed

    It might be possible for me to check on my rx2660s (one with Montvale
    and one with Montecito(s)) tomorrow. I will then also look at my other
    Itanium gear and gather relevant information.

    Cheers,
    Frank

  • From Sergei Trofimovich@21:1/5 to Pedro Miguel Justo on Mon Apr 25 23:30:01 2022
    On Mon, 25 Apr 2022 15:07:58 +0000
    Pedro Miguel Justo <pmsjt@texair.net> wrote:

    On 2022/Apr/25, at 01:22, Pedro Miguel Justo <pmsjt@texair.net> wrote:



    On 2022/Apr/25, at 01:14, Frank Scheiner <frank.scheiner@web.de> wrote:

    Hi guys,

    On 25.04.22 10:09, John Paul Adrian Glaubitz wrote:
    From what I can understand by the information in the bugcheck, this is somewhat related to a violation
    in parameter copy from user to kernel during some boot-time, crypto, self-test. Does that sound right?
    If that is the case, how would this be related to FW?

    I'm not claiming that it must be related to the firmware, I'm just saying that I don't see this problem
    on my RX2660 at all and I have even reinstalled it recently with one of the latest firmware images
    without having to pass any parameter to the command line.

    A difference between Adrian's rx2660 and Pedro's rx2660 is Montecito
    left and Montvale right.

    But could still be multiple other reasons we haven't looked at yet in
    detail:

    * amount of memory installed
    * SMT enabled or not
    * number of processor modules installed

    It might be possible for me to check on my rx2660s (one with Montvale
    and one with Montecito(s)) tomorrow. I will then also look at my other
    Itanium gear and gather relevant information.


    Yes, this sounds more likely to me too.

    The crypto self-tests seem to be an innocent bystander here. I tried booting the most recent kernel with the option “cryptomgr.notests” and it went much farther. Alas, it still failed with another buffer copy validation for a different caller altogether:

    [ 3.836466] [<a000000101353690>] usercopy_abort+0x120/0x130
    [ 3.836466] sp=e0000001000cfdf0 bsp=e0000001000c9388
    [ 3.836466] [<a0000001004c5660>] __check_object_size+0x3c0/0x420
    [ 3.836466] sp=e0000001000cfe00 bsp=e0000001000c9350
    [ 3.836466] [<a000000100570030>] sys_getcwd+0x250/0x420
    [ 3.836466] sp=e0000001000cfe00 bsp=e0000001000c92c8
    [ 3.836466] [<a00000010000c860>] ia64_ret_from_syscall+0x0/0x20
    [ 3.836466] sp=e0000001000cfe30 bsp=e0000001000c92c8
    [ 3.836466] [<a000000000040720>] ia64_ivt+0xffffffff00040720/0x400
    [ 3.836466] sp=e0000001000d0000 bsp=e0000001000c92c8

    This suggests the bug might be in the logic validating these buffers against the allocations (heap, span, etc).

    I don’t know why hardened_usercopy=off is not being observed by the kernel. As a work-around I am compiling myself a new kernel with CONFIG_HARDENED_USERCOPY disabled at the source.


    Even with kernel "Linux debian 4.19.0-5-mckinley #1 SMP Debian 4.19.37-5 (2019-06-19) ia64 GNU/Linux"

    Things are still not 100%. A few hours into building the kernel, it started crashing with usercopy validations too but, this time, the other way around. And because it was the other way around, it led to process termination instead of a full-blown
    bugcheck. This could be related or not. It could very well be a different bug that happens to manifest itself around the same validation.

    CC [M] drivers/net/wireless/realtek/rtw88/rtw8822be.o
    LD [M] drivers/net/wireless/realtek/rtw88/rtw88_8822be.o
    CC [M] drivers/net/wireless/realtek/rtw88/rtw8822c.o
    Segmentation fault
    make[5]: *** [scripts/Makefile.build:293: drivers/net/wireless/realtek/rtw88/rtw8822c.o] Error 139
    make[5]: *** Deleting file 'drivers/net/wireless/realtek/rtw88/rtw8822c.o'
    make[4]: *** [scripts/Makefile.build:555: drivers/net/wireless/realtek/rtw88] Error 2
    make[3]: *** [scripts/Makefile.build:555: drivers/net/wireless/realtek] Error 2
    make[2]: *** [scripts/Makefile.build:555: drivers/net/wireless] Error 2
    make[1]: *** [scripts/Makefile.build:555: drivers/net] Error 2
    make: *** [Makefile:1855: drivers] Error 2
    pmsjt@debian:~/linux-source-5.17$ make

    Message from syslogd@debian at Apr 25 07:58:08 ...
    kernel:[23420.984012] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 1916912, size 8)!

    Message from syslogd@debian at Apr 25 07:58:08 ...
    kernel:[23421.268009] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 1818608, size 8)!
    HOSTCC scripts/sign-file
    CALL scripts/checksyscalls.sh
    <stdin>:1517:2: warning: #warning syscall clone3 not implemented [-Wcpp]
    CALL scripts/atomic/check-atomics.sh
    CHK include/generated/compile.h
    make[2]: *** [scripts/Makefile.build:294: arch/ia64/kernel/signal.o] Segmentation fault

    Message from syslogd@debian at Apr 25 07:58:11 ...
    kernel:[23423.626254] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 1933296, size 8)!
    make[1]: *** [scripts/Makefile.build:555: arch/ia64/kernel] Error 2
    make: *** [Makefile:1855: arch/ia64] Error 2

    In my understanding hardened_usercopy=on is completely broken on ia64
    today. It can't run any userspace. Even init process would not survive
    machine boot. At least that's what I experienced on rx3600.

    Thus I think if your system survives that much time I would guess
    that you have hardened_usercopy=off in full effect at least at boot.

    I would speculate it's some kind of memory corruption around 'bypass_usercopy_checks' key.

    It's worth adding a few printk()s to 'usercopy_abort()' and 'set_hardened_usercopy()'
    in mm/usercopy.c, just to make sure 'bypass_usercopy_checks' has the expected 'true' setting at boot time and at crash time.

    --

    Sergei

  • From John Paul Adrian Glaubitz@21:1/5 to Pedro Miguel Justo on Tue Apr 26 09:30:01 2022
    Hi!

    On 4/26/22 04:43, Pedro Miguel Justo wrote:
    So, I finished compiling my kernel with CONFIG_HARDENED_USERCOPY disabled. Guess what:

    pmsjt@debian:~$ uname -a
    Linux debian 5.17.3-rt17 #2 SMP Mon Apr 25 16:55:00 PDT 2022 ia64 GNU/Linux

    Yup, the system starts just fine with the most recent kernel. So, two things we can infer from this:
    - Yes, usercopy validation appears to be broken. The contours of how broken it is are yet unknown
    but we’ll have to investigate to see what part of the validation is failing.
    - hardened_usercopy=off seems to be ignored by current kernels. When passing this option the system
    was still failing just the same.

    We can certainly send a pull request to the Debian kernel packaging repository to disable
    CONFIG_HARDENED_USERCOPY although I'm not sure what ramifications that would have.

    But since the feature is broken on Itanium anyway, I guess it won't hurt.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Frank Scheiner@21:1/5 to All on Tue Apr 26 15:40:02 2022
    Hi Pedro, Anton, all

    so I did some first testing on my Montecito driven rx2660:

    firmware info:
    ```
    [rx2660-mp-ilo] MP:CM> sysrev


    SYSREV

    Current firmware revisions

    MP FW : F.02.17
    BMC FW : 05.23
    EFI FW : ROM A 07.12, ROM B 07.12
    System FW : ROM A 04.04, ROM B 04.04, Boot ROM A
    UCIO FW : 03.0b
    PRS FW : 00.08 UpSeqRev: 02, DownSeqRev: 01
    ```

    hardware info:
    ```
    root@rx2660:~# uname -a
    Linux rx2660 4.19.0-5-mckinley #1 SMP Debian 4.19.37-5 (2019-06-19) ia64 GNU/Linux

    root@rx2660:~# lscpu
    Architecture: ia64
    CPU op-mode(s): 64-bit
    Byte Order: Little Endian
    CPU(s): 8
    On-line CPU(s) list: 0-7
    Thread(s) per core: 2
    Core(s) per socket: 2
    Socket(s): 2
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 32
    Model: 7
    Model name: Dual-Core Intel(R) Itanium(R) Processor 9050
    CPU MHz: 1594.639
    BogoMIPS: 3182.59
    L1d cache: 16K
    L1i cache: 16K
    L2d cache: 256K
    L2i cache: 1024K
    L3 cache: 12288K
    NUMA node0 CPU(s): 0-7
    Flags: branchlong, 16-byte atomic ops

    ## 8 CPUs (or rather, hardware threads) => SMT enabled!

    root@rx2660:~# free -m
            total   used   free shared buff/cache available
    Mem:    32574    394  31054     17       1125     31869
    Swap:       0      0      0
    ```

    ...and after successfully upgrading my root FS (last touched in 2019!)
    with 4.19.0-5-mckinley w/o a problem, on first boot with
    5.17.0-1-mckinley I also get those usercopy related problem(s), despite
    having two Montecitos installed:

    ```
    Booting `Debian GNU/Linux Sid (diskless)'

    Loading Linux kernel ...
    Loading initial ramdisk ...
    [ 0.000000] Linux version 5.17.0-1-mckinley (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-20) 11.2.0, GNU ld (GNU Binutils for Debian) 2.38) #1 SMP Debian 5.17.3-1 (2022-04-18)
    [ 0.000000] efi: EFI v2.00 by HP
    [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fde6000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fde4000
    [ 0.000000] PCDP: v3 at 0x3fde4000
    [...]
    [ 1.199313] zbud: loaded
    [ 1.199313] integrity: Platform Keyring initialized
    [ 1.199313] Key type asymmetric registered
    [ 1.199313] Asymmetric key parser 'x509' registered
    [ 1.927433] Freeing initrd memory: 26688kB freed
    [ 1.930079] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 450555, size 4)!
    [ 1.930079] kernel BUG at mm/usercopy.c:100!
    [ 1.930079] kworker/u16:1[71]: bugcheck! 0 [1]
    [ 1.930079] Modules linked in:
    [ 1.930079]
    [ 1.930079] CPU: 3 PID: 71 Comm: kworker/u16:1 Not tainted 5.17.0-1-mckinley #1 Debian 5.17.3-1
    [ 1.930079] Hardware name: hp server rx2660 , BIOS 04.04 07/15/2008
    [ 1.930079] psr : 00001010084a6010 ifs : 8000000000000410 ip : [<a000000101353690>] Not tainted (5.17.0-1-mckinley Debian 5.17.3-1)
    [ 1.930079] ip is at usercopy_abort+0x120/0x130
    [...]
    ```

    It wasn't dead in the water there; kernel boot continued for a while
    until it panicked.

    Trying the 5.16.0-6-mckinley kernel on this rx2660 shows problems
    similar to the above, though a little later in the kernel boot process:

    ```
    [ 0.000000] Linux version 5.16.0-6-mckinley (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-19) 11.2.0, GNU ld (GNU Binutils for Debian) 2.38) #1 SMP Debian 5.16.18-1 (2022-03-29)
    [ 0.000000] efi: EFI v2.00 by HP
    [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fde6000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fde4000
    [ 0.000000] PCDP: v3 at 0x3fde4000
    [...]
    [ 1.213851] zbud: loaded
    [ 1.217851] integrity: Platform Keyring initialized
    [ 1.217851] Key type asymmetric registered
    [ 1.217851] Asymmetric key parser 'x509' registered
    [ 1.217851] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
    [ 1.217859] io scheduler mq-deadline registered
    [ 1.222505] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
    [ 1.222505] ACPI: button: Power Button [PWRF]
    [ 1.222505] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
    [ 1.222505] ACPI: button: Sleep Button [SLPF]
    [...]
    [ 2.025839] rtc-efi rtc-efi.0: setting system clock to 2022-04-26T11:59:48 UTC (1650974388)
    [ 2.026516] ledtrig-cpu: registered to indicate activity on CPUs
    [ 2.026516] NET: Registered PF_INET6 protocol family
    [ 2.030517] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 450555, size 4)!
    [ 2.030517] kernel BUG at mm/usercopy.c:99!
    [ 2.030517] kworker/u16:0[81]: bugcheck! 0 [1]
    [ 2.030517] Modules linked in:
    [ 2.030517]
    [ 2.030517] CPU: 2 PID: 81 Comm: kworker/u16:0 Not tainted 5.16.0-6-mckinley #1 Debian 5.16.18-1
    [ 2.031443] Hardware name: hp server rx2660 , BIOS 04.04 07/15/2008
    [ 2.031443] psr : 00001010084a6010 ifs : 8000000000000410 ip : [<a000000101336b70>] Not tainted (5.16.0-6-mckinley Debian 5.16.18-1)
    [ 2.031443] ip is at usercopy_abort+0x120/0x130
    [...]
    ```

    With `hardened_usercopy=off` added to the kernel commandline I get 5.16.0-6-mckinley to boot the rx2660 to the login prompt, though I still
    see:

    ```
    [...]
    [ 1.915245] Freeing initrd memory: 27200kB freed
    [ 1.917530] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 450555, size 4)!
    [ 1.917739] kernel BUG at mm/usercopy.c:99!
    [ 1.917739] kworker/u16:1[82]: bugcheck! 0 [1]
    [ 1.917739] Modules linked in:

    [ 1.917739] CPU: 7 PID: 82 Comm: kworker/u16:1 Not tainted 5.16.0-6-mckinley #1 Debian 5.16.18-1
    [ 1.917739] Hardware name: hp server rx2660 , BIOS 04.04 07/15/2008
    [ 1.921739] psr : 00001010084a6010 ifs : 8000000000000410 ip : [<a000000101336b70>] Not tainted (5.16.0-6-mckinley Debian 5.16.18-1)
    [ 1.921739] ip is at usercopy_abort+0x120/0x130
    [...]
    ```

    ...in the boot process. Running some benchmarks (7z, openssl) didn't
    print any issues to the system console.

    It's similar with 5.17.0-1-mckinley, though with many more error
    messages during kernel boot, but it succeeds. Again, during benchmark
    runs no additional errors were logged to the system console.

    So maybe `hardened_usercopy=off` works more like changing "errors" to "warnings" or so.

    ****

    BTW, checking the bootloader configuration of my rx2620 I recognized
    that it uses `hardened_usercopy=off` since April 2019, which would
    explain, why booting 5.16.0-6-mckinley and benchmarking it in early
    April 2022 worked well. :-/

    Until I found that out, I suspected a difference between the zx1 (rx2620)
    and zx2 (rx2660) chipsets with regard to these memcpy issues, but the
    chipset could be unrelated then.

    @Anton:
    So maybe best to give `hardened_usercopy=off` a try on your rx2660, too.
    From my testing on rx2660 and rx2620 this seems to unbreak the kernel
    boot and maybe also makes it less likely to hit the problem post boot. I
    don't know why Adrian's rx2660 seems to be unaffected by this, though.

    I'll now look at my other Itanium gear, rx2800 i2 first.

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Pedro Miguel Justo on Tue Apr 26 18:10:01 2022
    Hi Pedro,

    On 26.04.22 17:01, Pedro Miguel Justo wrote:
    On 2022/Apr/26, at 06:34, Frank Scheiner <frank.scheiner@web.de> wrote:
    @Anton:
    So maybe best to give `hardened_usercopy=off` a try on your rx2660, too.
    From my testing on rx2660 and rx2620 this seems to unbreak the kernel
    boot and maybe also makes it less likely to hit the problem post boot. I
    don't know why Adrian's rx2660 seems to be unaffected by this, though.


    I did. That is why I ended up compiling 5.17 with the entire thing turned off. With 5.17, on my rx2660 Montvale with 8 cores the machine can’t get past early boot even with hardened_usercopy=off.

    Those 'warnings' are actually processes being killed. And they depend on the direction in which the bad copy was happening.

    Thanks for clarification.


    If you look at my prior responses, with the 4.19 kernel I was also running along fine for hours and, after some time building the kernel (a benchmark in itself), it would start producing these warnings and would not allow compilation to continue any
    further. I would reboot the machine and that gave me a few more hours. When I tried 'hardened_usercopy=off' on the 4.19 kernel, that worked. I no longer got these process terminations after a few hours and the machine was able to build the entire kernel
    from beginning to end.

    So, 4.19 and 5.17 are different in many ways (symptom-wise):
    - I never got a bugcheck (panic) level failure on 4.19. They were all process-termination level.
    - On 4.19 these took quite some time to show up. They seemed to depend on the number of processes created in the past and were mitigated by a reboot. On 5.17 it was very aggressive, showing up early in boot, even on system threads like the crypto boot
    self-test. Disabling the crypto boot self-test made it go farther, but not much. If the error is detected on a system thread, there is no process to terminate: it is game over.
    - hardened_usercopy=off was observed by 4.19 but ignored by 5.17

    Well, it seems to make a difference for my rx2660, maybe because of
    Montecitos instead of Montvales, I don't know. Or it depends on the
    available memory (i.e. maybe it happens more/less often with less/more
    memory available). Mine has 32 GiB in total.

    I don’t exclude the possibility of human error in conducting all these experiments (some of the process is error prone), but I did run these experiments more than just a few times, so it would have to be a heck of a coincidence to end up with
    consistent results.

    Sure, my test results are also more anecdotal as it takes so much time
    to boot and run things (`openssl speed -elapsed` takes around 23 mins).

    I'll now look at my other Itanium gear, rx2800 i2 first,

    First testing with 5.17.0-1-mckinley on my rx2800 i2 interestingly shows
    no issues with memcopy at all, not during kernel boot, nor post boot. My
    kernel cmdline is as follows:

    ```
    root@rx2800-i2:~# cat /proc/cmdline
    BOOT_IMAGE=net0:/AC10027B.vmlinuz root=/dev/nfs ip=:::::enp8s0f0:dhcp modprobe.blacklist=hpsa,radeon
    ```

    It could well be that the Tukwilas behave differently in that case. After
    all, they have their memory controller integrated in the processor and
    not in the chipset like the older Montecitos or Montvales.

    For reference:

    firmware info:
    ```
    [rx2800-i2-mp-ilo] CM:hpiLO-> sysrev


    SYSREV

    Revisions Active Pending
    -------------------------------------
    iLO FW : 01.54.03
    System FW : 01.93
    MHW FPGA : 02.02
    Power Mon FW : 02.09
    PRS HW : 02.06
    IOH HW : 02.02
    Power Supply 1 : 02.01
    Power Supply 2 : 02.01
    ```

    hardware info:
    ```
    root@rx2800-i2:~# uname -a
    Linux rx2800-i2 5.17.0-1-mckinley #1 SMP Debian 5.17.3-1 (2022-04-18) ia64 GNU/Linux

    root@rx2800-i2:~# lscpu
    Architecture: ia64
    CPU op-mode(s): 64-bit
    Byte Order: Little Endian
    CPU(s): 8
    On-line CPU(s) list: 0-7
    Vendor ID: GenuineIntel
    BIOS Vendor ID: Intel(R) Itanium(R) Processor 9320
    Model name: Intel(R) Itanium(R) Processor 9320
    BIOS Model name: Intel(R) Itanium(R) Processor 9320
    CPU family: 32
    Model: 4
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s): 1
    BogoMIPS: 2920.44
    Flags: branchlong, 16-byte atomic ops, 0x8
    Caches (sum of all):
    L1d: 64 KiB (4 instances)
    L1i: 64 KiB (4 instances)
    L2d: 1 MiB (4 instances)
    L2i: 4 MiB (8 instances)
    L3: 32 MiB (8 instances)
    NUMA:
    NUMA node(s): 1
    NUMA node0 CPU(s): 0-7

    root@rx2800-i2:~# free -m
            total   used   free shared buff/cache available
    Mem:    24218    138  23983      2         96     23871
    Swap:       0      0      0
    ```

    Cheers,
    Frank

  • From John Paul Adrian Glaubitz@21:1/5 to John Paul Adrian Glaubitz on Wed Apr 27 10:10:01 2022
    Hello!

    On 4/26/22 09:23, John Paul Adrian Glaubitz wrote:
    We can certainly send a pull request to the Debian kernel packaging repository to disable
    CONFIG_HARDENED_USERCOPY although I'm not sure what ramifications that would have.

    But since the feature is broken on Itanium anyway, I guess it won't hurt.

    Just opened a PR to disable the feature until the issue has been fixed [1].

    Adrian

    [1] https://salsa.debian.org/kernel-team/linux/-/merge_requests/469

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Sergei Trofimovich@21:1/5 to Pedro Miguel Justo on Wed Apr 27 09:50:01 2022
    On Tue, 26 Apr 2022 08:18:25 +0000
    Pedro Miguel Justo <pmsjt@texair.net> wrote:

    On 2022/Apr/26, at 01:01, Sergei Trofimovich <slyich@gmail.com> wrote:

    On Tue, 26 Apr 2022 02:43:00 +0000
    Pedro Miguel Justo <pmsjt@texair.net> wrote:

    On 2022/Apr/25, at 14:27, Pedro Miguel Justo <pmsjt@texair.net> wrote:


    On 2022/Apr/25, at 14:09, Sergei Trofimovich <slyich@gmail.com> wrote:
    On Mon, 25 Apr 2022 15:07:58 +0000
    Pedro Miguel Justo <pmsjt@texair.net> wrote:

    On 2022/Apr/25, at 01:22, Pedro Miguel Justo <pmsjt@texair.net> wrote:


    On 2022/Apr/25, at 01:14, Frank Scheiner <frank.scheiner@web.de> wrote:

    Hi guys,

    On 25.04.22 10:09, John Paul Adrian Glaubitz wrote:
    From what I can understand by the information in the bugcheck, this is somewhat related to a violation
    in parameter copy from user to kernel during some boot-time, crypto, self-test. Does that sound right?
    If that is the case, how would this be related to FW?
    I'm not claiming that it must be related to the firmware, I'm just saying that I don't see this problem
    on my RX2660 at all and I have even reinstalled it recently with one of the latest firmware images
    without having to pass any parameter to the command line.

    A difference between Adrian's rx2660 and Pedro's rx2660 is Montecito
    left and Montvale right.

    But could still be multiple other reasons we haven't looked at yet in
    detail:

    * amount of memory installed
    * SMT enabled or not
    * number of processor modules installed

    It might be possible for me to check on my rx2660s (one with Montvale
    and one with Montecito(s)) tomorrow. I will then also look at my other
    Itanium gear and gather relevant information.


    Yes, this sounds more likely to me too.

    The crypto self-tests seem to be an innocent bystander here. I tried booting the most recent kernel with the option “cryptomgr.notests” and it went much farther. Alas, it still failed with another buffer copy validation for a different caller altogether:

    [ 3.836466] [<a000000101353690>] usercopy_abort+0x120/0x130
    [ 3.836466] sp=e0000001000cfdf0 bsp=e0000001000c9388
    [ 3.836466] [<a0000001004c5660>] __check_object_size+0x3c0/0x420
    [ 3.836466] sp=e0000001000cfe00 bsp=e0000001000c9350
    [ 3.836466] [<a000000100570030>] sys_getcwd+0x250/0x420
    [ 3.836466] sp=e0000001000cfe00 bsp=e0000001000c92c8
    [ 3.836466] [<a00000010000c860>] ia64_ret_from_syscall+0x0/0x20
    [ 3.836466] sp=e0000001000cfe30 bsp=e0000001000c92c8
    [ 3.836466] [<a000000000040720>] ia64_ivt+0xffffffff00040720/0x400
    [ 3.836466] sp=e0000001000d0000 bsp=e0000001000c92c8

    This suggests the bug might be in the logic validating these buffers against the allocations (heap, span, etc).

    I don’t know why hardened_usercopy=off is not being observed by the kernel. As a work-around I am compiling myself a new kernel with CONFIG_HARDENED_USERCOPY disabled at the source.

    So, I finished compiling my kernel with CONFIG_HARDENED_USERCOPY disabled. Guess what:

    pmsjt@debian:~$ uname -a
    Linux debian 5.17.3-rt17 #2 SMP Mon Apr 25 16:55:00 PDT 2022 ia64 GNU/Linux

    Yup, the system starts just fine with the most recent kernel. So, two things we can infer from this:
    - Yes, usercopy validation appears to be broken. The contours of how broken it is are yet unknown but we’ll have to investigate to see what part of the validation is failing.

    My vague memory tells me that the breakage mechanics is the following:

    1. [ok] mm/usercopy.c:check_kernel_text_object() gets called on various kernel addresses
    2. [ok] ia64 address space has at least two widely used kernel addresses:
    - 0xe0000000_00000000-0xe000..ff-ffffffff - proper linear mapping, 0xe0000000_00000000 + phys
    (so called, RGN_KERNEL = 7)
    - 0xa0000001_00000000-0xa0000000_00ffffff - kernel ~linear mapping, 0xa0000001_00000000 + phys
    (so called, RGN_GATE = 5)

    Both are easy to translate to proper linear address, but the translations are not identical. Note that RGN_GATE has an offset.

    ia64's stack for some reason lives in RGN_KERNEL space (probably due to historical reasons before kernel switched to vmapped stacks on other arches).
    You can see these addresses in backtraces as
    sp=e0000001000cfdf0 bsp=e0000001000c9388

    ia64's kernel code on the other hand lives in RGN_GATE.
    You can see these addresses in backtraces as
    [<a000000101353690>] usercopy_abort+0x120/0x130

    Now, check_kernel_text_object() compares the passed kernel address against 2 ranges:
    - RGN_GATE:
    unsigned long textlow = (unsigned long)_stext;
    unsigned long texthigh = (unsigned long)_etext;
    if (overlaps(ptr, n, textlow, texthigh))
    usercopy_abort("kernel text", NULL, to_user, ptr - textlow, n);

    I think this check is correct. There is no address translation.

    - RGN_KERNEL:
    textlow_linear = (unsigned long)lm_alias(textlow);
    texthigh_linear = (unsigned long)lm_alias(texthigh);
    if (overlaps(ptr, n, textlow_linear, texthigh_linear))
    usercopy_abort("linear kernel text", NULL, to_user,...

    I think this one is wrong. Looking at the definition of lm_alias(),
    it can be used only on RGN_KERNEL addresses (which looks useless,
    they are already in correct form) and literal symbol names:

    include/linux/mm.h:#define lm_alias(x) __va(__pa_symbol(x))
    include/linux/mm.h:#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x), 0))
    arch/ia64/include/asm/page.h:#define __pa(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg = 0; _v.l;})
    #define __va(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg = -1; _v.p;})

    I think that means __pa_symbol(x) against local variables like textlow
    is completely broken.

    If lm_alias() works at all on ia64, maybe that is enough to unbreak usercopy:

    --- a/mm/usercopy.c
    +++ b/mm/usercopy.c
    @@ -133,13 +133,13 @@ static inline void check_kernel_text_object(const unsigned long ptr,
    * __pa() is not just the reverse of __va(). This can be detected
    * and checked:
    */
    - textlow_linear = (unsigned long)lm_alias(textlow);
    + textlow_linear = (unsigned long)lm_alias(_stext);
    /* No different mapping: we're done. */
    if (textlow_linear == textlow)
    return;

    /* Check the secondary mapping... */
    - texthigh_linear = (unsigned long)lm_alias(texthigh);
    + texthigh_linear = (unsigned long)lm_alias(_etext);
    if (overlaps(ptr, n, textlow_linear, texthigh_linear))
    usercopy_abort("linear kernel text", NULL, to_user,
    ptr - textlow_linear, n);

    If not here is an ia64-specific workaround:

    --- a/mm/usercopy.c
    +++ b/mm/usercopy.c
    @@ -133,13 +133,13 @@ static inline void check_kernel_text_object(const unsigned long ptr,
    * __pa() is not just the reverse of __va(). This can be detected
    * and checked:
    */
    - textlow_linear = (unsigned long)lm_alias(textlow);
    + textlow_linear = (unsigned long)ia64_imva(textlow);
    /* No different mapping: we're done. */
    if (textlow_linear == textlow)
    return;

    /* Check the secondary mapping... */
    - texthigh_linear = (unsigned long)lm_alias(texthigh);
    + texthigh_linear = (unsigned long)ia64_imva(texthigh);
    if (overlaps(ptr, n, textlow_linear, texthigh_linear))
    usercopy_abort("linear kernel text", NULL, to_user,
    ptr - textlow_linear, n);

    CAVEAT: I did not compile- or run- test it. Might need a bit of pointer/long casting around.


    I don’t mind at all giving this a try. Sounds quite plausible.

    Do you think the stacks (and RSE) will eventually have to move from RGN_KERNEL to vmapped the same as other archs to avoid issues like this and other to come?

    I'd say moving to the virtual mapping area would not change things much,
    even for this specific bug. It would move more addresses into a region
    where the __pa() macro does not work, and thus proper resolution would be
    required to get the linear address.
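    To picture why the generic check misfires, here is a simplified userspace model in C. It assumes, as we understand the ia64 headers, that __pa()/__va() only rewrite the 3-bit region number (to region 0 and region 7 respectively), and that kernel text lives in region 5 starting at 0xa000000100000000 but is backed by a translation register pointing at an unrelated physical address. The physical load address (0x4000000), the text size and all helper names (model_pa(), model_va(), naive_lm_alias(), translated_alias()) are hypothetical; translated_alias() merely stands in for the tpa-based ia64_imva(), and overlaps() follows the range test in mm/usercopy.c. This is a sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ia64-style address split: top 3 bits = region, low 61 bits = offset. */
#define REGION_SHIFT 61
#define OFFSET_MASK ((UINT64_C(1) << REGION_SHIFT) - 1)

static uint64_t set_region(uint64_t addr, uint64_t region)
{
	return (addr & OFFSET_MASK) | (region << REGION_SHIFT);
}

/* Model of ia64 __pa()/__va(): they only rewrite the region bits. */
static uint64_t model_pa(uint64_t va) { return set_region(va, 0); }
static uint64_t model_va(uint64_t pa) { return set_region(pa, 7); }

/* Hypothetical: what a generic lm_alias() boils down to, __va(__pa(x)).
 * Fine for identity-mapped region 7, bogus for region-5 text. */
static uint64_t naive_lm_alias(uint64_t va)
{
	return model_va(model_pa(va));
}

/* Hypothetical stand-in for a tpa-based translation (what ia64_imva()
 * relies on): go through the real text mapping text_va -> text_pa. */
static uint64_t translated_alias(uint64_t va, uint64_t text_va,
				 uint64_t text_pa)
{
	assert(va >= text_va);
	return model_va(text_pa + (va - text_va));
}

/* Range test in the style of overlaps() in mm/usercopy.c. */
static bool overlaps(uint64_t ptr, uint64_t n, uint64_t low, uint64_t high)
{
	/* No overlap if [ptr, ptr + n) lies entirely above or below. */
	return !(ptr >= high || ptr + n <= low);
}

/* Assumed layout (illustrative values only). */
static const uint64_t TEXT_LOW = UINT64_C(0xa000000100000000); /* region 5 */
static const uint64_t TEXT_SIZE = UINT64_C(0x1000000);         /* 16 MiB */
static const uint64_t TEXT_PHYS = UINT64_C(0x4000000);         /* 64 MiB */
```

    Under these assumptions, an ordinary region-7 buffer at physical 4 GiB + 4 KiB (0xe000000100001000) falls inside the naive "linear text" range [0xe000000100000000, 0xe000000101000000) and would be reported, while the translated range [0xe000000004000000, 0xe000000005000000) leaves it alone and still catches a genuine linear alias of the text.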

    Moving to vmap on ia64 does not bring the benefit of guarding against stack overflows that other targets get: on overflow, the register backing store collides with the memory stack first, somewhere in the middle of the stack area. On the other
    hand, vmap does help get rid of higher-order allocations for stacks.
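    The overflow argument can be pictured with a toy model in C. It assumes the layout as described here: within one stack allocation, the register backing store grows upward from near the bottom while the memory stack grows downward from the top. All sizes (a 32 KiB area, the RBS starting 4 KiB in, 256 and 512 bytes per call) and the name first_collision() are hypothetical, and real frames are of course not uniform.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the toy model: when and where the two stacks meet. */
struct collision {
	size_t depth; /* nested calls until the stacks meet */
	size_t where; /* RBS top (byte offset) at that moment */
};

/*
 * Toy model of one ia64 kernel stack area: the register backing store
 * grows up from rbs_base, the memory stack grows down from stk_top.
 * Per-call sizes are hypothetical constants.
 */
static struct collision first_collision(size_t rbs_base, size_t stk_top,
					size_t rbs_per_call,
					size_t stk_per_call)
{
	assert(rbs_base < stk_top && rbs_per_call + stk_per_call > 0);

	size_t rbs = rbs_base, sp = stk_top, depth = 0;

	/* Each nested call spills registers upward and pushes a frame down. */
	while (rbs < sp) {
		rbs += rbs_per_call;
		sp = sp >= stk_per_call ? sp - stk_per_call : 0;
		depth++;
	}
	return (struct collision){ depth, rbs };
}
```

    With first_collision(4096, 32768, 256, 512) the two stacks meet after 38 nested calls at byte offset 13824, well inside the area, so a guard page at either end of a vmapped allocation would not be what catches the overflow.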

    Implementing vmapped stacks is probably a bit tricky: ia64 is an architecture where translation cache entries (itc*/dtc*) are maintained in software (with some hardware assistance from the VHPT walker). To keep things simple, ia64 pins the kernel with static translation registers (itr*/dtr*): code around IA64_TR_KERNEL and IA64_TR_CURRENT_STACK makes a few assumptions about the stack layout.

    I'm not sure if TLB fault handlers would just work for vmapped stack. Maybe?

    --

    Sergei

  • From Anatoly Pugachev@21:1/5 to frank.scheiner@web.de on Wed Apr 27 14:50:01 2022
    On Tue, Apr 26, 2022 at 4:35 PM Frank Scheiner <frank.scheiner@web.de> wrote:
    Booting `Debian GNU/Linux Sid (diskless)'

    Loading Linux kernel ...
    Loading initial ramdisk ...
    [ 0.000000] Linux version 5.17.0-1-mckinley (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-20) 11.2.0, GNU ld (GNU Binutils for Debian) 2.38) #1 SMP Debian 5.17.3-1 (2022-04-18)
    [ 0.000000] efi: EFI v2.00 by HP
    [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fde6000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fde4000
    [ 0.000000] PCDP: v3 at 0x3fde4000
    [...]
    [ 1.199313] zbud: loaded
    [ 1.199313] integrity: Platform Keyring initialized
    [ 1.199313] Key type asymmetric registered
    [ 1.199313] Asymmetric key parser 'x509' registered
    [ 1.927433] Freeing initrd memory: 26688kB freed
    [ 1.930079] usercopy: Kernel memory overwrite attempt detected to linear kernel text (offset 450555, size 4)!
    [ 1.930079] kernel BUG at mm/usercopy.c:100!
    [ 1.930079] kworker/u16:1[71]: bugcheck! 0 [1]
    [ 1.930079] Modules linked in:
    [ 1.930079]
    [ 1.930079] CPU: 3 PID: 71 Comm: kworker/u16:1 Not tainted 5.17.0-1-mckinley #1 Debian 5.17.3-1
    [ 1.930079] Hardware name: hp server rx2660 , BIOS 04.04 07/15/2008
    [ 1.930079] psr : 00001010084a6010 ifs : 8000000000000410 ip : [<a000000101353690>] Not tainted (5.17.0-1-mckinley Debian 5.17.3-1)
    [ 1.930079] ip is at usercopy_abort+0x120/0x130
    [...]

    Is it possible to file a bug report at the Linux kernel Bugzilla and post
    the full stack trace with the description
    to: linux-ia64@vger.kernel.org
    cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
    and CC the author of the patch as well? https://lwn.net/Articles/694470/

    Thanks.

  • From Frank Scheiner@21:1/5 to Anatoly Pugachev on Wed May 4 10:20:01 2022
    On 27.04.22 14:45, Anatoly Pugachev wrote:
    Is it possible to file a bug report at the Linux kernel Bugzilla and post
    the full stack trace with the description
    to: linux-ia64@vger.kernel.org
    cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
    and CC the author of the patch as well? https://lwn.net/Articles/694470/

    Sorry, but I was out of time last week and this week is already pretty
    full. I'll look into this as soon as time allows, although I believe I
    am not the original poster of these memcpy problems.

    But I can add some additional info now: last week I also looked at my
    rx4640 with 4 x Madison processors, and it looks like this machine (or
    its processors) is affected as much as the Montvale machines/processors,
    so no successful operation with or without `hardened_usercopy=off`.

    So in short: you're (relatively) fine with Montecitos (in both rx2620
    and rx2660) when booting with `hardened_usercopy=off`, and there is no
    visible effect with Tukwilas (in rx2800 i2). But Madisons (in rx4640) and
    Montvales (in rx2660, rx6600 and most likely also rx3600) are not usable
    due to this memcpy problem.

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Frank Scheiner on Tue May 10 16:10:01 2022
    On 04.05.22 10:11, Frank Scheiner wrote:
    On 27.04.22 14:45, Anatoly Pugachev wrote:
    Is it possible to file a bug report at the Linux kernel Bugzilla and post
    the full stack trace with the description
    to: linux-ia64@vger.kernel.org
    cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
    and CC the author of the patch as well?
    https://lwn.net/Articles/694470/

    OK, the report was CC'd to debian-ia64@lists.debian.org and is available
    here:

    https://lists.debian.org/debian-ia64/2022/05/msg00001.html

    Cheers,
    Frank

  • From Anton Borisov@21:1/5 to All on Sun Jun 5 20:20:01 2022
    I've booted the Gentoo install ISO (https://bouncer.gentoo.org/fetch/root/all/releases/ia64/autobuilds/20220601T030345Z/install-ia64-minimal-20220601T030345Z.iso).
    I used the default parameters, i.e. without the hardened memcpy option. No kernel faults at all.

    Kernel: 5.18.1-gentoo-r1-ia64

    Log is here: https://pastebin.com/gzLFyUKx

  • From Frank Scheiner@21:1/5 to Anton Borisov on Mon Jun 6 12:50:01 2022
    Hi Anton,

    On 05.06.22 20:00, Anton Borisov wrote:
    I've booted the Gentoo install ISO (https://bouncer.gentoo.org/fetch/root/all/releases/ia64/autobuilds/20220601T030345Z/install-ia64-minimal-20220601T030345Z.iso).
    I used the default parameters, i.e. without the hardened memcpy option.

    Just for clarification:

    Is hardened memcopy "just" disabled in the config of the kernel
    used...

    No kernel faults at all.

    Kernel: 5.18.1-gentoo-r1-ia64

    ...or was it maybe fixed in some way for ia64? IIRC, we haven't tested
    a 5.18.x kernel on Debian yet.

    Log is here: https://pastebin.com/gzLFyUKx

    In any case, using a "Gentoo kernel" with the Debian userland could then
    be an option, too. I did that for a while to keep my rx2800 i2
    operational until the underlying problem was fixed upstream.

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Pedro Miguel Justo on Mon Jun 6 18:30:01 2022
    On 06.06.22 17:45, Pedro Miguel Justo wrote:
    On 2022/Jun/06, at 03:44, Frank Scheiner <frank.scheiner@web.de> wrote:
    On 05.06.22 20:00, Anton Borisov wrote:
    I've booted the Gentoo install ISO (https://bouncer.gentoo.org/fetch/root/all/releases/ia64/autobuilds/20220601T030345Z/install-ia64-minimal-20220601T030345Z.iso).
    I used the default parameters, i.e. without the hardened memcpy option.

    Just for clarification:

    Is hardened memcopy "just" disabled in the config of the kernel
    used...

    Any chance this kernel was produced after Adrian’s PR [1] was merged upstream?

    Adrian's PR was not for the upstream kernel, but for the Debian kernel
    config for ia64.

    [1] https://salsa.debian.org/kernel-team/linux/-/merge_requests/469
