[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
[2] https://marc.info/?l=linux-ia64&m=156144480821712&w=2
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1557655
Unfortunately, this blade has the same architecture as the RX2800 and won't properly boot due to this bug in the DMA code [1] for which there is no fix yet, see also [2].
I spent a lot of time getting Anatoly's new blade installed yesterday but eventually gave up. While I was able to install Debian Squeeze (Wheezy wouldn't work either, kernel just reboots akin to the ia64 gcc bug), the installed system won't boot as the hpsa driver wouldn't load.
Hi!
On 8/2/20 8:05 AM, John Paul Adrian Glaubitz wrote:
Unfortunately, this blade has the same architecture as the RX2800 and won't
properly boot due to this bug in the DMA code [1] for which there is no fix
yet, see also [2].
I spent a lot of time getting Anatoly's new blade installed yesterday but
eventually gave up. While I was able to install Debian Squeeze (Wheezy
wouldn't work either, kernel just reboots akin to the ia64 gcc bug), the
installed system won't boot as the hpsa driver wouldn't load.
Finally got it working with the 4.14.83 kernel plus the Gentoo ptrace patch:
Linux lenz 4.14.83-00001-g6ef2496425e7 #1 SMP Sun Aug 2 19:31:30 CEST 2020 ia64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@lenz:~# uname -a
Linux lenz 4.14.83-00001-g6ef2496425e7 #1 SMP Sun Aug 2 19:31:30 CEST 2020 ia64 GNU/Linux
root@lenz:~#
The machine is running Squeeze now. I will install unstable using debootstrap on the second disk later today and then the buildd is finally up and running again.
The machine is running Squeeze now. I will install unstable using debootstrap
on the second disk later today and then the buildd is finally up and running
again.
Great work and great news! So did you also enable CONFIG_ZONE_DMA32
for that kernel, or only apply the Gentoo ptrace patch?
On 8/4/20 12:40 PM, Frank Scheiner wrote:
The machine is running Squeeze now. I will install unstable using debootstrap
on the second disk later today and then the buildd is finally up and running
again.
Great work and great news! So did you also enable CONFIG_ZONE_DMA32
for that kernel, or only apply the Gentoo ptrace patch?
There is actually no CONFIG_ZONE_DMA32 for ia64, just CONFIG_ZONE_DMA and that is set.
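For anyone who wants to double-check which of the two options a given kernel configuration carries, here is a minimal Python sketch. It assumes the usual `.config` text format (`CONFIG_FOO=y` and `# CONFIG_FOO is not set`). The sample text is a made-up excerpt for illustration, not Sergei's actual file, and the function names `parse_kernel_config` and `report` are hypothetical helpers.

```python
import re


def parse_kernel_config(text):
    """Map CONFIG_* option names to their values.

    Options written as '# CONFIG_FOO is not set' are recorded as 'n'.
    Options that do not appear at all are simply missing from the dict.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = "n"
    return options


def report(options, names):
    """Return the value of each requested option, or 'absent' if unknown."""
    return {name: options.get(name, "absent") for name in names}


# Hypothetical excerpt, NOT copied from the configuration used in this thread.
SAMPLE_CONFIG = """\
CONFIG_IA64=y
CONFIG_ZONE_DMA=y
# CONFIG_NUMA is not set
"""

if __name__ == "__main__":
    opts = parse_kernel_config(SAMPLE_CONFIG)
    print(report(opts, ["CONFIG_ZONE_DMA", "CONFIG_ZONE_DMA32"]))
```

To check a real file, read it and pass its contents to `parse_kernel_config` instead of the sample string. An option reported as "absent" is one the architecture's Kconfig does not offer at all, which is different from one that is offered but disabled.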
The kernel has this configuration from Sergei [1] and the ptrace patch from the Gentoo kernel, otherwise it's a vanilla upstream kernel 4.14.83.
I cannot say yet which particular change fixes the problem, but I'm confident I will be able to figure that out. If you want to test yourself, you may
try 4.14.83 from the stable branch [2] with Sergei's configuration but without
the ptrace patch.
I will get the machine up and running first, so that we can resume building packages. Anatoly said the blade is a dual blade, so I might be able to perform the kernel tests on the second blade.
My wild guess is that it's the ptrace patch that fixes the problem.
Adrian
[1] https://dev.gentoo.org/~slyfox/config-4.19.86-gentoo
[2] git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
On 04.08.20 13:20, John Paul Adrian Glaubitz wrote:
The kernel has this configuration from Sergei [1] and the ptrace patch from
the Gentoo kernel, otherwise it's a vanilla upstream kernel 4.14.83.
I cannot say yet which particular change fixes the problem, but I'm confident
I will be able to figure that out. If you want to test yourself, you may
try 4.14.83 from the stable branch [2] with Sergei's configuration but without
the ptrace patch.
Yeah, I'll give that a try.
My wild guess is that it's the ptrace patch that fixes the problem.
Yeah, I think so, too, but will check that now, to be sure.
All kernels used Sergei's configuration adapted to enable network boot.
* 4.14.83 vanilla w/o ptrace patch and compiled with "gcc (Gentoo
7.3.0-r3 p1.4) 7.3.0" leads to a reboot after kernel and initramfs are loaded:
```
ELILO v3.16 for EFI/IA-64
..
Uncompressing Linux... done
Loading file AC10027B.initrd.img...done
1,0,0,0 5400006301E10000 0000000000000000 EVN_BOOT_START
```
* 4.14.83 vanilla w/o ptrace patch and compiled with "gcc (Debian
10.2.0-3) 10.2.0" works and boots Debian GNU/Linux Sid successfully to
the login prompt on my rx2800 i2 and also runs stable enough to
recompile itself:
```
ELILO v3.16 for EFI/IA-64
..
Uncompressing Linux... done
Loading file AC10027B.initrd.img...done
[ 0.000000] Linux version 4.14.83vanilla (root@rx2800-i2) (gcc version 10.2.0 (Debian 10.2.0-3)) #1 SMP Tue Aug 4 15:56:40 UTC 2020
[...]
Debian GNU/Linux bullseye/sid rx2800-i2 ttyS1
rx2800-i2 login:
```
* The same holds true for 4.14.191 vanilla w/o ptrace patch and compiled
with the same gcc.
So latest 4.14.x kernels should work w/o the patch on a rx2800 i2 with Itanium 9300 (Tukwila) series processors - if they are compiled with gcc 10.
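Since the outcome depends on which compiler built the kernel, it can help to read that straight from the boot banner when comparing test runs. Below is a small Python sketch that pulls the kernel release and gcc version out of a banner line in the format shown in the boot log above. The regular expression only targets that format (other kernel versions may print the compiler differently), and `parse_banner` is a hypothetical helper name.

```python
import re

# Matches banners of the form seen in this thread:
#   Linux version <release> (<user@host>) (gcc version <x.y.z> ...
BANNER_RE = re.compile(
    r"Linux version (?P<kernel>\S+) "
    r"\((?P<builder>[^)]*)\) "
    r"\(gcc version (?P<gcc>\d+(?:\.\d+)*)"
)


def parse_banner(line):
    """Return kernel release, builder and gcc version tuple, or None."""
    m = BANNER_RE.search(line)
    if m is None:
        return None
    gcc = tuple(int(part) for part in m.group("gcc").split("."))
    return {
        "kernel": m.group("kernel"),
        "builder": m.group("builder"),
        "gcc": gcc,
    }


# Banner line as it appears in the boot log quoted in this thread.
BANNER = (
    "[ 0.000000] Linux version 4.14.83vanilla (root@rx2800-i2) "
    "(gcc version 10.2.0 (Debian 10.2.0-3)) #1 SMP Tue Aug 4 15:56:40 UTC 2020"
)

if __name__ == "__main__":
    info = parse_banner(BANNER)
    print(info)
    # A simple check for "built with gcc 10 or newer".
    print(info is not None and info["gcc"][0] >= 10)
```

Storing the gcc version as a tuple of integers keeps comparisons correct across two-digit major versions, which a plain string comparison of "7.3.0" and "10.2.0" would get wrong.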
Hello Frank!
On 8/4/20 9:04 PM, Frank Scheiner wrote:
All kernels used Sergei's configuration adapted to enable network boot.
* 4.14.83 vanilla w/o ptrace patch and compiled with "gcc (Gentoo
7.3.0-r3 p1.4) 7.3.0" leads to a reboot after kernel and initramfs are
loaded:
```
ELILO v3.16 for EFI/IA-64
..
Uncompressing Linux... done
Loading file AC10027B.initrd.img...done
1,0,0,0 5400006301E10000 0000000000000000 EVN_BOOT_START
```
I have noticed that such an immediate reboot is also observed with certain versions of elilo. For example, the elilo version in Wheezy causes an immediate reboot while the version from Squeeze works.
But we're using GRUB anyway. And since unpatched versions of gcc are
known to produce a buggy kernel, we don't need to test that either.
* 4.14.83 vanilla w/o ptrace patch and compiled with "gcc (Debian
10.2.0-3) 10.2.0" works and boots Debian GNU/Linux Sid successfully to
the login prompt on my rx2800 i2 and also runs stable enough to
recompile itself:
```
ELILO v3.16 for EFI/IA-64
..
Uncompressing Linux... done
Loading file AC10027B.initrd.img...done
[ 0.000000] Linux version 4.14.83vanilla (root@rx2800-i2) (gcc version 10.2.0 (Debian 10.2.0-3)) #1 SMP Tue Aug 4 15:56:40 UTC 2020
[...]
Debian GNU/Linux bullseye/sid rx2800-i2 ttyS1
rx2800-i2 login:
```
* The same holds true for 4.14.191 vanilla w/o ptrace patch and compiled
with the same gcc.
So latest 4.14.x kernels should work w/o the patch on a rx2800 i2 with
Itanium 9300 (Tukwila) series processors - if they are compiled with gcc 10.
I'm currently running a 4.14.192 kernel with the ptrace patch. While it doesn't
crash, I'm seeing a kworker thread with rather high load.
Do you observe that as well?
Hello!
On 8/5/20 2:51 PM, John Paul Adrian Glaubitz wrote:
So latest 4.14.x kernels should work w/o the patch on a rx2800 i2 with
Itanium 9300 (Tukwila) series processors - if they are compiled with gcc 10.
I'm currently running a 4.14.192 kernel with the ptrace patch. While it doesn't
crash, I'm seeing a kworker thread with rather high load.
Do you observe that as well?
I have now tried 4.19.137 with the same configuration and Sergei's ptrace patch
and sure enough it crashes when loading the first modules.
On 8/5/20 3:56 PM, Frank Scheiner wrote:
And the behavior is similar to what Sergei wrote in [1] I think, so I'm
confident that this is not coming from the boot loader, but from using
gcc 7.3.0 w/o ptrace patch.
[1]: https://lore.kernel.org/patchwork/comment/1081244/
I don't see any comments regarding the RX2800 in this discussion.
And, FWIW, the broken version of elilo crashes the exact same way as
the kernel without the ptrace fix which is why it took me a while
to figure that out.
It's just meant as a heads-up since GRUB is known to work very well
and I don't expect any such surprises when using GRUB instead of
elilo.
I think the unaligned access might be what keeps the kworker thread busy.
Do you observe that as well?
Not that I know of, but I didn't check activity the whole time. I'll
observe activity next time I fire the machine up.
What I saw on Debian were unaligned memory accesses. But I think that is
common on non-x86 arches.
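One way to test the guess about unaligned accesses would be to count the kernel log messages per instruction address and see whether a single call site dominates. The Python sketch below does that under the assumption that the messages look like `... unaligned access to 0x<addr>, ip=0x<ip>`. The sample lines are invented for illustration and are not taken from either machine, and `count_by_ip` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed message shape: "<who> unaligned access to 0x<addr>, ip=0x<ip>"
UNALIGNED_RE = re.compile(
    r"unaligned access to 0x(?P<addr>[0-9a-f]+), ip=0x(?P<ip>[0-9a-f]+)"
)


def count_by_ip(lines):
    """Count unaligned-access messages grouped by instruction pointer."""
    counts = Counter()
    for line in lines:
        m = UNALIGNED_RE.search(line)
        if m:
            counts[m.group("ip")] += 1
    return counts


# Invented sample log lines, only meant to exercise the parser.
SAMPLE_LOG = [
    "[   12.000000] kernel unaligned access to 0xe000000101234565, ip=0xa000000100200000",
    "[   12.000100] kernel unaligned access to 0xe00000010123456d, ip=0xa000000100200000",
    "[   13.500000] demo(1234): unaligned access to 0x6000000000001235, ip=0x4000000000002000",
    "[   14.000000] some unrelated message",
]

if __name__ == "__main__":
    for ip, hits in count_by_ip(SAMPLE_LOG).most_common():
        print(f"ip=0x{ip}: {hits} unaligned access(es)")
```

If one address accounts for nearly all hits while the kworker load is high, that would support the guess. An even spread would point elsewhere.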