• GRUB testers on SPARC needed

    From Robin Cremer@21:1/5 to All on Sat May 15 19:40:01 2021
    Hi,
    We're in the process of migrating Debian for sparc64 from SILO to GRUB
    as GRUB upstream is adding support for modern SPARC machines thanks to
    the work of Eric Snowberg from Oracle.

    In order to make sure GRUB works on all machines supported by the sparc64 port, we need your help to test GRUB on your particular hardware, the older your machine, the better.
    [...]

    7. Report back to the list and include your hardware and partition setup

    A bit late to the party, as SILO already appears to be gone (including
    the repos) and all install images use GRUB now, but I'm having trouble
    and wanted to report this - and maybe get some ideas, in case this is
    the best address to do so:

    I'm in the process of migrating most of our SPARC servers running
    Solaris 10 & the old Debian with 32bit SPARC userland to the SPARC64
    debport.
    Some servers running Solaris 11 will follow.

    Installing on two SunFire v215 went reasonably well

    /- (apart from recurring Kernel Panics with "Unable to handle kernel
    paging request in mna handler", most often triggered on boot immediately
    after the systemd binfmt service tries to start. This seems to have been mentioned in /2020/04/msg00020.html but never pinpointed and fixed?) -/

    but I can't seem to be able to configure GRUB on these servers as I did
    in the past with SILO (a 2-disk mdraid with mirrored /boot, / and swap).
    I'm currently stuck with /boot on only one disk and the rest of the
    system mirrored as I can't figure out how to install grub for a mirrored
    /boot partition:

    1) Installing to the mirror device always yields a Segmentation Fault. I
    was unable to get any clue with my limited gdb experience as to why -
    (with loaded debug symbols etc.: "Backtrace stopped: previous frame
    identical to this frame (corrupt stack?)"):
    # grub-install --skip-fs-probe --force --debug /dev/md0
    [...]
    grub-install: info: setting the root device to `mduuid/1ae243c1e2445aef777f4d32b671f41c'.
    grub-install: warning: File system `ext2' doesn't support embedding.
    grub-install: warning: Embedding is not possible.  GRUB can only be installed in this setup by using blocklists.  However, blocklists are UNRELIABLE and their use is discouraged..
    grub-install: info: will leave the core image on the filesystem.
    Segmentation fault

    2) Trying to install to the individual disk partitions or the raw disk
    itself:
    grub-install: warning: File system `ext2' doesn't support embedding.
    grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
    [...]
    grub-install: warning: Partition style `sun' doesn't support embedding.
    grub-install: error: embedding is not possible, but this is required for RAID and LVM install.

    Neither different filesystems (ext2, xfs, ...) nor different mdraid
    metadata formats made any difference.
    I can't test other disk labels, as the old OBP doesn't handle GPT AFAIR.
    Also, GRUB built from the most recent official sources from their git
    segfaults as well.

    Any pointers how to achieve this setup? What can I test or does someone
    else have a similar setup working? Am I doing something horribly wrong?
    I don't think mdraid-mirrored bootdisks should be too uncommon on this hardware.

    Thanks and cheers to the community keeping SPARC alive :-)
    Robin


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Robin Cremer@21:1/5 to All on Sun May 16 02:10:02 2021
    Responding to myself:

    Some progress:
    I put additional informational output between all commands in the
    suspect area in GRUB's util/setup.c and pinpointed the bug:

    #ifdef GRUB_SETUP_SPARC64
      {
        grub_partition_t container = root_dev->disk->partition;

        if (grub_strstr (container->partmap->name, "gpt"))
          bl.gpt_offset = grub_partition_get_start (container);
      }
    #endif

    When installing on an md device (or another special device), the device
    never has a partition table, so "container" is NULL.
    The code then accesses struct members without checking whether the
    pointer is valid, which leads to the segfault. Changing the condition to
    if (container && grub_strstr (container->partmap->name, "gpt"))
    actually works & installs on LVM if I put a hint for GRUB into the
    device.map pointing to the UUID of the MDRAID.

    I'll try to get a patch for that submitted or discussed (I'm new to this
    and not exactly sure if the change has other implications).


    It still won't boot, though. OBP executes the first "stage" in the 2nd
    partition block, which then fails with something along the lines of
    "GRUB FAIL - trap: Illegal Instruction"; a second attempt ended with
    "Unaligned Memory Access"...


    I'll post back,

    Greetings,
    Robin

  • From John Paul Adrian Glaubitz@21:1/5 to Robin Cremer on Mon May 17 10:50:01 2021
    On 5/16/21 2:03 AM, Robin Cremer wrote:
    > Responding to myself:
    >
    > Some progress:
    > I put additional informational output between all commands in the suspect area in GRUBs util/setup.c and pinpointed the bug:
    >
    > #ifdef GRUB_SETUP_SPARC64
    >   {
    >     grub_partition_t container = root_dev->disk->partition;
    >
    >     if (grub_strstr (container->partmap->name, "gpt"))
    >       bl.gpt_offset = grub_partition_get_start (container);
    >   }
    > #endif
    >
    > When installing on an md-device - or other special devices - it will never have a partition table, thus "container" is null.

    It might be trying to read the partition table from a fixed position where it wouldn't be when using
    a software RAID, not sure. In any case, this definitely needs to be moved upstream and you should
    put Eric Snowberg from Oracle in the loop as he is the expert on GRUB for SPARC.

    > After that, access to struct members is tried without checking if it even exists, leading to the segfault.
    > if (container && grub_strstr (container->partmap->name, "gpt"))
    > actually works & installs on LVM if I put a hint for GRUB into the device.map pointing to the UUID of the MDRAID.
    >
    > I'll try to get a patch for that submitted or discussed (I'm new to this and not exactly sure if the change has other implications).


    > It still won't boot, though. The first "stage" in the 2nd partition block is executed by OBP and something along the lines
    > of "GRUB FAIL - trap: Illegal Instruction" and on a second attempt "Unaligned Memory Access" was encountered...

    Most likely because the block numbers reported back by the software RAID don't map to the block numbers on the
    physical device which is why the first stage is just loading random garbage and executing it which leads to
    SIGILL.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From John Paul Adrian Glaubitz@21:1/5 to Robin Cremer on Mon May 17 10:30:01 2021
    Hi Robin!

    On 5/15/21 7:25 PM, Robin Cremer wrote:
    >> 7. Report back to the list and include your hardware and partition setup
    >
    > A bit late to the party, as SILO already appears to be gone (including the repos) and
    > all install images use GRUB now, but I'm having trouble and wanted to report this - and
    > maybe get some ideas, in case this is the best address to do so:

    You can still install SILO from snapshot.debian.org. However, I would recommend building
    the latest version from source as there have been some bugfixes in the meantime.

    > I'm in the process of migrating most of our SPARC servers running Solaris 10 & the old Debian
    > with 32bit SPARC userland to the SPARC64 debport. Some servers running Solaris 11 will follow.

    Good to hear.

    > Installing on two SunFire v215 went reasonably well
    >
    > /- (apart from recurring Kernel Panics with "Unable to handle kernel paging request in mna handler",
    > most often triggered on boot immediately after the systemd binfmt service tries to start. This seems
    > to have been mentioned in /2020/04/msg00020.html but never pinpointed and fixed?) -/

    What kernel version are you running? There have actually been some fixes in this regard, in particular
    this fix:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/sparc?id=e5e8b80d352ec999d2bba3ea584f541c83f4ca3f

    > but I can't seem to be able to configure GRUB on these servers as I did in the past with SILO (a 2-disk
    > mdraid with mirrored /boot, / and swap). I'm currently stuck with /boot on only one disk and the rest of
    > the system mirrored as I can't figure out how to install grub for a mirrored /boot partition:

    Please keep in mind that GRUB is installed using blocklists on these older machines, which means it's not
    aware of the filesystem being used. The bootloader will just remember the location of the data blocks
    and the physical disk. So it has no means to deal with something as sophisticated as a software RAID.

    Not sure how it worked with SILO which didn't use anything else than blocklists either (which is why
    the /boot partition couldn't be too large and the filesystem used couldn't be too fancy).

    > 1) Installing to the mirror device always yields a Segmentation Fault. I was unable to get any clue with
    > my limited gdb experience as to why - (with loaded debug symbols etc.: "Backtrace stopped: previous frame
    > identical to this frame (corrupt stack?)"):
    > # grub-install --skip-fs-probe --force --debug /dev/md0
    > [...]
    > grub-install: info: setting the root device to `mduuid/1ae243c1e2445aef777f4d32b671f41c'.
    > grub-install: warning: File system `ext2' doesn't support embedding.
    > grub-install: warning: Embedding is not possible.  GRUB can only be installed in this setup by using blocklists.  However, blocklists are UNRELIABLE and their use is discouraged..
    > grub-install: info: will leave the core image on the filesystem.
    > Segmentation fault

    As I said above, I don't expect this to work, really. That doesn't mean that grub-install should crash
    here. I will try to reproduce the issue when I find some time. Ideally, grub-install should just abort
    the installation in this case.

    But we could also find out how SILO worked in this case.

    > 2) Trying to install to the individual disk partitions or the raw disk itself:
    > grub-install: warning: File system `ext2' doesn't support embedding.
    > grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
    > [...]
    > grub-install: warning: Partition style `sun' doesn't support embedding.
    > grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
    >
    > Neither different filesystems (ext2, xfs, ...) nor different mdraid metadata formats made any difference.
    > I can't test other disk labels, as the old OBP doesn't handle GPT AFAIR. Also, GRUB built from the most recent official sources from their git segfaults as well.

    Thanks for testing the git version, I was about to ask that.

    > Any pointers how to achieve this setup? What can I test or does someone else have a similar setup
    > working? Am I doing something horribly wrong? I don't think mdraid-mirrored bootdisks should be too
    > uncommon on this hardware.

    From my statements above, I wouldn't expect GRUB with blocklists to work on a software RAID, so I
    think you probably have no choice but to use a single disk for booting. In any case, I think the
    GRUB-specific discussion should be moved to the GRUB mailing list as this really concerns the
    low-level functionality of GRUB.

    > Thanks and cheers to the community keeping SPARC alive :-)

    Sure. Glad it's being useful.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Robin Cremer@21:1/5 to All on Mon May 17 11:40:01 2021
    Hi Adrian,

    thanks for the response!

    First things first: I got GRUB to work yesterday :-)
    The last bit of magic missing in the picture was what you expected: the
    blocklist was "shifted" because grub-install didn't correctly determine
    the relation between block positions on the physical disk and on the
    MDraid volume.
    Not sure if it does when installing on x86 with blocklists?
    I found it hard to find usable documentation about the actual physical
    on-disk layout of MDraid RAID-1.
    Superblocks are well documented, the "complex" RAID levels reasonably
    well also, but I found no mention that building the volume with
    metadata version 1.2 actually shifts the start of the data in the mdraid
    to the end of the first 1MB block, so blocklist numbers are off by 1MB.
    (The metadata itself sits 4k from the beginning of the disk, which
    shouldn't be a problem, as only the first two 512-byte sectors (~1k)
    are in use for the label & GRUB boot.img.)
    I noticed that by trial & error: I opened a "normally installed" disk
    and the RAID member disk I tried to build in a hex editor ;-)
    metadata=0.9 does not do this. I thought the metadata settings only
    affected the position and "type" of the mdraid metadata blocks, not the
    actual on-disk layout of the mirror.

    So, in short:
    My old setup with SILO was exactly like this: metadata=0.9 for the /boot
    partition. With this, the ext3 blocks are at the same position on the
    physical disk AND on the md volume.
    I might have set it up like this for the same reason back then, but
    forgot... It's been quite a while.

    The second (although minor) problem was that grub-mkconfig generates an
    unusable grub.cfg for this setup. It refuses to set the "root=" variable
    to (mduuid/UUID), which was in turn necessary to install the boot blocks,
    and instead uses settings that leave GRUB unable to open the partition
    label, so it falls back to OBP.

    The best solution I found was to edit the
    /usr/lib/grub/grub-mkconfig_lib shell script to not set root at all.
    In my case, that works flawlessly, as GRUB actually starts with "root="
    already set to the disk that loaded it, so it even works with one disk
    pulled from the server, simulating failure - which was exactly what I
    wanted.


    To summarize:
    - Patch util/setup.c to correctly handle (virtual) block devices
    without partition tables.
    - Use metadata=0.9 to build the mirror (!)
    - Add a device.map entry for the MDraid device to work around the "diskfilter
    writes not supported" issue (from grub-probe -t ieee1275_hints -d /dev/md0):
    (mduuid/66bf8873932144cf2d6a74e4a05e67d3) /dev/md0
    - Strip the lines in /usr/lib/grub/grub-mkconfig_lib between
      # otherwise set root as per value in device.map.
    and
      IFS="$old_ifs"
    to make boot entries that do not try to re-set "root"


    After this fight, I achieved my boot mirror setup on GRUB/SPARC :-)

    I'll respond to the kernel-stuff separately.

    Thanks!

    - Robin

