• Reviving schroot as used by sbuild

    From Helmut Grohne@21:1/5 to All on Tue Jun 25 10:20:01 2024
    Hi,

    sbuild is our primary tool for constructing a build environment to build
    Debian packages. It is used on all buildds, and for a long time the
    backend used with sbuild has been schroot. More recently, a number of
    buildds have been moved away from schroot towards --chroot-mode=unshare,
    thanks to the work of at least Aurelien Jarno and Jochen Sprickerhof and
    a few more people working too far behind the scenes for me to spot them
    directly.

    In this work, limitations with --chroot-mode=unshare became apparent and
    that led to Johannes, Jochen and me sitting down in Berlin pondering
    ideas on how to improve the situation. That is a longer story, but
    eventually Timo Röhling asked the innocuous question of why we cannot
    just use schroot and make it work with namespaces.

    That led me to sit down and write a proof of concept. As a result, we
    now have a little script called unschroot.py that can vaguely be used as
    a drop-in replacement for schroot when used with sbuild. In trixie and
    bookworm-backports it can now be plugged into sbuild by setting $schroot
    = "path/to/unschroot.py" thanks to Johannes. It's not that long and can
    be viewed at
    https://git.subdivi.de/~helmut/python-linuxnamespaces.git/tree/examples/unschroot.py.
    It is vaguely close to reaching feature parity with sbuild
    --chroot-mode=unshare and operates in a very similar way. As it is now,
    it doesn't bring us any benefits beyond separating the containment
    aspect from the build aspect into different tools.

    The split into different tools is important in my view. I argue that it
    allows easier experimentation, and its architecture may enable features
    that were difficult to implement using sbuild --chroot-mode=unshare,
    where sbuild increasingly becomes a container runtime of its own and
    things start to get messy.

    Is this a path worth pursuing further? Would we actually consider moving
    back from sbuild --chroot-mode=unshare to sbuild --chroot-mode=schroot
    with a different schroot implementation?

    Related to that, what would be compelling features to switch?

    Let me go a bit further into detail. There are two approaches to
    managing an ephemeral build container using namespaces. In one approach,
    we create a directory hierarchy of a container root filesystem and, for
    each command and hook that we invoke there, we create new namespaces on
    demand. In particular, there are no background processes when nothing is
    running in that container, and all that remains is its directory
    hierarchy. Such a container session can easily survive a reboot (unless
    stored on tmpfs). Both sbuild --chroot-mode=unshare and unschroot.py
    follow this approach. For comparison, schroot sets up mounts (e.g.
    /proc) when it begins a session and cleans them up when it ends. No such
    persistent mounts exist in either sbuild --chroot-mode=unshare or
    unschroot.py.
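
    To make this concrete, the following is a rough single-uid approximation
    of what running one command in such a container amounts to, using plain
    util-linux unshare(1); the path is made up, and the real implementations
    map an entire /etc/subuid range via newuidmap and additionally set up
    /dev and further bind mounts:

    $ ROOT=~/.cache/sbuild/session-XXXX   # the unpacked chroot directory
    $ unshare --user --map-root-user --mount --pid --fork \
        --mount-proc="$ROOT/proc" \
        chroot "$ROOT" dpkg-buildpackage -us -uc

    Once the command exits, no namespace objects remain; all that is left is
    the directory tree under $ROOT.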

    The other approach is using one set of namespaces for the entire
    session. Practically, this implies having a background process keeping
    this namespace alive for the duration of the session and talking to it
    via some IPC mechanism. We may still spawn a new pid namespace for each
    command to get reliable process cleanup, but the use of a persistent
    mount namespace enables the use of fuse2fs, squashfuse, overlayfs and
    bindfs to construct the root directory of the container by other means
    than unpacking a tar into a directory. In particular, the use of bindfs
    allows sharing e.g. the user's ccache with the build container in
    principle (with proper id shifting). At the time of this writing, this
    second approach is wishful thinking and not implemented at all. I merely believe that it is implementable with the schroot API already
    implemented by unschroot.py above.
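
    Just to make the idea more tangible (this is a deliberately crude sketch
    and not how a real implementation would look; it again skips the subuid
    mapping as well as /proc and /dev setup), one could keep a single
    namespaced shell alive as the background process and use a pipe as the
    IPC mechanism, with $ROOT being the unpacked container root directory:

    $ mkfifo session-ctl
    $ unshare --user --map-root-user --mount --pid --fork \
        chroot "$ROOT" sh < session-ctl &
    $ exec 3>session-ctl                    # keep the write end open
    $ echo 'apt-get update' >&3             # runs inside the same namespaces
    $ echo 'apt-get -y build-dep hello' >&3
    $ exec 3>&-                             # closing the pipe ends the session

    The fuse2fs, squashfuse, overlayfs and bindfs mounts mentioned above
    would live in that persistent mount namespace.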

    Another possible extension is a hooking mechanism. Regular schroot has
    hooks already and I've seen requests for sbuild to use package-specific
    chroots. For instance, one may have a separate Haskell or Rust container
    that already has a basic set of ecosystem-specific dependencies to speed
    up the installation of Build-Depends. On-demand updating of chroots has
    also been requested. However, it's not yet clear to me what a useful
    hooking interface in e.g. unschroot.py could look like, and I invite you
    to provide more use cases for such hooking. Also, sketching how you
    imagine interfacing with this would be helpful. For instance, you may
    explain what kind of configuration files or options you'd like to use
    and how you imagine them to work.

    I note that this is not a promise that I am going to implement your
    wishes. I intend to do more work on this and barring really useful
    extensions, my next goal would be moving to that other approach.

    Please allow me to thank Freexian for supporting part of this work
    financially even though it has been my initiative and is not otherwise influenced by Freexian at the time of this writing.

    Let me also explain the relation between "unschroot.py" and the
    containing repository "python-linuxnamespaces". linuxnamespaces is a
    (probably) distribution-agnostic Python module, written by myself for
    lack of better alternatives, providing plumbing functions for
    constructing container runtimes. As such, unschroot.py in large parts
    uses linuxnamespaces (the Python module) to plug together the various
    parts needed to arrive at a container useful for building with sbuild.
    If unschroot takes off, it likely needs to get its own home.
    linuxnamespaces is supposed to enable constructing a systemd-as-pid-1
    container as a regular user, but doesn't do that as of yet. While podman
    and docker allow running unprivileged application containers, they still
    require privileged containers when you want to run systemd-as-pid-1.

    Helmut

  • From Johannes Schauer Marin Rodrigues@21:1/5 to All on Tue Jun 25 11:40:01 2024
    Hi,

    Quoting Helmut Grohne (2024-06-25 10:16:20)
    In this work, limitations with --chroot-mode=unshare became apparent and that lead to Johannes, Jochen and me sitting down in Berlin pondering ideas on how to improve the situation. That is a longer story, but eventually Timo Röhling
    asked the innocuous question of why we cannot just use schroot and make it work with namespaces.

    for those who are interested in the longer story behind all of this, let
    me explain it here. Maybe this background helps give a bit more context
    about why we were thinking about this. If my memory serves me right, the
    initial trigger for all of this was an idea that Julian Andres Klode
    had: instead of having to manually run

    $ mmdebstrap --variant=buildd unstable ~/.cache/sbuild/unstable-arm64.tar

    before running sbuild for the first time, and every once in a while
    after that, could the sbuild unshare backend not run this command
    automatically if the chroot tarball doesn't exist yet or has become too
    old? If that were the default, setting up a package builder on a new
    system would be as simple as running:

    $ sudo apt install sbuild
    $ sbuild --chroot-mode=unshare -d unstable my_cool_package

    Just install the package, no setup required, and just start building things. So I wrote this MR:

    https://salsa.debian.org/debian/sbuild/-/merge_requests/59

    Besides automatically creating the chroot, updating the chroot and
    setting the maximum age of the chroot tarball, this MR also allows
    passing custom options to mmdebstrap depending on the chroot name. So
    somebody who wants an Ubuntu buildd chroot could have in their
    ~/.sbuildrc:

    $UNSHARE_MMDEBSTRAP_EXTRA_ARGS = {
        "focal" => ["--components=universe,multiverse"],
    };

    And if you want a custom chroot for the Rust packages you build, you
    could have (using %d and %a as percent escapes for distribution and
    architecture):

    $UNSHARE_MMDEBSTRAP_EXTRA_ARGS = {
        "debcargo-%d-%a" => ["--include ccache,gnupg,dh-cargo,cargo,lintian,perl-openssl-defaults"],
    };

    But the big question now is: should all of this functionality and
    complexity live in sbuild? Or should it be moved out of sbuild, so that
    for example the Rust team can just have their custom sbuild chroot
    script doing all the setup and customization they require without
    sbuild carrying that functionality? The question of how best to let
    sbuild use an external chroot manager led to Timo's idea of just
    replacing 'schroot' with something else which provides the interface
    that sbuild uses to communicate with schroot but then in the back does
    its own thing. We have such a thing now, and that is Helmut's
    unschroot.py.

    But this is not the only option forward. Another option would be to re-purpose the sbuild autopkgtest backend for this.

    Ultimately, what do we want to achieve? We want to make package building easier, less tedious and more customizable. We are thinking about what the best architecture would be to achieve this. We have multiple options:

    1. bolt the functionality we want into sbuild as extensions to the unshare
    backend or by creating new backends with the desired functionality -- this
    is https://salsa.debian.org/debian/sbuild/-/merge_requests/59
    2. replace the schroot binary with something else which shares the schroot
    interface -- this is unschroot.py
    3. move functionality into autopkgtest backends and then make sbuild just a
    wrapper around autopkgtest

    Choosing one of these three options as the correct software engineering
    approach becomes even more tricky when we start thinking of persistence.
    Providing a persistent user and mount namespace (for example to be able
    to use overlay filesystems on top of an unpacked chroot directory) will
    be *very* tricky with the current design of the unshare backend and
    would probably need to become its own backend, if choice 1 is the one we
    want to pursue.

    Persistence is also something that would be useful for a backend around
    qemu. Currently, there exists sbuild-qemu, maintained by Christian
    Kastner, which is not a new backend but a convenience wrapper on top of
    sbuild, driving it with the autopkgtest backend and autopkgtest-virt-qemu
    as the virt server. It would be great if building packages inside a qemu
    VM became easier, and option 2 would allow users to create a backend
    that starts qemu and then communicates with a process inside qemu
    efficiently via an AF_VSOCK socket. Lastly, persistence is also a
    requirement for building packages inside a system container which runs
    an actual init process when it spins up.
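
    To sketch what such an AF_VSOCK channel could look like (purely
    illustrative; the CID, port and helper are made up), the VM gets a vsock
    device when qemu is started and a small helper inside the guest answers
    on a vsock port, which the host can reach without any network
    configuration:

    $ qemu-system-x86_64 ... -device vhost-vsock-pci,guest-cid=3  # on the host
    $ socat VSOCK-LISTEN:4444,fork EXEC:/bin/sh                   # inside the guest
    $ echo 'uname -a' | socat - VSOCK-CONNECT:3:4444              # back on the host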

    These last two bits (qemu and system containers) also make for a very
    interesting topic to think about because, in contrast to minimal
    application containers or a simple chroot directory, we do not want to
    build packages directly in them (because we want to build packages in a
    minimal setup). So now sbuild would have to manage 3 environments:

    1. the system on the outside
    2. the system inside qemu or the system container
    3. the system in the minimal build chroot inside the VM

    This is also why option 3 from above (autopkgtest) is not an obvious
    solution, because shoving this understanding into autopkgtest will also
    be some effort. Why it is important to think about how sbuild can
    interact from one of these environments with the next inner one can be
    shown by one very simple example: why do we have apt inside our build
    chroots? The main reason is: because the schroot interface (and that is
    a reason why option 2, unschroot.py, is not the fits-all solution) does
    not provide a facility to let a tool from the outside (environment 1)
    operate on the inside of the chroot (environment 3). The unshare backend
    has this ability. If sbuild were only using the unshare backend and we
    dropped the schroot backend, I could easily allow build chroots
    containing nothing other than essential, build-essential and the build
    dependencies.

    So, this was a big brain dump. Thank you for getting this far. I've had
    this in my head for several weeks now, and even though I'm very excited
    to see where Helmut's unschroot.py can take us, I do not yet see one way
    forward that I am entirely happy with.

    Maybe you can share your thoughts.

    Thanks!

    cheers, josch

  • From Simon McVittie@21:1/5 to Helmut Grohne on Tue Jun 25 15:10:01 2024
    On Tue, 25 Jun 2024 at 10:16:20 +0200, Helmut Grohne wrote:
    In this work, limitations with --chroot-mode=unshare became apparent and
    that lead to Johannes, Jochen and me sitting down in Berlin pondering
    ideas on how to improve the situation. That is a longer story, but
    eventually Timo Röhling asked the innocuous question of why we cannot
    just use schroot and make it work with namespaces.

    I have to ask:

    Could we use a container framework that is also used outside the Debian
    bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
    is one of the same few people who are critical-path for various other
    important things, we would recognise that as an anti-pattern that we
    should avoid if we can.

    At the moment, rootless Podman would seem like the obvious choice. As far
    as I'm aware, it has the same user namespaces requirements as the unshare backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled, setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).
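
    (For reference, such a subordinate range is a single line per user in
    each of those files, for example:

    someuser:100000:65536

    with a matching line in /etc/subgid.)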

    Podman uses the same OCI images as Docker, so it can either pull from a
    trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
    Debian we would want to do the latter, at least initially, to avoid
    being forced to either trust an external registry like hub.docker.com
    or operate our own.

    Here's the Dockerfile/Containerfile to turn a sysroot tarball into an
    OCI image (obviously it can be extended with LABELs and other
    customizations, but this is fairly close to minimal):

    FROM scratch
    ADD sysroot.tar.gz /
    CMD ["/bin/bash"]
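
    Building and running an image from that is then simply (the tag is an
    arbitrary example):

    $ podman build -t local/sid-sbuild .
    $ podman run --rm -it local/sid-sbuild

    (podman import sysroot.tar.gz local/sid-sbuild achieves much the same
    without a Containerfile, minus the CMD.)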

    The reason I suggest Podman rather than Docker is that Podman is normally "daemonless" (the container is an ordinary process tree, like schroot,
    rather than being launched by command-execution RPC to dockerd) and
    is normally used "rootless" (whereas Docker *can* be configured to be "rootless" but in practice it seems that's very uncommon).

    podman is also supported as a backend by autopkgtest-virt-podman, Toolbx (podman-toolbox in Debian) and distrobox. autopkgtest's autopkgtest-build-podman does not yet support starting from a tarball
    as described above, but it easily could (contributions welcome).

    Or, if Podman is too "not invented here" for Debian's use, using rootless lxd/Incus is another option - although that introduces a dependency
    on projects and formats that are rarely used outside the Debian/Ubuntu
    bubble, which risks them becoming another schroot (and also requires us to decide whether we follow Canonical's lxd or the community fork Incus
    post-fork, which could get somewhat political).

    There are two approaches to
    managing an ephemeral build container using namespaces. In one approach,
    we create a directory hierarchy of a container root filesystem and for
    each command and hook that we invoke there, we create new namespaces on demand. In particular, there are no background processes when nothing is running in that container and all that remains is its directory
    hierarchy. Such a container session can easily survive a reboot (unless stored on tmpfs). Both sbuild --chroot-mode=unshare and unschroot.py
    follow this approach. For comparison, schroot sets up mounts (e.g /proc)
    when it begins a session and cleans them up when it ends. No such
    persistent mounts exist in either sbuild --chroot-mode=unshare or unschroot.py.

    Persisting a container root filesystem between multiple operations comes
    with some serious correctness issues if there are "hooks" that can modify
    it destructively on each operation: see <https://bugs.debian.org/499014>
    and <https://bugs.debian.org/994836>. As a result of that, I think the
    only model that should be used in new systems is to have some concept of
    a session (like schroot type=file, but unlike schroot type=directory)
    so that those "hooks" only run once, on session creation, preventing
    them from arbitrarily reverting/overwriting changes that are subsequently
    made by packages installed into the chroot/container (for example dbus' creation of the messagebus uid/gid in #499014, and exim4's creation of Debian-exim in #994836).

    I don't know whether creating new namespaces multiple times (but without running external integration hooks the second and subsequent times)
    will also lead to practical problems, but I note that outside the Debian bubble, everything that enters a new container environment seems to
    operate by creating a process that encapsulates the container, and then
    either letting it run to completion interactively or non-interactively
    (`docker run`, etc.), or letting it run in the background (perhaps with
    an init system or `sleep infinity` as its "payload" process) and then repeatedly injecting code into that pre-existing namespace
    (either `docker exec`, etc., or something like ssh).

    autopkgtest's Docker, Podman, lxc, lxd backends all operate by creating
    a namespaced init or sleep process with `docker run` or equivalent, and
    then injecting subsequent commands into the namespace that was created
    for that long-running process with `docker exec` or equivalent.
    I think unshare is the outlier here, and I think it would be good to
    consider whether it really needs to be.
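
    In podman terms that pattern looks roughly like this (image and package
    names are placeholders):

    $ podman run -d --name build-session local/sid-sbuild sleep infinity
    $ podman exec build-session apt-get -y build-dep hello
    $ podman exec build-session dpkg-buildpackage -us -uc
    $ podman rm -f build-session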

    The more like other container managers a new container manager is, the
    less likely it is to break reasonable expectations in future, like
    schroot regularly does.

    While podman
    and docker allow running unprivileged application containers, they still require privileged containers when you want to run systemd-as-pid-1.

    What do you mean by "privileged containers" exactly? Do you mean a system service that runs with CAP_SYS_ADMIN and other scary privileges in the
    init namespace, like the typical use of dockerd, or are you also counting
    uses of the setuid newuidmap as being privileged?

    If you are happy to use the setuid newuidmap (which I believe the unshare backends for schroot, mmdebstrap, autopkgtest also rely on) then my understanding is that "rootless" podman is essentially equivalent:
    you need a setuid newuidmap, a range of 65536 uids in /etc/subuid,
    a range of 65536 gids in /etc/subgid, and a kernel that will allow
    unprivileged users to create new user namespaces, but beyond that there
    are no special privileges required.

    Please see /usr/share/doc/podman/README.Debian for details of what it needs.

    For systemd-as-pid-1 specifically,
    `autopkgtest-build-podman --init=systemd` and
    `autopkgtest-virt-podman --init` demonstrate how this can be done, and
    last time I tried, it was possible to run them unprivileged (other than
    needing access to the setuid newuidmap, as above). systemd is able to
    detect that it's running in a container and turn off functionality like
    udev that would only be appropriate in a VM or on bare metal, and podman
    knows how to tell systemd that it should do this.

    smcv

  • From Faidon Liambotis@21:1/5 to Simon McVittie on Tue Jun 25 15:30:01 2024
    On Tue, Jun 25, 2024 at 02:02:11PM +0100, Simon McVittie wrote:
    Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
    is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
    should avoid if we can.

    Absolutely agreed, strong +1 on this.

    At the moment, rootless Podman would seem like the obvious choice. As far
    as I'm aware, it has the same user namespaces requirements as the unshare backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled, setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).

    I am perhaps a little biased, but I, too, think rootless Podman would be
    the best for the job :)

    Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
    Debian we would want to do the latter, at least initially, to avoid
    being forced to either trust an external registry like hub.docker.com
    or operate our own.

    Here's the Dockerfile/Containerfile to turn a sysroot tarball into an
    OCI image (obviously it can be extended with LABELs and other
    customizations, but this is fairly close to minimal):

    Note that podman run also has --rootfs, which accepts the path to an
    exploded container, and it supports both idmap and overlayfs on top of
    it as well. So that's another option, one that skips image management,
    Dockerfiles etc. entirely, allowing for an even closer experience to the
    existing tooling.

    Faidon

  • From Andrey Rakhmatullin@21:1/5 to Simon McVittie on Tue Jun 25 15:30:01 2024
    On Tue, Jun 25, 2024 at 02:02:11PM +0100, Simon McVittie wrote:
    In this work, limitations with --chroot-mode=unshare became apparent and that lead to Johannes, Jochen and me sitting down in Berlin pondering
    ideas on how to improve the situation. That is a longer story, but eventually Timo Röhling asked the innocuous question of why we cannot
    just use schroot and make it work with namespaces.

    I have to ask:

    Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
    is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
    should avoid if we can.

    100%


    --
    WBR, wRAR

  • From Antonio Terceiro@21:1/5 to Simon McVittie on Tue Jun 25 17:30:01 2024
    On Tue, Jun 25, 2024 at 02:02:11PM +0100, Simon McVittie wrote:
    On Tue, 25 Jun 2024 at 10:16:20 +0200, Helmut Grohne wrote:
    In this work, limitations with --chroot-mode=unshare became apparent and that lead to Johannes, Jochen and me sitting down in Berlin pondering
    ideas on how to improve the situation. That is a longer story, but eventually Timo Röhling asked the innocuous question of why we cannot
    just use schroot and make it work with namespaces.

    I have to ask:

    Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
    is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
    should avoid if we can.

    At the moment, rootless Podman would seem like the obvious choice. As far
    as I'm aware, it has the same user namespaces requirements as the unshare backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled, setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).

    Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
    Debian we would want to do the latter, at least initially, to avoid
    being forced to either trust an external registry like hub.docker.com
    or operate our own.

    Yes, please.

    FWIW, I want to switch ci.d.n from lxc to podman at some point as well.

  • From Russ Allbery@21:1/5 to Simon McVittie on Tue Jun 25 18:40:01 2024
    Simon McVittie <smcv@debian.org> writes:

    Persisting a container root filesystem between multiple operations comes
    with some serious correctness issues if there are "hooks" that can modify
    it destructively on each operation: see <https://bugs.debian.org/499014>
    and <https://bugs.debian.org/994836>. As a result of that, I think the
    only model that should be used in new systems is to have some concept of
    a session (like schroot type=file, but unlike schroot type=directory)
    so that those "hooks" only run once, on session creation, preventing
    them from arbitrarily reverting/overwriting changes that are subsequently made by packages installed into the chroot/container (for example dbus' creation of the messagebus uid/gid in #499014, and exim4's creation of Debian-exim in #994836).

    I'm not entirely sure that I'm following the nuances of this discussion,
    so this may be irrelevant, but I think type=btrfs-snapshot provides the
    ideal properties for container file systems. This unfortunately requires
    file system support and therefore cannot be used unless you've already
    embraced a file system with subvolumes, but if you have, you get all of
    the speed of a persistent container root file system with none of the
    correctness issues, because you get a fresh (and almost instant) clone
    of a canonical root file system that is discarded after each build.

    I use that in combination with a cron job to update the source subvolume
    daily to ensure that it's fully patched.

    Unfortunately, there's no way that we can rely on this, but it would be
    nice to continue to support it for those who are using a supported
    underlying file system already.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Guillem Jover@21:1/5 to Russ Allbery on Tue Jun 25 18:50:01 2024
    Hi!

    On Tue, 2024-06-25 at 09:32:21 -0700, Russ Allbery wrote:
    Simon McVittie <smcv@debian.org> writes:
    Persisting a container root filesystem between multiple operations comes with some serious correctness issues if there are "hooks" that can modify it destructively on each operation: see <https://bugs.debian.org/499014> and <https://bugs.debian.org/994836>. As a result of that, I think the
    only model that should be used in new systems is to have some concept of
    a session (like schroot type=file, but unlike schroot type=directory)
    so that those "hooks" only run once, on session creation, preventing
    them from arbitrarily reverting/overwriting changes that are subsequently made by packages installed into the chroot/container (for example dbus' creation of the messagebus uid/gid in #499014, and exim4's creation of Debian-exim in #994836).

    I'm not entirely sure that I'm following the nuances of this discussion,
    so this may be irrelevant, but I think type=btrfs-snapshot provides the
    ideal properties for container file systems. This unfortunately require
    file system support and therefore cannot be used unless you've already embraced a file system with subvolumes, but if you have, you get all of
    the speed of a persistent container root file system with none of the correctness issues, because you get a fresh (and almost instant) clone of
    a canonical root file system that is discarded after each build.

    I use that in combination with a cron job to update the source subvolume daily to ensure that it's fully patched.

    Unfortunately, there's no way that we can rely on this, but it would be
    nice to continue to support it for those who are using a supported
    underlying file system already.

    I manage my chroots with schroot (but not via sbuild, for dogfooding
    purposes :), and use type=directory and union-type=overlay so that I
    get a fast and persistent base, independent of the underlying
    filesystem, with fresh instances per session. (You can access the base
    via the source:<id> names.) I never liked the type=file stuff, as it's
    slow to set up and maintain.
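
    For reference, such a chroot definition in schroot.conf looks roughly
    like this (names and paths made up):

    [sid-amd64]
    description=Debian sid (directory with per-session overlay)
    type=directory
    directory=/srv/chroot/sid-amd64
    union-type=overlay
    groups=sbuild
    root-groups=sbuild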

    Regards,
    Guillem

  • From Simon McVittie@21:1/5 to Russ Allbery on Tue Jun 25 19:00:01 2024
    On Tue, 25 Jun 2024 at 09:32:21 -0700, Russ Allbery wrote:
    Simon McVittie <smcv@debian.org> writes:
    I think the
    only model that should be used in new systems is to have some concept of
    a session (like schroot type=file, but unlike schroot type=directory)

    I'm not entirely sure that I'm following the nuances of this discussion,
    so this may be irrelevant, but I think type=btrfs-snapshot provides the
    ideal properties for container file systems.

    That's another of the "good" schroot types which don't generally cause bugs like #499014 and #994836. As of Debian 12, I believe the situation is:

    Good (session-based): file, btrfs-snapshot, zfs-snapshot, lvm-snapshot

    Bad by default, can be good if combined with a non-trivial union-type: directory, loopback, block-device

    Usually a mistake: plain

    I mentioned file because it's the only one of the "good" choices that
    works on any system, without a specific filesystem or storage management
    mechanism, but the others are fine too if you happen to have the right
    filesystem or storage management. If you have enough RAM, the file
    backend unpacked into a tmpfs also completely avoids any possible
    performance issue involving fsync(), whether in dpkg or elsewhere :-)

    I would also suggest not using the "source chroot" associated with one
    of the good (session-based) options, and instead re-bootstrapping the
    chroot from first principles whenever that's desired.

    smcv

  • From Helmut Grohne@21:1/5 to Simon McVittie on Tue Jun 25 19:00:02 2024
    Hi Simon,

    On Tue, Jun 25, 2024 at 02:02:11PM +0100, Simon McVittie wrote:
    Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
    is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
    should avoid if we can.

    This is a reasonable concern. I contend that while unschroot.py is very
    Debian-specific, the underlying plumbing layer is not. I would not have
    started working on this if what I wanted to do had been doable with
    existing code, but maybe it was not that the code couldn't do it, but
    that I wasn't using the existing code correctly.

    Please allow me to point out that right now, sbuild contains a custom
    container framework that is subject to eventually becoming a starving single-maintainer project and I am trying to extract and separate this
    existing container framework from sbuild into more reusable components. Likewise, mmdebstrap contains another custom container framework that is similar but not equal to the one in sbuild.

    At the moment, rootless Podman would seem like the obvious choice. As far
    as I'm aware, it has the same user namespaces requirements as the unshare backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled, setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).

    I concur, the privilege requirements for rootless podman are exactly the
    ones I am interested in. Indeed, podman was the thing investigated most thoroughly, but evidently not thoroughly enough.

    Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
    Debian we would want to do the latter, at least initially, to avoid
    being forced to either trust an external registry like hub.docker.com
    or operate our own.

    At least for me, building container images locally is a requirement. I
    have no interest in using a container registry. Faidon pointing at
    --rootfs goes further in this direction.

    podman is also supported as a backend by autopkgtest-virt-podman, Toolbx (podman-toolbox in Debian) and distrobox. autopkgtest's autopkgtest-build-podman does not yet support starting from a tarball
    as described above, but it easily could (contributions welcome).

    Thank you for pointing at these. I need to familiarize myself with them.

    Or, if Podman is too "not invented here" for Debian's use, using rootless lxd/Incus is another option - although that introduces a dependency
    on projects and formats that are rarely used outside the Debian/Ubuntu bubble, which risks them becoming another schroot (and also requires us to decide whether we follow Canonical's lxd or the community fork Incus post-fork, which could get somewhat political).

    lxd/incus also was on my list, but my understanding is that they do not
    work without their system services at all, and being able to operate
    containers (i.e. being incus-admin or the like) roughly becomes
    equivalent to being full root on the system, defeating the purpose of
    the exercise. If anything is "not invented here", that'd be unschroot
    rather than podman.

    There are two approaches to
    managing an ephemeral build container using namespaces. In one approach,
    we create a directory hierarchy of a container root filesystem and for
    each command and hook that we invoke there, we create new namespaces on demand. In particular, there are no background processes when nothing is running in that container and all that remains is its directory
    hierarchy. Such a container session can easily survive a reboot (unless stored on tmpfs). Both sbuild --chroot-mode=unshare and unschroot.py
    follow this approach. For comparison, schroot sets up mounts (e.g /proc) when it begins a session and cleans them up when it ends. No such persistent mounts exist in either sbuild --chroot-mode=unshare or unschroot.py.

    Persisting a container root filesystem between multiple operations comes
    with some serious correctness issues if there are "hooks" that can modify
    it destructively on each operation: see <https://bugs.debian.org/499014>
    and <https://bugs.debian.org/994836>. As a result of that, I think the
    only model that should be used in new systems is to have some concept of
    a session (like schroot type=file, but unlike schroot type=directory)
    so that those "hooks" only run once, on session creation, preventing
    them from arbitrarily reverting/overwriting changes that are subsequently made by packages installed into the chroot/container (for example dbus' creation of the messagebus uid/gid in #499014, and exim4's creation of Debian-exim in #994836).

    I guess you understood my explanation differently than it was meant.
    While the container is persisted into the filesystem, this is being done
    for each package build individually. sbuild --chroot-mode=unshare and
    unschroot use a tarball as their source, and opening the session amounts
    to extracting it. At the end of the session, the tree is disposed of.
    The session concept of schroot is being reused in unschroot, and it very
    much behaves like a type=file chroot, except that you can begin a
    session, reboot and continue using it until you end it, without
    requiring a system service to recover your sessions during boot.

    The main difference to how everyone else does this is that in a typical
    sbuild interaction it will create a new user namespace for every single
    command run as part of the session. sbuild issues tens of commands
    before launching dpkg-buildpackage, and each of them creates new
    namespaces in the Linux kernel (all of them using the same uid mappings,
    performing the same bind mounts and so on). The most common way to think
    of containers is different: you create those namespaces once and reuse
    the same namespace kernel objects for multiple commands that are part of
    the same session (e.g. installation of build dependencies and
    dpkg-buildpackage). You describe this other approach in more detail:

    I don't know whether creating new namespaces multiple times (but without running external integration hooks the second and subsequent times)
    will also lead to practical problems, but I note that outside the Debian bubble, everything that enters a new container environment seems to
    operate by creating a process that encapsulates the container, and then either letting it run to completion interactively or non-interactively (`docker run`, etc.), or letting it run in the background (perhaps with
    an init system or `sleep infinity` as its "payload" process) and then repeatedly injecting code into that pre-existing namespace
    (either `docker exec`, etc., or something like ssh).

    Exactly, this is how everyone but sbuild --chroot-mode=unshare and
    unschroot does it.

    autopkgtest's Docker, Podman, lxc, lxd backends all operate by creating
    a namespaced init or sleep process with `docker run` or equivalent, and
    then injecting subsequent commands into the namespace that was created
    for that long-running process with `docker exec` or equivalent.

    Please allow me to do a tangential excursion here. There are two ways of
    interacting with containers that use one set of namespaces for their
    entire existence. One is setting up some IPC mechanism and receiving
    commands to be run inside (for instance spawning a shell and piping
    commands into it, or driving the container via ssh). The other is having
    an external process join (setns) the existing container (namespaces) and
    inject code into it (docker exec). That latter approach has a history of
    vulnerabilities closely related to vulnerabilities in setuid binaries,
    because we are transitioning a process (and all of its context) from
    outside the container into it and thus expose all of its context (memory
    maps, open file descriptors and so on) to contained processes. As such,
    I think that an approach based on an IPC mechanism should be preferred.
    I am not sure whether podman exec operates in this way, but a quick
    codesearch did not exhibit obvious uses of setns inside the podman
    source code. Would anyone be able to tell how podman exec is
    implemented here?

    I think unshare is the outlier here, and I think it would be good to
    consider whether it really needs to be.

    Absolutely! Did you observe that I suggested moving unschroot to that
    other model where the namespace objects are reused for the entire
    session? Indeed, moving sbuild --chroot-mode=unshare in this direction
    was one of the primary motivations for starting this work, but doing
    this inside sbuild is very difficult due to its architecture, so my
    approach was first separating the container framework from sbuild and
    that's how I arrived at unschroot.

    The more like other container managers a new container manager is, the
    less likely it is to break reasonable expectations in future, like
    schroot regularly does.

    Yes! I very much used the systemd container interface documentation to
    avoid exactly this problem.

    While podman
    and docker allow running unprivileged application containers, they still require privileged containers when you want to run systemd-as-pid-1.

    What do you mean by "privileged containers" exactly? Do you mean a system service that runs with CAP_SYS_ADMIN and other scary privileges in the
    init namespace, like the typical use of dockerd, or are you also counting uses of the setuid newuidmap as being privileged?

    I'm sorry for being imprecise here. Privileged is an overloaded term in
    the container context. I was trying to use it with the "not rootless"
    meaning here. The interest is in running containers with user privileges available on common installations (i.e. unprivileged user namespaces,
    newuidmap being setuid, subuid allocation and systemd being your cgroup
    manager and handing out delegated cgroups).

    If you are happy to use the setuid newuidmap (which I believe the unshare backends for schroot, mmdebstrap, autopkgtest also rely on) then my understanding is that "rootless" podman is essentially equivalent:
    you need a setuid newuidmap, a range of 65536 uids in /etc/subuid,
    a range of 65536 gids in /etc/subgid, and a kernel that will allow unprivileged users to create new user namespaces, but beyond that there
    are no special privileges required.

    Cool. I think you really need one more non-trivial (but very commonly available) privilege. You need a cgroup manager (such as systemd) that
    allows creating and delegating a cgroup hierarchy to you. You may call
    this a non-special privilege.

    Please see /usr/share/doc/podman/README.Debian for details of what it needs.

    It could use updating as swapaccount=1 is the default.

    For systemd-as-pid-1 specifically,
    `autopkgtest-build-podman --init=systemd` and
    `autopkgtest-virt-podman --init` demonstrate how this can be done, and
    last time I tried, it was possible to run them unprivileged (other than needing access to the setuid newuidmap, as above). systemd is able to
    detect that it's running in a container and turn off functionality like
    udev that would only be appropriate in a VM or on bare metal, and podman knows how to tell systemd that it should do this.

    This is very cool. Running autopkgtests in system containers without
    being root (or incus-admin) very much is what I'd like to do. And it's
    much better if I don't have to write my own container framework for
    doing it. I couldn't get it to work locally yet (facing non-obvious
    error messages).

    Would someone be able to document (mail/wiki/blog/...) how to set up and
    use podman for running autopkgtests? Thus far, I have failed to figure
    out how to plug a local Debian mirror (as opposed to a container
    registry) into autopkgtest-build-podman. It is quite difficult to locate
    podman documentation that is applicable under the assumption that you
    don't want to use any container registry.

    So thank you very much for pointing me hard at podman again. My podman
    research dates back quite a bit and I can already tell that podman is
    quite a bit different now.

    Let me circle back to the question of whether podman solves the needs of
    sbuild. We learned that sbuild --chroot-mode=unshare and unschroot spawn
    a new set of namespaces for every command. What you point out as a
    limitation also is a feature. Technically, it is a lie that the
    namespaces are always constructed in the same way. During installation
    of build dependencies the network namespace is not unshared, while
    package builds commonly use an unshared network namespace with no
    interfaces but the loopback interface. In a similar vein, constructing a
    pid namespace for every command ensures reliable process cleanup: once
    your build has exited, all background processes are reliably disposed
    of. These aspects are very useful to how we use containers in sbuild,
    but the way most container runtimes work with a single set of namespaces
    makes this non-trivial. We really want to change the set of namespaces
    throughout the session.

    So I think the needs of sbuild (and piuparts) regarding container
    frameworks are quite specific and not easily met by existing tools.
    Ultimately, this is what led me to write a reusable Python module
    providing container plumbing and a relatively thin implementation of
    schroot using namespaces on top of it.

    If we can get the requested features from podman, it is the better
    choice to me for the maintainability reasons that you started with. It
    is not clear though whether podman can be made to address our
    requirements.

    Thank you for having taken one step back and questioning my context
    instead of going into my actual questions.

    Helmut

  • From Marco d'Itri@21:1/5 to Guillem Jover on Tue Jun 25 19:10:01 2024
    On Jun 25, Guillem Jover <guillem@debian.org> wrote:

    I manage my chroots with schroot (but not via sbuild, for dog fooding purposes :), and use type=directory and union-type=overlay so that I
    get a fast and persistent base, independent of the underlying filesystem, with fresh instances per session. (You can access the base via the source:<id> names.) I never liked the type=file stuff, as it's slow to
    setup and maintain.
    Same. So I implemented overlayfs support in pbuilder:

    https://salsa.debian.org/pbuilder-team/pbuilder/-/merge_requests/28

    If a tmpfs is mounted on /var/cache/pbuilder/build/ then all the actual
    action will happen in RAM.

    --
    ciao,
    Marco

  • From Andrey Rakhmatullin@21:1/5 to Russ Allbery on Tue Jun 25 19:40:01 2024
    On Tue, Jun 25, 2024 at 10:24:12AM -0700, Russ Allbery wrote:
    Guillem Jover <guillem@debian.org> writes:

    I manage my chroots with schroot (but not via sbuild, for dog fooding purposes :), and use type=directory and union-type=overlay so that I get
    a fast and persistent base, independent of the underlying filesystem,
    with fresh instances per session. (You can access the base via the source:<id> names.) I never liked the type=file stuff, as it's slow to setup and maintain.

    Ah, thank you, I didn't realize that existed. That sounds like a nice generalization of the file system snapshot approach.

    (Unless I'm missing something it's the default setup for e.g. sbuild-createchroot(8))

    --
    WBR, wRAR

  • From Russ Allbery@21:1/5 to Guillem Jover on Tue Jun 25 19:30:01 2024
    Guillem Jover <guillem@debian.org> writes:

    I manage my chroots with schroot (but not via sbuild, for dog fooding purposes :), and use type=directory and union-type=overlay so that I get
    a fast and persistent base, independent of the underlying filesystem,
    with fresh instances per session. (You can access the base via the source:<id> names.) I never liked the type=file stuff, as it's slow to
    setup and maintain.

    Ah, thank you, I didn't realize that existed. That sounds like a nice generalization of the file system snapshot approach.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From PICCA Frederic-Emmanuel@21:1/5 to All on Tue Jun 25 19:50:01 2024
    Ah, thank you, I didn't realize that existed. That sounds like a nice generalization of the file system snapshot approach.

    I think that this is how the

    sbuild-debian-developer-setup

    script sets up chroots.

    Fred

  • From Russ Allbery@21:1/5 to All on Tue Jun 25 19:50:01 2024
    PICCA Frederic-Emmanuel <frederic-emmanuel.picca@synchrotron-soleil.fr>
    writes:

    Ah, thank you, I didn't realize that existed. That sounds like a nice
    generalization of the file system snapshot approach.

    I think that this how the

    sbuild-debian-developer-setup

    script, setup chroots

    Yeah, I think all that my contribution to this thread accomplished was to demonstrate that I set up sbuild years ago based on a wiki article for
    btrfs and don't know what I'm talking about. :) Apologies for that.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Gioele Barabucci@21:1/5 to Helmut Grohne on Tue Jun 25 20:20:01 2024
    On 25/06/24 18:55, Helmut Grohne wrote:
    For systemd-as-pid-1 specifically,
    `autopkgtest-build-podman --init=systemd` and
    `autopkgtest-virt-podman --init` demonstrate how this can be done, and
    last time I tried, it was possible to run them unprivileged (other than
    needing access to the setuid newuidmap, as above). systemd is able to
    detect that it's running in a container and turn off functionality like
    udev that would only be appropriate in a VM or on bare metal, and podman
    knows how to tell systemd that it should do this.

    This is very cool. Running autopkgtests in system containers without
    being root (or incus-admin) very much is what I'd like to do. And it's
    much better if I don't have to write my own container framework for
    doing it. I couldn't get it to work locally yet (facing non-obvious
    error messages).

    Would someone be able to document (mail/wiki/blog/...) how to set up and
    use podman for running autopkgtests.

    I'd like to take this chance to suggest, instead of writing more
    documentation, changing the autopkgtest packaging so that it is split
    into various per-backend packages, each of which provides a ready-to-go pre-configured environment. See <https://bugs.debian.org/1039958#22>.

    Currently, in order to get a working autopkgtest + podman setup, one has to:

    1) install autopkgtest
    2) install podman
    3) install a non-clearly-defined set of additional packages (including, surprisingly, dbus-user-session)
    4) change various configuration files
    5) learn how to use autopkgtest-build-podman
    5a) BONUS: realize that, instead, you'd like to use mmdebstrap to create
    the base images, but mmdebstrap-autopkgtest-build-podman does not exist.
    6) learn how to properly invoke autopkgtest $dir -- podman

    It would be great if the user experience on a freshly installed system
    were instead more like:

    $ apt install autopkgtest-podman
    $ autopkgtest $dir
    $ # done

    I believe achieving this right now is just a matter of better packaging.
    (Plus some improvements to deal with the few packages whose tests have
    non-ordinary and taxing requirements.)

    Regards,

    --
    Gioele Barabucci

  • From Paul Gevers@21:1/5 to All on Tue Jun 25 22:00:01 2024
    Hi

    On 25-06-2024 6:55 p.m., Helmut Grohne wrote:
    This is very cool. Running autopkgtests in system containers without
    being root (or incus-admin) very much is what I'd like to do. And it's
    much better if I don't have to write my own container framework for
    doing it. I couldn't get it to work locally yet (facing non-obvious
    error messages).

    Maybe bug #1059725?

    Paul

  • From Holger Levsen@21:1/5 to Simon McVittie on Wed Jun 26 11:20:01 2024
    hi,

    On Tue, Jun 25, 2024 at 02:02:11PM +0100, Simon McVittie wrote:
    I have to ask:

    Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? [...]

    +1

    Podman uses the same OCI images as Docker, so it can either pull from a trusted OCI registry, or use images that were built by importing a tarball generated by e.g. mmdebstrap or sbuild-createchroot. I assume that for
    Debian we would want to do the latter, at least initially, to avoid
    being forced to either trust an external registry like hub.docker.com
    or operate our own.

    I'd just like to mention the lesser-known fact that https://docker.debian.net/ provides reproducible images for nine Debian architectures today...


    --
    cheers,
    Holger

    ⢀⣴⠾⠻⢶⣦⠀
    ⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
    ⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
    ⠈⠳⣄

    🔥 - this is fine.

  • From Simon McVittie@21:1/5 to Guillem Jover on Wed Jun 26 18:10:02 2024
    On Tue, 25 Jun 2024 at 18:47:49 +0200, Guillem Jover wrote:
    I manage my chroots with schroot (but not via sbuild, for dog fooding purposes :), and use type=directory and union-type=overlay so that I
    get a fast and persistent base, independent of the underlying filesystem, with fresh instances per session.

    type=directory *with a union-type* is OK, and avoids the persistence
    issues I mentioned: it has many of the same properties as type=file
    (but different performance characteristics).

    type=directory *without* a union-type can trigger bugs like the ones
    I mentioned.

    You can access the base via the source:<id> names

    This is the same as with type=file. If you do this, be careful to avoid installing software that creates/relies on new uids existing inside
    the chroot, such as dbus or exim4, if a corresponding username does not
    already exist outside the chroot. That's what causes bugs like the ones
    I mentioned.

    I would recommend usually re-bootstrapping the base instead of modifying
    it in-place, to avoid having differences between a freshly-bootstrapped
    base and the current state of your base chroot building up over time
    (for example packages that are removed from the transitively Essential set remaining installed in your base chroot indefinitely, or non-dpkg-managed configuration files being different for new installations and upgraded
    older installations), which can result in a harder-to-reproduce build environment.
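
    (For instance, a periodic re-bootstrap of a directory base could look
    roughly like this; variant, suite and paths are illustrative:)

    $ mmdebstrap --variant=buildd unstable /srv/chroot/unstable-amd64.new
    $ rm -rf /srv/chroot/unstable-amd64 && mv /srv/chroot/unstable-amd64.new /srv/chroot/unstable-amd64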

    smcv

  • From Stefano Rivera@21:1/5 to All on Wed Jun 26 18:40:01 2024
    Hi Helmut (2024.06.25_16:55:45_+0000)
    lxd/incus also was on my list,

    Personally, I have been using LXD (and now Incus, as it made it into
    Debian, yay) for my experimentation and local package builds, for a
    number of years now. They have native support for btrfs snapshots,
    locally built images, and make it relatively simple to block network
    access for my builds. The autopkgtest-virt backend is a bit clunky, but I
    don't miss schroot at all.

    but my understanding is that they do not work without their system
    services at all

    Correct. LXC containers are essentially VMs without their own kernel.
    They run their own systemd. This does mean that I build packages in a
    fatter system than necessary. But that has yet to be an issue for me.

    and being able to operate containers (i.e. being incus-admin or the
    like) roughly becomes equivalent to being full root on the system
    defeating the purpose of the exercise.

    You don't have to be incus-admin to use Incus. Users get their own incus project (see the incus-user.service). But I've never played with this
    much; on a single-user system, incus-admin is just much simpler (if less secure).

    Of course incus still has to be root itself to add network interfaces to bridges. It's nice to be able to control networking for the containers,
    but it would be even nicer for sbuild to not need setup that requires
    root.

    Stefano

    --
    Stefano Rivera
    http://tumbleweed.org.za/
    +1 415 683 3272

  • From Simon McVittie@21:1/5 to Helmut Grohne on Wed Jun 26 19:20:01 2024
    On Tue, 25 Jun 2024 at 18:55:45 +0200, Helmut Grohne wrote:
    At least for me, building container images locally is a requirement. I
    have no interest in using a container registry.

    I expected you'd say that. podman --rootfs is one way to use it without
    a registry; a trivially short Dockerfile like the one I mentioned,
    to convert a tarball into a container image locally, is another.
    (Debian's pseudo-official Docker images on Dockerhub use the latter.)
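
    (A minimal sketch of the --rootfs route, assuming an already unpacked
    Debian root filesystem at an invented path:)

    $ podman run --rm -it --rootfs /srv/rootfs/sid /bin/bash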

    But I think it would be great if some part of Debian - perhaps the
    cloud team? - could periodically publish genuinely official minbase
    sysroot tarballs and/or OCI images from Debian infrastructure, like the
    cloud team already does for VM images, which would avoid relying on a third-party registry while also avoiding requiring every developer to
    spend thought and CPU time on building their own before they can start
    on their actual development.

    lxd/incus also was on my list, but my understanding is that they do not
    work without their system services at all and being able to operate containers (i.e. being incus-admin or the like) roughly becomes
    equivalent to being full root on the system defeating the purpose of the exercise.

    Perhaps, I haven't looked into lxd/incus in detail (podman seems to have
    the properties I wanted so I stopped there). I might have been misled by
    the fact that lxd can run rootless containers - but maybe it can only
    do that by making IPC requests to a privileged service, a bit like the
    way snapd operates.

    I guess you understood my explanation differently than it was meant.
    While the container is persisted into the filesystem, this is being done
    for each package build individually. sbuild --chroot-mode=unshare and unschroot use a tarball as their source and opening the session amounts
    to extracting it. At the end of the session, the tree is disposed. The session concept of schroot is being reused in unschroot and it very much behaves like a type=file chroot except that you can begin a session,
    reboot and continue using it until you end it without requiring a system service to recover your sessions during boot.

    OK, good: this is "the same shape" as schroot type=file, which is not
    one of the modes that has the problems I described. If you're carrying
    over the underlying on-disk directory across reboots, you'll have to
    be a little careful about persisting state into that directory (only
    things that will still be true after a reboot can safely be stored),
    but I'm sure you're doing that.

    The main difference to how everyone else does this is that in a typical sbuild interaction it will create a new user namespace for every single command run as part of the session. sbuild issues tens of commands
    before launching dpkg-buildpackage and each of them creates new
    namespaces in the Linux kernel (all of them using the same uid mappings, performing the same bind mounts and so on). The most common way to think
    of containers is different: You create those namespaces once and reuse
    the same namespace kernel objects for multiple commands part of the same session (e.g. installation of build dependencies and dpkg-buildpackage).

    Yes. My concern here is that there might be non-obvious reasons why
    everyone else is doing this the other way, which could lead to behavioural differences between unschroot and all the others that will come back to
    bite us later.

    There are two ways of
    interacting with containers that use one set of namespaces for their
    entire existence. One is setting up some IPC mechanism and receiving
    commands to be run inside (for instance spawning a shell and piping
    commands into it, or driving the container via ssh); the other is having
    an external process join (setns) the existing container (namespaces) and inject
    code into it (docker exec). The latter approach has a history of vulnerabilities closely related to vulnerabilities in setuid binaries, because we are transitioning a process (and all of its context) from
    outside the container into it and thus expose all of its context (memory maps, open file descriptors and so on) to contained processes. As such,
    I think that an approach based on an IPC mechanism should be preferred.

    An IPC-based approach is certainly going to provide better security
    hardening (especially if setuid helpers are used), and potentially better functionality as well.

    In Flatpak (which uses namespaces too, but is not really the same sort
    of container), the debugging command `flatpak enter` currently uses the
    setns approach (which comes with various limitations), and one of the
    items on my infinite to-do list is to make that be IPC-based instead,
    possibly by reusing code written for steam-runtime-tools during $dayjob.

    For whole-system containers running an OS image from init upwards,
    or for virtual machines, using ssh as the IPC mechanism seems
    pragmatic. Recent versions of systemd can even be given a ssh public
    key via the systemd.system-credentials(7) mechanism (e.g. on the kernel
    command line) to set it up to be accepted for root logins, which avoids
    needing to do this setup in cloud-init, autopkgtest's setup-testbed,
    or similar.

    For "application" containers like the ones you would presumably want
    to be using for sbuild, presumably something non-ssh is desirable.

    I am not sure whether podman exec operates in this way, but a quick codesearch did not reveal obvious uses of setns inside the podman
    source code. Would anyone be able to tell how podman exec is
    implemented here?

    I don't know the answer to this.

    I think you really need one more non-trivial (but very commonly
    available) privilege. You need a cgroup manager (such as systemd) that
    allows creating and delegating a cgroup hierarchy to you.

    Quite possibly, yes. I don't think I ever tried running
    autopkgtest-virt-podman --init on a system that didn't have
    systemd-as-pid-1 and a working `systemd --user`.

    Would someone be able to document (mail/wiki/blog/...) how to set up and
    use podman for running autopkgtests? Thus far, I failed to figure out
    how to plug a local Debian mirror (as opposed to a container registry)
    into autopkgtest-build-podman. It is quite difficult to locate podman documentation that is applicable under the assumption that you don't
    want to use any container registry.

    If you build an image by importing a tarball that you have built in
    whatever way you prefer, minimally something like this:

    $ cat > Dockerfile <<EOF
    FROM scratch
    ADD minbase.tar.gz /
    EOF
    $ podman build -f Dockerfile -t local-debian:sid .

    then you should be able to use localhost/local-debian:sid
    as a substitute for debian:sid in the examples given in autopkgtest-virt-podman(1), either using it as-is for testing:

    $ autopkgtest -U hello*.dsc -- podman localhost/local-debian:sid

    or making an image that has been pre-prepared with some essentials like dpkg-source, and testing in that:

    $ autopkgtest-build-podman --image localhost/local-debian:sid
    ...
    Successfully tagged localhost/autopkgtest/localhost/local-debian:sid
    $ autopkgtest hello*.dsc -- podman autopkgtest/localhost/local-debian:sid
    (tests run)

    Adding a mode for "start from this pre-prepared minbase tarball" to all
    of the autopkgtest-build-* tools (so that they don't all need to know
    how to run debootstrap/mmdebstrap from first principles, and then duplicate
    the necessary options to make it do the right thing), has been on my
    to-do list for literally years. Maybe one day I will get there.

    We could certainly also benefit from some syntactic sugar to make the
    automatic choice of an image name for localhost/* podman images nicer,
    with fewer repetitions of localhost/.

    By default (as per /etc/containers/registries.conf.d/shortnames.conf),
    podman considers debian:sid to be short for docker.io/library/debian,
    which is the closest thing we have to "official" Debian OCI images. If we
    had our own self-hosted container registry with suitable scalability and security, like Red Hat and SUSE do, that file could point there instead.
    Salsa does in fact provide us with a self-hosted container registry,
    but probably not one that is sufficiently scalable?
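
    (For illustration, such a drop-in could look roughly like this; the file
    name and registry name are invented:)

    # /etc/containers/registries.conf.d/10-debian.conf
    [aliases]
      "debian" = "registry.example.debian.org/library/debian"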

    podman is unlikely to provide you with a way to generate a minbase
    tarball without first creating or downloading some sort of container
    image in which you can run debootstrap or mmdebstrap, because you have
    to be able to start from somewhere. But you can run mmdebstrap unprivileged
    in unshare mode, so that's enough to get you that starting point.
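
    (For example, something like this should produce a usable starting
    tarball without privileges beyond the setuid newuidmap:)

    $ mmdebstrap --mode=unshare --variant=minbase unstable minbase.tar.gz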

    We learned that sbuild --chroot-mode=unshare and unschroot spawn
    a new set of namespaces for every command. What you point out as a
    limitation also is a feature. Technically, it is a lie that the
    namespaces are always constructed in the same way. During installation
    of build depends the network namespace is not unshared while package
    builds commonly use an unshared network namespace with no interfaces but
    the loopback interface.

    I don't think podman can do this within a single run. It might be feasible
    to do the setup (installing build-dependencies) with networking enabled;
    leave the root filesystem of that container intact; and reuse it as the
    root filesystem of the container in which the actual build runs, this time
    with --network=none?

    Or the "install build-dependencies" step (and other setup) could perhaps
    even be represented as a `podman build` (with a Dockerfile/Containerfile,
    FROM the image you had as your starting point), outputting a temporary container image, in which the actual dpkg-buildpackage step can be invoked
    by `podman run --network=none --rmi`?
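
    (A hedged sketch of that second idea; the image name, package name and
    source directory are made up, and apt-get build-dep assumes deb-src
    entries are available in the image:)

    $ cat > Containerfile <<EOF
    FROM localhost/local-debian:sid
    RUN apt-get update && apt-get --yes build-dep hello
    EOF
    $ podman build -f Containerfile -t localhost/build-hello .
    $ cd hello-2.10 && podman run --rm --rmi --network=none \
          -v "$PWD:/build" -w /build localhost/build-hello \
          dpkg-buildpackage -us -uc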

    smcv

  • From Bastian Venthur@21:1/5 to Simon McVittie on Thu Jun 27 11:00:02 2024
    On 25.06.24 15:02, Simon McVittie wrote:
    I have to ask:

    Could we use a container framework that is also used outside the Debian bubble, rather than writing our own from first principles every time, and ending up with a single-maintainer project being load-bearing for Debian *again*? I had hoped that after sbuild's history with schroot becoming unmaintained, and then being revived by a maintainer-of-last-resort who
    is one of the same few people who are critical-path for various other important things, we would recognise that as an anti-pattern that we
    should avoid if we can.

    Great proposal!

    Here's the Dockerfile/Containerfile to turn a sysroot tarball into an
    OCI image (obviously it can be extended with LABELs and other
    customizations, but this is fairly close to minimal):

    FROM scratch
    ADD sysroot.tar.gz /
    CMD ["/bin/bash"]

    For some time now I have had the idea of building my Debian packages in a
    clean Docker container instead of using cowbuilder etc., but due to lack of
    time and the complexity of available solutions I never got very far. Do you
    happen to have a minimal example that would work for most projects and
    does not depend too much on opinionated Debian-specific tooling?


    Cheers!

    Bastian

    --
    Dr. Bastian Venthur https://venthur.de
    Debian Developer venthur at debian org

  • From Helmut Grohne@21:1/5 to Simon McVittie on Thu Jun 27 13:50:01 2024
    Hi Simon,

    Thanks for having taken the time to do another extensive writeup. Much appreciated.

    On Wed, Jun 26, 2024 at 06:11:09PM +0100, Simon McVittie wrote:
    On Tue, 25 Jun 2024 at 18:55:45 +0200, Helmut Grohne wrote:
    The main difference to how everyone else does this is that in a typical sbuild interaction it will create a new user namespace for every single command run as part of the session. sbuild issues tens of commands
    before launching dpkg-buildpackage and each of them creates new
    namespaces in the Linux kernel (all of them using the same uid mappings, performing the same bind mounts and so on). The most common way to think
    of containers is different: You create those namespaces once and reuse
    the same namespace kernel objects for multiple commands part of the same session (e.g. installation of build dependencies and dpkg-buildpackage).

    Yes. My concern here is that there might be non-obvious reasons why
    everyone else is doing this the other way, which could lead to behavioural differences between unschroot and all the others that will come back to
    bite us later.

    I do not share this concern (but other concerns of yours). The risk of behavioural differences is fairly low, because we do not expect any non-filesystem state to transition from one command to the next. Much to
    the contrary, the use of a pid namespace for each command ensures
    reliable process cleanup, so no background processes can accidentally
    stick around.

    I am concerned about behavioural differences due to the reimplementation-
    from-first-principles aspect though. Jochen and Aurelien will know more
    here, but I think we had a fair number of FTBFS due to such differences.
    None of them was due to the architecture of creating namespaces for
    each command and most of them were due to not having gotten containers
    right in general. Some were broken packages, such as ones skipping tests
    when detecting schroot.

    Also note that just because I do not share your concern here does not
    imply that I'd be favouring sticking to that architecture. I expressed elsewhere that I see benefits in changing it for other reasons. At this
    point I more and more see this as a non-boolean question. There is a
    spectrum between "create namespaces once and use them for the entire
    session" and "create new namespaces for each command" and more and more
    I start to believe that what would be best for sbuild is somewhere in
    between.

    For whole-system containers running an OS image from init upwards,
    or for virtual machines, using ssh as the IPC mechanism seems
    pragmatic. Recent versions of systemd can even be given a ssh public
    key via the systemd.system-credentials(7) mechanism (e.g. on the kernel command line) to set it up to be accepted for root logins, which avoids needing to do this setup in cloud-init, autopkgtest's setup-testbed,
    or similar.

    Another excursion: systemd goes beyond this and also provides the ssh
    port via an AF_VSOCK (in case of VMs) or a unix domain socket on the
    outside (in case of containers) to make safe discovery of the ssh access easier.

    For "application" containers like the ones you would presumably want
    to be using for sbuild, presumably something non-ssh is desirable.

    I partially concur, but this goes into the larger story I hinted at in
    my initial mail. If we move beyond containers and look into building
    inside a VM (e.g. sbuild-qemu) we are in a difficult spot, because we
    need e.g. systemd for booting, but we may not want it in our build
    environment. So long term, I think sbuild will have to differentiate
    between three contexts:
    * The system it is being run on
    * The containment or virtualisation environment used to perform the
    build
    * The system where the build is being performed inside the containment
    or virtualisation environment

    At present, sbuild does not distinguish the latter two and always treats
    them as equal. When building inside a VM, we may eventually want to create
    a chroot inside the VM to arrive at a minimal environment. The same
    technique is applicable to system containers. When doing this, we
    minimize the build environment and do not mind the extra ssh dependency
    in the container or virtualisation environment. For now though, this is
    all wishful thinking. As long as this distinction does not exist, we
    pretty much want minimal application containers for building as you
    said.

    If you build an image by importing a tarball that you have built in
    whatever way you prefer, minimally something like this:

    $ cat > Dockerfile <<EOF
    FROM scratch
    ADD minbase.tar.gz /
    EOF
    $ podman build -f Dockerfile -t local-debian:sid .

    I don't quite understand the need for a Dockerfile here. I suspect that
    this is the obvious way that works reliably, but my impression was that
    using podman import would be easier. I had success with this:

    mmdebstrap --format=tar --variant=apt unstable - | podman import --change CMD=/bin/bash - local-debian/sid

    then you should be able to use localhost/local-debian:sid
    as a substitute for debian:sid in the examples given in autopkgtest-virt-podman(1), either using it as-is for testing:

    $ autopkgtest -U hello*.dsc -- podman localhost/local-debian:sid

    This did not work for me. autopkgtest failed to create a user account. I suspect that this has one of two reasons: either autopkgtest expects
    python3 to be installed and it isn't, or it expects passwd to be
    installed and doesn't install it when missing (as passwd is
    non-essential).

    or making an image that has been pre-prepared with some essentials like dpkg-source, and testing in that:

    $ autopkgtest-build-podman --image localhost/local-debian:sid
    ...
    Successfully tagged localhost/autopkgtest/localhost/local-debian:sid

    Works for me.

    $ autopkgtest hello*.dsc -- podman autopkgtest/localhost/local-debian:sid
    (tests run)

    Thank you very much. I got this working for application container based testing, which provides a significant speedup compared to virt-qemu.

    I am more interested in providing isolation-container though as a number
    of tests require that and I currently tend to resort to virt-qemu for
    that. Sure enough, adding --init=systemd to autopkgtest-build-podman
    just works and a system container can also be used as an application
    container by autopkgtest (so there is no need to build both), but
    running the autopkgtest-virt-qemu --init also fails here in non-obvious
    ways. It appears that user creation was successful, but the user
    creation script is still printed in red.

    We're now deep into debugging specific problems in the
    autopkgtest/podman integration and this is probably getting off-topic
    for d-devel. Is the evidence thus far sufficient for turning this part
    of the discussion into a bug report against autopkgtest?

    Adding a mode for "start from this pre-prepared minbase tarball" to all
    of the autopkgtest-build-* tools (so that they don't all need to know
    how to run debootstrap/mmdebstrap from first principles, and then duplicate the necessary options to make it do the right thing), has been on my
    to-do list for literally years. Maybe one day I will get there.

    From my point of view, this isn't actually necessary. I expect that many people would be fine drawing images from a container registry. Those
    stubborn people like me will happily go the extra mile.

    We could certainly also benefit from some syntactic sugar to make the automatic choice of an image name for localhost/* podman images nicer,
    with fewer repetitions of localhost/.

    Let me pose a possibly stupid suggestion. Much of the time when people
    interact with autopkgtest, there is a very limited set of backends and
    backend options people use frequently. Rather than making the options
    shorter, how about introducing an aliasing mechanism? Say I could have
    some ~/.config/autopkgtest.conf and whenever I run autopkgtest ... --
    $BACKEND such that there is no autopkgtest-virt-$BACKEND, consult that configuration file and, if a value is assigned there, expand it to the
    assigned value. Then, I can just record my commonly used backends and
    options there and refer to them by memorable names of my own liking.
    Automatic choice of images makes things more magic, which bears negative aspects as well.

    podman is unlikely to provide you with a way to generate a minbase
    tarball without first creating or downloading some sort of container
    image in which you can run debootstrap or mmdebstrap, because you have
    to be able to start from somewhere. But you can run mmdebstrap unprivileged in unshare mode, so that's enough to get you that starting point.

    I consider this part of the problem space fully solved.

    Please allow for another podman question (more people than just Simon
    may know the answer). Every time I run a podman container (e.g. when I run autopkgtest) my ~/.local/share/containers grows. I think autopkgtest
    manages to clean up in the end, but e.g. podman run -it ... seems to
    leave stuff behind. Such a growing directory is problematic for multiple reasons, but I was also hoping that podman would be using fuse-overlayfs
    + tmpfs to run my containers instead of writing tons of stuff to my slow
    disk. I hoped --image-volume=tmpfs could improve this, but it did not.
    Of course, when I skip podman's image management and use --rootfs, I can
    side step this problem by choosing my root location on a tmpfs, but
    that's not how autopkgtest uses podman.

    We learned that sbuild --chroot-mode=unshare and unschroot spawn
    a new set of namespaces for every command. What you point out as a limitation also is a feature. Technically, it is a lie that the
    namespaces are always constructed in the same way. During installation
    of build depends the network namespace is not unshared while package
    builds commonly use an unshared network namespace with no interfaces but the loopback interface.

    I don't think podman can do this within a single run. It might be feasible
    to do the setup (installing build-dependencies) with networking enabled; leave the root filesystem of that container intact; and reuse it as the
    root filesystem of the container in which the actual build runs, this time with --network=none?

    Do I understand correctly that in this variant, you intend to use podman without its image management capabilities and rather just use --rootfs
    spawning two podman containers on the same --rootfs (one after another)
    where the first one installs dependencies and the second one isolates
    the network for building?

    Or the "install build-dependencies" step (and other setup) could perhaps
    even be represented as a `podman build` (with a Dockerfile/Containerfile, FROM the image you had as your starting point), outputting a temporary container image, in which the actual dpkg-buildpackage step can be invoked
    by `podman run --network=none --rmi`?

    In this case, we build a complete container image for the purpose of
    building a package. This has interesting consequences. For one thing, we
    often build the same package twice, so caching such an image for some
    time is an obvious feature to look into.

    If you go that way, you may as well use mmdebstrap to construct
    containers with precisely your relevant build-dependencies on demand
    (for every build). The mmdebstrap ... | podman import ... rune would
    roughly work for that.

    Let me try to go one step back here. The podman model (and that of many
    other runtimes) is that one session equates one set of namespaces, but
    network isolation requires another set of namespaces. Your two
    approaches cleverly side-step this, by doing two containers on the same directory hierarchy or on-demand construction of containers (in one
    namespace) and running them (in other namespaces).

    These approaches come with limitations. The first approach requires
    reinventing podman's image management and doing that by hand. In
    particular, that prohibits us from using overlays as a means to avoid extraction or doing the extraction on-demand via e.g. squashfs. In an
    ideal world, I think we do want one user and mount namespace for the
    entire session and then do pid and network namespaces per-command
    as-needed. The second approach requires writing the container to disk,
    very much degrading build performance. If we want to enable these use
    cases, then I fear podman is not the tool of choice as its featureset
    does not match these (idealized) requirements. In other words, settling
    on podman limits us in what features we can implement in sbuild, but
    it may still allow more features than the status quo, so it still can be
    an incremental improvement of the status quo. The question kinda becomes whether it is reasonable to skip that podman step and head over to an architecture that enables more of our use cases.

    And then the question becomes whether unschroot is that better
    architecture or not, and whether accepting the risk of maintenance issues
    that you correctly identified is worth the additional features that we
    expect from it.

    Helmut

  • From Simon McVittie@21:1/5 to Helmut Grohne on Thu Jun 27 16:00:01 2024
    On Thu, 27 Jun 2024 at 11:46:51 +0200, Helmut Grohne wrote:
    I am concerned about behavioural differences due to the reimplementation-
    from-first-principles aspect though. Jochen and Aurelien will know more
    here, but I think we had a fair number of FTBFS due to such differences.
    None of them was due to the architecture of creating namespaces for
    each command and most of them were due to not having gotten containers
    right in general. Some were broken packages, such as ones skipping tests
    when detecting schroot.

    Right - this is an instance of the more general problem pattern, "if we
    don't test a thing regularly, we can't assume it works". We routinely
    test sbuild+schroot (on the buildds), and individual developers often
    try builds without any particular isolation (on development systems or expendable test systems), but until recently sbuild's unshare backend
    was not something that would be routinely tested with most packages,
    and similarly most packages are not routinely built with Podman or Docker
    or whatever else.

    In packages that, themselves, want to do things with containers during
    their build or testing (for example bubblewrap and flatpak), there will typically be a code path for "no particular isolation" that actually
    runs the tests (otherwise upstream would not find the tests useful), and
    a code path for sbuild+schroot that skips the tests (otherwise they'd
    fail on our historical buildds), but the detection that we are in a
    locked-down environment where some tests need to be skipped might not
    be 100% correct. I know I've had to adjust flatpak's test suite several
    times to account for things like detecting whether FUSE works (because
    on DSA'd machines it intentionally doesn't, as a security hardening step).

    If we move beyond containers and look into building
    inside a VM (e.g. sbuild-qemu) we are in a difficult spot, because we
    need e.g. systemd for booting, but we may not want it in our build environment. So long term, I think sbuild will have to differentiate
    between three contexts:
    * The system it is being run on
    * The containment or virtualisation environment used to perform the
    build
    * The system where the build is being performed inside the containment
    or virtualisation environment

    Somewhat prior art for this: https://salsa.debian.org/smcv/vectis uses
    a VM (typically running Debian stable), installs sbuild + schroot into it,
    and uses sbuild + schroot for the actual build, in an attempt to replicate
    the setup of the production buildds on developer machines. In this case
    sbuild is in the middle layer instead of the top layer, though.

    Similarly, when asked to test packages under lxc (in an attempt to
    replicate the setup of ci.debian.net), vectis installs lxc into a VM,
    and runs autopkgtest on the VM rather than on the host system.

    Of course, I'd prefer it if Debian's production infrastructure was
    something that would be easier to replicate "closely enough" on my
    development system (such that packages that pass tests on my development
    system are very likely to pass tests on the production infra), without
    damaging my development system if I use it to build a malicious,
    compromised or accidentally-low-quality package that creates side-effects outside the build environment.

    I don't quite understand the need for a Dockerfile here. I suspect that
    this is the obvious way that works reliably, but my impression was that
    using podman import would be easier.

    Honestly, the need for a Dockerfile here is: I already knew how to build containers from a Dockerfile, and I didn't read the documentation for
    the lower-level `podman import` because `podman build` can already do
    what I needed.

    I see this as the same design principle as why we encourage package
    maintainers to use dh, even when building trivial "toy" packages like
    hello, and in preference to implementing debian/rules at a lower level
    in trivial cases. To build a non-trivial container with multiple layers,
    you'll likely need a Dockerfile (or docker-compose, or some similar thing) *anyway*, so a typical user expectation will be to have a Dockerfile, and anyone building a container will likely already have learned the basics
    of how to write one; and then we might as well follow the same procedure
    in the trivial case, rather than having the trivial case be different and require different knowledge.

    $ autopkgtest -U hello*.dsc -- podman localhost/local-debian:sid

    This did not work for me. autopkgtest failed to create a user account.

    Please report a bug against autopkgtest with steps to reproduce. It worked
    for me, on Debian 12 with a local git checkout of autopkgtest, and it's probably something that ought to work - although it's always going to be non-optimal, because it will waste a bunch of time doing basic setup like installing dpkg-dev and configuring the apt proxy before every test. The
    reason why we have autopkgtest-build-podman is to do that setup fewer
    times, cache the result, and amortize its cost across multiple runs.

    I am more interested in providing isolation-container though as a number
    of tests require that and I currently tend to resort to virt-qemu for
    that. Sure enough, adding --init=systemd to autopkgtest-build-podman
    just works and a system container can also be used as an application container by autopkgtest (so there is no need to build both), but
    running the autopkgtest-virt-qemu --init also fails here in non-obvious
    ways. It appears that user creation was successful, but the user
    creation script is still printed in red.

    (I assume you mean a-v-podman --init rather than a-v-qemu --init.
    a-v-qemu always needs an init system.)

    Please report a (separate) bug against autopkgtest with steps to reproduce. Unfortunately I haven't been able to spend as much time on autopkgtest
    in recent months as I would like to, and I haven't done much with podman
    system containers (with init) since the -docker/-podman backend was
    originally merged.

    I remember that at one point, shortly before the -docker/-podman backend
    was merged, I did have a-v-podman --init working successfully on a system
    with systemd as pid 1 on the host, and each of the three init systems
    known to a-b-podman in the container: systemd, sysvinit with sysv-rc, or sysvinit with openrc (only tested extremely briefly). At the time, I think
    I was able to test src:dbus successfully with at least the first two.

    When testing my own packages, I usually have to prioritize -lxc because
    it's de facto RC (ci.debian.net uses it when not configured otherwise),
    and -qemu because it's the only way some of my packages can have good
    test coverage (notably bubblewrap and flatpak, which want to create new
    user namespaces during testing in a way that a container manager like
    podman will not usually allow).

    Of course in an ideal world I should be re-running the test suite for
    each package in each of the potentially interesting autopkgtest-virt-
    backends, but that would only give me fractionally better test coverage,
    in exchange for making it take even longer to release a package. I am
    sorry for not having been optimally thorough, but one bug that affects
    many of my package uploads, which (unusually!) cannot be solved by adding
    extra QA steps, is "this update took an unacceptably long time to reach
    the archive".

    If ci.debian.net moves away from -lxc, resulting in "tests pass under
    lxc" no longer being a de facto requirement for inclusion in testing,
    then I would prefer to be using -podman for all of the simpler tests
    (for example flatpak's debian/tests/build, which just exercises the -dev package), because it has a much, much shorter lead time for per-test
    setup than -qemu, while also having a useful level of isolation and
    being straightforward to replicate on a developer system for interactive debugging.

    Less-isolated backends like -schroot seem like a bad place to invest
    time and effort because they have more intrusive system and privilege requirements, while not actually being significantly faster or more
    capable.

    Let me pose a possibly stupid suggestion. Much of the time when people interact with autopkgtest, there is a very limited set of backends and backend options people use frequently. Rather than making the options shorter, how about introducing an aliasing mechanism? Say I could have
    some ~/.config/autopkgtest.conf and whenever I run autopkgtest ... -- $BACKEND such that there is no autopkgtest-virt-$BACKEND, consult that configuration file and, if a value is assigned there, expand it to the
    assigned value. Then, I can just record my commonly used backends and
    options there and refer to them by memorable names of my own liking.

    That sounds like a reasonable feature request, please open a bug. As
    with most reasonable feature requests in projects I maintain, it'll go
    on my list, but please don't assume that I will ever get sufficiently
    far through the list within my lifetime if left to implement it myself.

    A crude way to implement this would be to add something like this
    to $PATH:

    #!/bin/sh
    # Save as ~/bin/autopkgtest-virt-sid and make it executable
    set -eu
    exec autopkgtest-virt-podman "$@" localhost/autopkgtest/debian:sid

    and then use e.g. `autopkgtest ... -- sid`.

    (But please note that some backends have more than one place where you
    might wish to add arbitrary options, e.g. a-v-podman accepts a-v-podman options, followed by exactly one image, followed by "--" and arbitrary
    `podman run` options. It might be better if there was an --image parameter
    that can appear first as an alternative to the positional parameter.)

    Automatic choice of images makes things more magic, which bears negative aspects as well.

    The automatic choice of images is intended to be a matter of "have
    reasonable defaults" rather than anything deeper. For example in the
    example in the man page, if you tell autopkgtest-build-podman to convert debian:sid into a pre-prepared test container image, it'll default
    to outputting autopkgtest/debian:sid because that seems a little more
    friendly than forcing the user to choose their own arbitrary name, and establishing a convention via defaults makes it easier to write examples.

    (Or if you use --init=systemd to create a bootable system-container,
    you'll get autopkgtest/systemd/debian:sid, and so on.)

    Every time I run a podman container (e.g. when I run
    autopkgtest) my ~/.local/share/containers grows. I think autopkgtest
    manages to clean up in the end, but e.g. podman run -it ... seems to
    leave stuff behind.

    If you are using e.g. `podman run -it debian:sid` then that is expected
    to leave the container's root filesystem hanging around for future use
    or inspection, even after all of its processes have exited. This is
    vaguely analogous to using `schroot --begin-session` followed by
    `schroot --run-session`, and then leaving the session open indefinitely.

    If you want resources used by the container to be cleaned up automatically
    on exit, use the `--rm` option, more like `podman run --rm -it debian:sid`. This is more like `schroot --automatic-session`.

    `podman container list -a` will list all the containers that have been
    kept around in this way, and `podman container rm` or
    `podman container prune` will delete them. This is analogous to
    `schroot --end-session`.

    Of course, when I skip podman's image management and use --rootfs, I can
    side step this problem by choosing my root location on a tmpfs, but
    that's not how autopkgtest uses podman.

    That seems like a reasonable a-v-podman feature request too. Presumably
    it would only allow this when invoked as a-v-podman, and not when invoked
    as a-v-docker (I don't think a-v-docker has an equivalent feature).

    I don't think podman can do this within a single run. It might be feasible to do the setup (installing build-dependencies) with networking enabled; leave the root filesystem of that container intact; and reuse it as the root filesystem of the container in which the actual build runs, this time with --network=none?

    Do I understand correctly that in this variant, you intend to use podman without its image management capabilities and rather just use --rootfs spawning two podman containers on the same --rootfs (one after another)
    where the first one installs dependencies and the second one isolates
    the network for building?

    Maybe that; or maybe use its image management, tell the first podman command not to delete the container's root filesystem (don't use --rm), and then there's probably a way to tell podman to reuse the resulting filesystem
    with an additional layer in its overlayfs for the network-isolated run.
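
    (One concrete way to approximate that, using podman commit rather than
    reusing the overlay directly; names and the package are invented, and
    apt-get build-dep assumes deb-src entries are available in the image:)

    $ podman run --name deps localhost/local-debian:sid \
          sh -c 'apt-get update && apt-get --yes build-dep hello'
    $ podman commit deps localhost/local-debian:sid-deps && podman rm deps
    $ podman run --rm --network=none -v "$PWD:/build" -w /build \
          localhost/local-debian:sid-deps dpkg-buildpackage -us -uc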

    Please note that I am far from being an expert on podman or the
    "containers" family of libraries that it is based on, and I don't
    know everything it is capable of. Because Debian has a lot of pieces
    of infrastructure we have built for ourselves from first principles,
    I've had to spend time on understanding the finer points of sbuild,
    schroot, lxc and so on, so that I can replicate failure modes seen on
    the buildds and therefore fix release-critical bugs in the packages that
    I've taken responsibility for (and occasionally also try to improve the infrastructure itself, for example #856877 which recently passed its
    7th birthday). That comes with an opportunity cost: the time I spent
    learning about schroot is time that I didn't spend learning about OCI.

    One of the reasons I would like to have fewer Debian-specific pieces in
    our stack is so that other Debian developers don't have to do what I
    did, and can instead spend their time gaining transferrable knowledge
    that will be equally useful inside and outside the Debian bubble (for
    example the best ways to use OCI images, and OCI-based tools like
    Docker and Podman, which have a lot of overlap in how they are used
    even though they are rather different behind the scenes).

    smcv

  • From Johannes Schauer Marin Rodrigues@21:1/5 to All on Thu Jun 27 17:30:01 2024
    Hi,

    Quoting Simon McVittie (2024-06-27 15:59:01)
    On Thu, 27 Jun 2024 at 11:46:51 +0200, Helmut Grohne wrote:
    I don't quite understand the need for a Dockerfile here. I suspect that this is the obvious way that works reliably, but my impression was that using podman import would be easier.

    Honestly, the need for a Dockerfile here is: I already knew how to build containers from a Dockerfile, and I didn't read the documentation for
    the lower-level `podman import` because `podman build` can already do
    what I needed.

    I see this as the same design principle as why we encourage package maintainers to use dh, even when building trivial "toy" packages like
    hello, and in preference to implementing debian/rules at a lower level
    in trivial cases. To build a non-trivial container with multiple layers, you'll likely need a Dockerfile (or docker-compose, or some similar thing) *anyway*, so a typical user expectation will be to have a Dockerfile, and anyone building a container will likely already have learned the basics
    of how to write one; and then we might as well follow the same procedure
    in the trivial case, rather than having the trivial case be different and require different knowledge.

    I have never in my life written a Dockerfile and so far I've only used "podman import" instead. Your explanation makes sense to me. I had no idea that "podman build" sits at a higher level than the "podman import" plumbing. As a container noob it was always easier for me to write:

    mmdebstrap [my customizations] unstable | podman import - debian

    If I understand what you are saying, then what should instead be done is to write a Dockerfile receiving a vanilla tarball and then do the customizations via the Dockerfile?

    Can a Dockerfile be read from stdin? It's a small wrinkle to me that I would then need to create a private temporary directory with a Dockerfile first instead of just shoving it in over a pipe.
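
    (As far as I know, podman build does accept the Containerfile on standard
    input with "-f -", so the temporary directory can be avoided; rootfs.tar
    is a placeholder for a tarball sitting in the build context:)

    $ printf 'FROM scratch\nADD rootfs.tar /\n' | podman build -f - -t localhost/debian .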

    Do I understand correctly that in this variant, you intend to use podman without its image management capabilities and rather just use --rootfs spawning two podman containers on the same --rootfs (one after another) where the first one installs dependencies and the second one isolates the network for building?

    Maybe that; or maybe use its image management, tell the first podman command not to delete the container's root filesystem (don't use --rm), and then there's probably a way to tell podman to reuse the resulting filesystem
    with an additional layer in its overlayfs for the network-isolated run.

    Please note that I am far from being an expert on podman or the
    "containers" family of libraries that it is based on, and I don't
    know everything it is capable of. Because Debian has a lot of pieces
    of infrastructure we have built for ourselves from first principles,
    I've had to spend time on understanding the finer points of sbuild,
    schroot, lxc and so on, so that I can replicate failure modes seen on
    the buildds and therefore fix release-critical bugs in the packages that
    I've taken responsibility for (and occasionally also try to improve the infrastructure itself, for example #856877 which recently passed its
    7th birthday). That comes with an opportunity cost: the time I spent
    learning about schroot is time that I didn't spend learning about OCI.

    One of the reasons I would like to have fewer Debian-specific pieces in
    our stack is so that other Debian developers don't have to do what I
    did, and can instead spend their time gaining transferrable knowledge
    that will be equally useful inside and outside the Debian bubble (for
    example the best ways to use OCI images, and OCI-based tools like
    Docker and Podman, which have a lot of overlap in how they are used even though they are rather different behind the scenes).

    Thank you for this text as well as the one in your initial email in which you caution against more Debian-isms with only very few maintainer(s) maintaining them. As the author of the unshare backend I am guilty of having added another Debian-specific thing instead of re-using existing solutions. Maybe my defense can be that when I wrote that code in 2018, there was no podman in Debian yet? I am not attached to the unshare code. I gladly throw it out for something better. The less code I have to maintain the better for me. I do not dislike podman either and I am happy that in contrast to docker, there is no persistent service running in the background.

    What I wanted to mainly bring up in this email are the following things:

    Creating build chroots from things that are signed with the Debian archive keyring is important to me. Even though, as Holger pointed out, the Debian images that one can download can be reproduced independently, I rather make sure that I receive what I think I receive by relying on creating my chroot via mmdebstrap/apt verified by my local keyring. Maybe in the future debian.org can publish build chroots signed by the archive keyring at which point I may change my position on this. But until then, I really heavily prefer to download the GPG signed stuff from our mirrors instead of something from an image registry that we do not control.

    If we change things around, I'd prefer whatever change is done comes with non-negligible advantages and few (preferably none) regressions. The unshare backend is theoretically (not in practice because the schroot backend cannot do it) able to give you an environment where you have all your namespaces unshared but you are on the outside of the chroot directory. This gives processes on the outside the opportunity to work on those on the inside (because they have the required privileges). One enticing application I see for this feature is to be able to build inside chroots that do not even have apt installed. This can be possible because apt can install build dependencies in the chroot from the outside, if given the chroot directory and the necessary privileges. As far as I'm aware (and please correct me if I'm wrong) podman does not offer this functionality? So with a podman backend, my build chroot has to include apt because the build dependencies have to be installed somehow? One solution/workaround to this problem would be something else you said earlier: create the container as part of build dependency installation. In such a scenario, the podman container could be created by running

    mmdebstrap [options installing b-d] unstable | podman import -

    And then the resulting container would have essential and the build dependencies installed but would not have apt in it. This would not work with a Dockerfile, right?

    Last point: people on this list were very excited about using an established container technology in our tooling instead of cooking something up ourselves and rightly so. I agree with that sentiment. The excitement can probably also be seen by there existing 13 independent software packages that do "debian package building in docker": https://wiki.debian.org/SystemBuildTools#Package_build_tools

    But, if everybody is so excited about this, where are the sbuild contributors implementing this? As is hopefully obvious from my above questions, I have no clue about containers, so I'd be the wrong person to work on this. But we have implementations like the one in #867176 since 2017 and nobody stepped up to maintain it since then. This is very curious to me. People (including on this list) are very excited about docker/podman but then there is no code and longterm maintainership that follows. I'd be happy to review patches and help integrating podman into sbuild in any of the three ways I outlined in my other mail. But I need help with that and that help didn't arrive yet. So like the others on this list I am with you Simon, that it would be nice to have less Debian-isms and more use of cross-distro tools. But in practice, what we have are the Debian-isms maintained by only very few and nobody putting their long term efforts into implementing something that re-uses cross-distro approaches in sbuild... :/

    In essence: somebody please help! :)

    Thanks!

    cheers, josch

  • From Simon McVittie@21:1/5 to Johannes Schauer Marin Rodrigues on Thu Jun 27 19:20:01 2024
    On Thu, 27 Jun 2024 at 17:26:20 +0200, Johannes Schauer Marin Rodrigues wrote:
    But, if everybody is so excited about this, where are the sbuild contributors implementing this?

    I'm sorry, consider it added to my list. As usual, there's no guarantee
    that I will get there within my lifetime, but I'll make sure to feel
    suitably guilty about my failure to achieve it.

    But, having said that:

    The excitement can probably also
    be seen by there existing 13 independent software packages that do "debian package building in docker"

    The reason we don't have 14, one of them from me[1], is the same reason
    I would be reluctant to develop a new sbuild backend without knowing that
    it's what the maintainers of our production infrastructure want to use:

    Packages are de facto unreleasable (which is effectively a higher severity
    than any RC bug) if they don't compile successfully and pass tests in the project's official build environment. Until recently, this meant stable's sbuild and schroot (or sometimes oldstable's sbuild and schroot) entering
    an unstable chroot; more recently, some official buildds switched to the unshare backend, resulting in build failures in that backend becoming worse-than-RC too.

    If I do my test-builds in sbuild + schroot in an (old?)stable VM[3], and
    they succeed, then I can be somewhat confident that when I do the upload,
    the build on the official buildds will succeed too (at least on x86).

    If I do my test-builds in some other way, for example directly in a VM,
    or in podman, docker, lxc, pbuilder, deb-build-snapshot or whatever other
    thing I might personally prefer or find more convenient, then I run the
    risk of having my upload fail to build on the official buildds for a schroot-specific reason, which of course is an unacceptable situation for
    which I would rightly be held responsible; and step 1 of resolving that situation would be to try to replicate the official build environment,
    so I might as well save some time by *already* attempting to replicate
    the official build environment. A lot of my Debian contributions are
    already guilt-based ("if I don't get this uploaded then $bug is in
    some way my fault"), and I'm sorry but I am reluctant to add to that
    by creating new and avoidable opportunities to fail to live up to the
    project's expectations.

    Ideally of course I should do my test-builds in *both* sbuild + schroot
    and whatever container technology I'm (hypothetically) proposing as
    the new production infrastructure, but then each package I release will
    take twice as long per attempt to release, and "smcv takes too long to
    release important fixes" is a failure mode that cannot be fixed by any
    number of additional QA checks.

    Until recently, my understanding is that DSA's policy was to lock
    down all official machines by preventing unprivileged creation of user namespaces system-wide, which rules out podman, making it a poor time investment. This is clearly not entirely true any more, because if
    it was, buildds would not be able to use sbuild's unshare backend -
    so perhaps now is the time to be proposing a sbuild podman backend,
    and I should probably be writing one instead of replying to this message.

    Arguably there is already a sbuild podman backend, albeit indirectly:
    tell sbuild to use an autopkgtest virt server, and then specify the
    podman virt server as the one to use. (This has the limitation that it
    can't use the network to install build-dependencies and then disable
    networking for the actual build, which is a limitation that it shares
    with the current schroot backend.) As I mentioned in another thread, unfortunately I have spent considerably less time on podman in autopkgtest
    than it deserves: I have not tested it recently, so it's entirely possible
    that it doesn't work. If that's the case, then I apologise.
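
    For anyone who wants to try that route, the invocation is roughly the
    following (a sketch only; the image name is a placeholder, and the exact
    spelling of the option that hands the image to the virt server should be
    checked against sbuild(1) and autopkgtest-virt-podman(1)):

        sbuild --chroot-mode=autopkgtest \
            --autopkgtest-virt-server=podman \
            --autopkgtest-virt-server-opt=localhost/autopkgtest/sid \
            foo_1.0-1.dsc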

    I'm sorry that I have failed to provide a concrete solution to this
    problem, and I will try to do better in future.

    smcv

    [1] Arguably we *do* have 14, one of them from me, because
    deb-build-snapshot[2] has an "in Docker"/"in Podman" mode - although
    deb-build-snapshot primarily exists to automate generation of labelled
    snapshot test-builds for manual testing, and the fact that it has a
    "build over there" mode is only a side-effect. It isn't intended
    for production use (for example it always builds both arch-dep and
    arch-indep binary packages, which of course is an unacceptably lazy
    shortcut for production or QA use) and I don't maintain it with a
    production-quality level of service, which is why there is no ITP
    and also no wishlist bug against devscripts. I am sorry that this
    tool does not yet meet the project's quality standards.

    [2] https://salsa.debian.org/smcv/deb-build-snapshot

    [3] ... and replicate all the other behaviours that the buildds
    have, such as setting an unreachable home directory, building
    :any and :all separately, and choosing the same undocumented apt
    resolver for experimental and backports that the real buildds do

  • From Simon McVittie@21:1/5 to Reinhard Tartler on Thu Jun 27 19:40:01 2024
    On Wed, 26 Jun 2024 at 18:05:15 -0400, Reinhard Tartler wrote:
    I imagine that one could whip up some kind of wrapper
    that is building a container either from a tarball created via mmdebstrap or similar
    using buildah, have it install all necessary build dependencies, and then use podman to run the actual build

    Yes, one could, and many have; but not (as far as I know) within the
    framework of sbuild, in a way that might be considered acceptable by the operators of our official buildds.

    I also briefly started playing with debcraft, which I really like from a usability perspective

    On Thu, 27 Jun 2024 at 10:52:27 +0200, Bastian Venthur wrote:
    I had the idea to build my Debian packages in a clean docker container instead of using cowbuilder etc for some time now.

    There are lots of options for doing this, some of which are listed in <https://wiki.debian.org/SystemBuildTools#Package_build_tools>.

    All of these have the same problem as cowbuilder, pbuilder, and any
    other solution that is not sbuild + schroot: it isn't (currently) what
    the production Debian buildds use, therefore it is entirely possible
    (perhaps even likely, depending on what packages you maintain) that your package will build successfully and pass tests in your own local builder,
    but then fail to build or fail tests on the buildds as a result of some
    quirk of how schroot sets up its chroots, which is a worse-than-RC bug
    making the package unreleasable.

    I'm sure that a better maintainer than me could avoid this source
    of stress by simply recognising situations that could cause a build
    failure before they happen, and ensuring that no mistakes are made;
    but unfortunately the only way I have found to be able to be somewhat
    confident that my packages will build successfully in the real Debian infrastructure, within my own limitations, is to replicate a real
    Debian buildd (to the best of my ability) and use that replica for
    my test-builds.

    smcv

  • From Johannes Schauer Marin Rodrigues@21:1/5 to All on Thu Jun 27 23:10:02 2024
    Simon,

    Quoting Simon McVittie (2024-06-27 19:16:54)
    On Thu, 27 Jun 2024 at 17:26:20 +0200, Johannes Schauer Marin Rodrigues wrote:
    But, if everybody is so excited about this, where are the sbuild contributors
    implementing this?

    I'm sorry, consider it added it to my list. As usual, there's no guarantee that I will get there within my lifetime, but I'll make sure to feel
    suitably guilty about my failure to achieve it.

    if you want to do me a favour, please do not put it on your todo list. Even
    more importantly: please try not to feel guilty about anything. If at all
    possible, I'd like to assure you that you were not even close to being on
    the list of people (if we imagine that such a list existed in the first
    place) that I would hold responsible.

    This is clearly not entirely true any more, because if it was, buildds would not be able to use sbuild's unshare backend - so perhaps now is the time to be proposing a sbuild podman backend, and I should probably be writing one instead of replying to this message.

    Or you let other people take care of it. There are more than a dozen attempts outside of sbuild. How hard can it be? I consider you one of the most capable and clever people in the project and I greatly value your input into this discussion. But were I to choose where to put your time, it would not be into stretching your resources even more thinly by becoming the sbuild+podman maintainer. If you are really eager I do not want to stop you either. But please, please do not feel pressured by my last email.

    I'm sorry that I have failed to provide a concrete solution to this problem, and I will try to do better in future.

    Please accept my apology for how I phrased my last email. I did not want you to feel sorry for anything.

    I'm sincerely sorry. I did not mean to make you feel guilty.

    Sorry.

    josch
  • From =?UTF-8?B?T3R0byBLZWvDpGzDpGluZW4=?@21:1/5 to All on Fri Jun 28 05:00:01 2024
    Hi Simon!

    There are lots of options for doing this, some of which are listed in <https://wiki.debian.org/SystemBuildTools#Package_build_tools>.

    All of these have the same problem as cowbuilder, pbuilder, and any
    other solution that is not sbuild + schroot: it isn't (currently) what
    the production Debian buildds use, therefore it is entirely possible
    (perhaps even likely, depending on what packages you maintain) that your package will build successfully and pass tests in your own local builder,
    but then fail to build or fail tests on the buildds as a result of some
    quirk of how schroot sets up its chroots, which is a worse-than-RC bug
    making the package unreleasable.

    Could you point me to some Debian Bug # or otherwise share examples of
    cases when a build succeeded locally but failed on official Debian
    builders due to something that is specific for sbuild/schroot?

    I have never run in such a situation despite doing Debian packaging
    for 10 years with fairly complex C++ software targeting all archs
    Debian supports. Also as a member of the Salsa-CI team I don't recall
    ever seeing a bug report about something built on Salsa in a container successfully but failed to build on actual buildd.

    I am not dismissive of your claim - as a very senior DD you surely
    have those experiences - I am just curious to learn what those cases
    might have been.

    I could imagine that buildd builds fail if the source was prepared in a
    non-hermetic environment that ran as root, or had network access, or if
    the build environment was unclean and debian/control was missing some
    dependencies, but those are elementary hermetic build environment
    properties and not inherently something that *only* sbuild/schroot does.

    Related, you might want to take a peek at the source code of
    https://salsa.debian.org/otto/debcraft to see how it supports both Podman
    and Docker, how it generates the 'root.tar.gz'-equivalent container
    automatically based on debian/control and debian/changelog contents, and
    how it then runs the actual build as a regular non-root user in a
    container that has no network access. If I learn about other requirements
    for a hermetic build environment, I would be happy to incorporate them.
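
    For a rough idea of the shape of such a build: with the build dependencies
    already baked into the image (networking is disabled), the final step
    boils down to something like the following. This is a sketch only, not
    debcraft's actual code; the image name and uid are placeholders.

        podman run --rm --network=none --user 1000:1000 \
            --volume "$PWD":/build --workdir /build \
            localhost/build-env:sid \
            dpkg-buildpackage --no-sign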

    - Otto

  • From Simon McVittie@21:1/5 to All on Fri Jun 28 12:00:01 2024
    On Thu, 27 Jun 2024 at 19:56:43 -0700, Otto Kekäläinen wrote:
    Could you point me to some Debian Bug # or otherwise share examples of
    cases when a build succeeded locally but failed on official Debian
    builders due to something that is specific for sbuild/schroot?

    I can't easily point you to a Debian bug number, because I try to only
    upload packages that live up to Debian's quality standards, which means
    I've been routinely building packages for upload in sbuild/schroot for
    several years; so if a package fails in that situation, I do not upload,
    and retry as many times as it takes to get it right.

    (I'm sure I've failed to do that several times, but I'm sorry, I mostly
    can't remember specific instances or bug numbers; I generally try to fix
    the regression as quickly as I can.)

    But, some examples of packages and the reasons they fail:

    - bubblewrap, repeatedly. Its test suite wants to create new user
    and filesystem namespaces, which is unconditionally not allowed by
    the kernel while inside a chroot (because the kernel doesn't want to
    allow filesystem namespaces to be used to escape from a chroot). The
    relevant tests have to be skipped in situations where they can't work.

    "Real" container managers that use pivot_root() instead of chroot(),
    such as Docker and Podman, sometimes allow creation of nested user
    namespaces (like bwrap by default, and docker --privileged), sometimes
    deny it (like bwrap --disable-userns, and Docker by default), and
    sometimes cannot allow it because some larger factor forces their hand:
    it's non-obvious what will work.

    The conditions for not being allowed to create new namespaces are
    relatively complicated and poorly-documented, and the error reporting is
    minimal (two or three errno values have to cover every possible failure
    mode), so this is something that has to be done by trial and error.

    Until recently, DSA'd machines all used
    /proc/sys/kernel/unprivileged_userns_clone to disable unprivileged
    creation of user namespaces anyway. This restriction has presumably
    been lifted for the buildds that use sbuild in unshare mode.

    - xdg-desktop-portal, repeatedly. Its test suite uses FUSE, which is
    disabled (the module is prevented from loading) on official Debian
    buildds as a security hardening mechanism, even though on typical
    end-user or server Debian systems it works fine.

    This is one that I did have to find out via FTBFS, because I don't yet
    have a local build environment that replicates this restriction. I know
    that I should, and it's on my list.

    - ostree, at least once. The test suite historically assumed that /var/tmp
    supports extended attributes, which is not true on all buildds (ordinary
    on-disk filesystems usually do support them, but tmpfs doesn't or didn't
    until recently, and some buildds with plenty of RAM operate in a tmpfs
    root filesystem to speed up their builds).

    - flatpak, repeatedly. Same as bubblewrap, ostree and x-d-p, combined.

    - dbus, historically. For a long time, when using the non-default
    DBUS_COOKIE_SHA1 authentication mechanism, libdbus ignored $HOME and
    instead used the "official" home directory from /etc/passwd
    (the equivalent of `getent passwd $(id -u) | cut -d: -f6`). Official
    buildds set the user's home directory to /nonexistent, so this fails.
    In production use, dbus normally uses EXTERNAL over AF_UNIX (and doesn't
    even allow DBUS_COOKIE_SHA1, as a piece of security hardening), but in
    its build-time tests it specifically exercises each auth mechanism and
    each transport, including DBUS_COOKIE_SHA1 over TCP (which is a
    terrible idea on Unix but is unfortunately necessary on Windows).

    - GLib, ongoing (#972151). When the GLib test suite tests interoperability
    with libdbus, it (IMO reasonably!) expects ("localhost", AF_INET) to
    resolve to 127.0.0.1, but that doesn't work on IPv6-only buildds for
    relatively complicated reasons involving subtleties of glibc resolver
    behaviour (#952740). My local build environment still doesn't have code
    to reproduce this, and I'm sorry that I haven't provided workarounds or
    fixes in the GLib test suite or in libdbus' discouraged TCP code paths.
    If someone wants to work on this, skipping the interop tests for TCP on
    IPv6-only buildds would probably be more proportionate than adjusting
    libdbus' name-resolution behaviour for a feature nobody should be
    using in production anyway.

    - Any package that assumes that if $XDG_RUNTIME_DIR is set, then it is
    set to a usable value (because historically schroot would set it to
    a value that exists/works on the host system, but does not exist and
    cannot be created inside the container). This is worked around by
    individual packages unsetting XDG_RUNTIME_DIR or setting it to a more
    useful value, or automatically by recent debhelper in a sufficiently
    high compat level (#942111).
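
    The manual variant of that workaround is typically a one-liner in
    debian/rules (a sketch; recent debhelper compat levels achieve the same
    effect automatically, see #942111):

        # debian/rules is a GNU make file; stop exporting the host's stale
        # value into the build environment
        unexport XDG_RUNTIME_DIR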

    I have never run in such a situation despite doing Debian packaging
    for 10 years with fairly complex C++ software targeting all archs
    Debian supports.

    If your complex C++ software is doing pure computation without
    side-effects, or if it's doing something that's unaffected by being in
    a chroot (like file I/O to the build directory, or IPC via AF_UNIX)
    then it can be extremely complex and still not hit this sort of thing. Conversely, container-adjacent tools that want to run build-time tests
    will hit this sort of thing every time.

    smcv

  • From Sam Hartman@21:1/5 to All on Fri Jun 28 15:10:01 2024
    "Helmut" == Helmut Grohne <helmut@subdivi.de> writes:

    Helmut> In this work, limitations with --chroot-mode=unshare became
    Helmut> apparent and that lead to Johannes, Jochen and me sitting
    Helmut> down in Berlin pondering ideas on how to improve the
    Helmut> situation. That is a longer story, but eventually Timo
    Helmut> Röhling asked the innocuous question of why we cannot just
    Helmut> use schroot and make it work with namespaces.

    I'll be honest, I think building a new container backend makes no sense
    at all.
    There's a lot of work that has gone into systemd-nspawn, podman, docker,
    crun, runc, and the related ecosystems.

    I think an approach that allowed sbuild to actually use a real container backend would be long-term more maintainable and would allow Debian's
    DevOps practices to better align with the rest of the world.

    I have some work I've been doing in this space which won't be useful to
    you because it is not built on top of sbuild.
    (Although I'd be happy to share under LGPL-3 for anyone interested.)

    But I find that I disagree with the idea of writing a new container
    runtime for sbuild so strongly that I can no longer use sbuild for
    Debian work, so I started working on my own package building solution.

    I realize that I have not done a good job of being constructive here.
    I intended to write some blog posts on this topic, but got sucked into
    work and tag2upload.

    In terms of constructive feedback:

    * I think your intuition that sbuild --chroot-mode=unshare is limiting is
    good.

    * I would move toward a persistent namespace approach because it is
    more similar to broadly used container backends.

    * overlayfs/fuse-overlayfs are how the rest of the world is solving
    these problems (or snapshots and the like). Directories are kind of a
    Debian-specific artifact that I find more and more awkward to deal with
    as the rest of my work uses containers for CI/CD.


  • From Richard Lewis@21:1/5 to otto@debian.org on Sat Jun 29 18:20:01 2024
    Otto Kekäläinen <otto@debian.org> writes:

    Could you point me to some Debian Bug # or otherwise share examples of
    cases when a build succeeded locally but failed on official Debian
    builders due to something that is specific for sbuild/schroot?

    I believe both these uploads

    https://tracker.debian.org/news/1284669/accepted-chkrootkit-055-3-source-into-unstable/
    https://tracker.debian.org/news/1288719/accepted-chkrootkit-055-4-source-into-unstable/

    were primarily made to fix autopkgtest failures that occurred on Debian infrastructure, and were not noticed before because

    https://tracker.debian.org/news/1280523/accepted-chkrootkit-055-2-source-into-unstable/

    had only been tested with schroot+sbuild locally using the
    --chroot-mode=schroot backend

  • From Sam Hartman@21:1/5 to All on Sun Jun 30 00:00:02 2024
    "Richard" == Richard Lewis <richard.lewis.debian@googlemail.com> writes:

    Richard> Otto Kekäläinen <otto@debian.org> writes:
    >> Could you point me to some Debian Bug # or otherwise share
    >> examples of cases when a build succeeded locally but failed on
    >> official Debian builders due to something that is specific for
    >> sbuild/schroot?

    Until I fixed it, krb5 would not work in a network namespace that only
    had a lo interface. It ran getaddrinfo with AI_ADDRCONFIG in its tests,
    because localhost is discouraged/not allowed in krb5 ticket addresses per
    RFC 4120. It only talks to itself, but it really wants to talk to itself
    not on localhost. I patched the sources to work around
    sbuild --chroot-mode=unshare.
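
    The effect is easy to reproduce outside of krb5's test suite: in such a
    lo-only namespace, a few lines of Python along these lines (the hostname
    is a placeholder) fail, because glibc's AI_ADDRCONFIG does not count
    loopback addresses as "configured":

        import socket

        # With AI_ADDRCONFIG, IPv4/IPv6 results are only returned if the
        # system has a non-loopback address of that family configured, so
        # this raises socket.gaierror in a namespace that only has "lo".
        print(socket.getaddrinfo("buildd.example", 88,
                                 flags=socket.AI_ADDRCONFIG))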

  • From Philipp Kern@21:1/5 to Christian Kastner on Mon Jul 1 09:20:01 2024
    Hi,

    On 2024-06-29 22:21, Christian Kastner wrote:
    At the moment, rootless Podman would seem like the obvious choice. As far
    as I'm aware, it has the same user namespaces requirements as the unshare
    backends in mmdebstrap, autopkgtest and schroot (user namespaces enabled,
    setuid newuidmap, 65536 uids in /etc/subuid, 65536 gids in /etc/subgid).

    As a datapoint, I use rootless podman containers extensively both for autopkgtest and as an sbuild backend (though the latter is affected by #1033352 for which I still need to implement a cleaner workaround).

    I think the only problem I encountered was a corner case when passing a
    device into a container: at some point, autopkgtest runs su, which uses
    the setgroups() syscall, and group permissions get lost. The solution was
    to set up the proper gidmaps. I documented my findings here [1].

    Though this latter issue shouldn't be a problem on buildds, where
    devices aren't passed in.

    How well does this setup nest? I had a lot of trouble trying to run the
    unshare backend within an unprivileged container as set up by
    systemd-nspawn - mostly with device nodes. In the end I had to give up
    and replace the container with a full-blown VM. I understand that some of
    the things compose a little if the subuid/subgid maps are set up
    correctly, with fewer IDs allocated to the nested child. Is there a way
    to make this work properly, or would you always run into setup issues
    with device nodes at this point?

    Specifically I'm concerned about what this means for tests and if they
    should be able to use unprivileged containers themselves to test things.
    I guess we made the decision that we just assume "root" for testing. But
    right now you could - presumably - also set up more things under that
    assumption that would not work in an unprivileged setup. Is that a
    problem?

    Relatedly it'd be great if we actually had a VM in-between us and the
    build. But that only works well on some architectures, only composes
    well on even fewer (e.g. arm64 not having nested virtualization yet), and
    only provides a marginal benefit if you execute the build outside of the
    VM as well. But it'd shield us more from supply chain issues.

    Kind regards
    Philipp Kern

  • From Simon McVittie@21:1/5 to Philipp Kern on Mon Jul 1 18:00:02 2024
    On Mon, 01 Jul 2024 at 09:18:19 +0200, Philipp Kern wrote:
    Specifically I'm concerned about what [advocating use of podman]
    means for tests and if they
    should be able to use unprivileged containers themselves to test things.

    tl;dr: There's no regression here, because you already can't run those
    tests on a buildd.

    There's no unified definition of "container" in the Linux kernel, only
    a selection of different mechanisms that are used by container managers
    to do what they want to do according to their individual security models
    and desired functionality, so the only fully general answer we can give
    to this is: there are containers, and there are containers, so you'll
    need to be more specific about which specific things you want.

    One use-case that I'm familiar with is bwrap (bubblewrap, as used by
    flatpak) nested inside podman. bwrap is a relatively limited container
    technology with relatively "light" requirements, at the cost of imposing
    harsh restrictions on the code inside the container: you only get access
    to one uid, and all other uids get mapped to the overflow uid ("nobody").
    You can think of it as having two possible identities, "me" and "not me".
    Even with that limitation, bwrap inside podman doesn't normally work,
    because podman forbids most nested container operations. I'm unsure
    whether this is a functional requirement to prevent attacks where the
    podman container "payload" escapes from the container and gets arbitrary
    code execution on the host, or whether this is merely non-essential
    security hardening to make it harder to exploit possible vulnerabilities
    that podman aims to already prevent in some other way. Either way,
    I would expect that buildd operators would not want to allow it.

    podman nested inside podman is "more difficult" than bwrap nested inside
    podman (because it's more capable and imposes fewer restrictions on the payload, therefore needs a larger-than-default block of uids to be made available, whereas bwrap only needs one uid), and almost certainly also
    won't work.

    But neither of these is a regression, because we can't normally do either
    of those things inside schroot anyway! So packages like bubblewrap and
    flatpak have no choice but to skip most of their regression tests at build-time. This is obviously not ideal, but it's better than not being
    able to ship these packages in Debian at all.

    On ci.debian.net, the bubblewrap and flatpak test suites are re-run as "as-installed" tests, and those *can* be run, using autopkgtest's qemu
    backend - although I believe that's currently disabled because of some technical issues with the qemu backend or the infrastructure, so those
    tests might end up being skipped (again) on the lxc backend.

    I believe bwrap nested inside `podman --privileged` *does* work. As I
    said above, I don't know where that falls on the scale between "believed
    to be secure, but less well-hardened" and "definitely not secure".
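
    A quick way to see where a given configuration lands is to run a trivial
    nested sandbox and check whether it is allowed (a sketch; the image name
    is a placeholder and is assumed to have the bubblewrap package installed):

        podman run --rm localhost/sid-with-bwrap \
            bwrap --bind / / true; echo "plain podman: $?"
        podman run --rm --privileged localhost/sid-with-bwrap \
            bwrap --bind / / true; echo "privileged podman: $?"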

    Relatedly it'd be great if we actually had a VM in-between us and the build.

    Prior art for this includes `sbuild --chroot-mode=autopkgtest --autopkgtest-virt-server=qemu` (which uses qemu instead of schroot
    or podman as the "container" for the actual build), openSUSE's
    Open Build Service (which uses a new VM for each build in at
    least some configurations), and my own experimental build wrapper <https://salsa.debian.org/smcv/vectis> (which runs the whole sbuild
    instance inside the VM, in an attempt to be bug-for-bug compatible with Debian's production infrastructure as mentioned earlier in this thread).

    smcv

  • From Helmut Grohne@21:1/5 to Philipp Kern on Sat Jul 6 18:40:01 2024
    Hi Philipp,

    Let me go into some detail that is tangential to the larger discussion.

    On Mon, Jul 01, 2024 at 09:18:19AM +0200, Philipp Kern wrote:
    How well does this setup nest? I had a lot of trouble trying to run the unshare backend within an unprivileged container as setup by systemd-nspawn
    - mostly with device nodes. In the end I had to give up and replaced the container with a full-blown VM. I understand that some of the things compose a little if the submaps are set up correctly, with less IDs allocated to the nested child. Is there a way to make this work properly, or would you always run into setup issues with device nodes at this point?

    Technically speaking, nesting is possible. The individual container implementation may limit you, but that's an implementation limit and not
    a fundamental one. I'm assuming that you want to nest a rootless
    container in a rootless container as that tends to be the most difficult
    one. Roughly speaking your unprivileged container wants access to your
    user id and a 64k allocation of subuids. This applies to the nested
    container. If your outer container maps two 64k ranges (one to 0 to
    65535 and the other to whatever your user has in its contained
    /etc/subuid), your contained user should actually be able to spawn a
    podman container unless I am missing something important. Devices
    usually are not a problem (for rootless containers) as you cannot create
    them anyway so you end up bind mounting them and the bind mounting
    technique nests well.

    A typical Debian installation only allocates a single 64k range to each
    user. Your first step here is growing that range or adding another one.
    (Yes, you may have multiple lines for your user in /etc/subuid.) Then
    the podman-run documentation hints at --uidmap and it says that you can
    specify it multiple times to map multiple ranges. This is how you
    construct your outer container. Then inside, nesting should just work. Admittedly, I've not tried this.
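
    As an untested sketch of what that could look like (the user name and the
    ranges are placeholders, and the --uidmap numbers assume podman's rootless
    convention where the "host" side of the mapping counts in the intermediate
    namespace, 0 being your own uid and 1 onwards being your subuids in
    /etc/subuid order):

        # /etc/subuid (and analogously /etc/subgid): two 64k ranges per user
        builder:100000:65536
        builder:165536:65536

        # outer container: map container uids 0-65535 to the first range and
        # 65536-131071 to the second, so the user inside still has 64k subuids
        podman run --rm \
            --uidmap 0:1:65536 --uidmap 65536:65537:65536 \
            --gidmap 0:1:65536 --gidmap 65536:65537:65536 \
            <image> <command>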

    The takeaway should be that if your outer container is constructed in
    the right way, you should be able to nest other containers (e.g. podman, mmdebstrap, sbuild unshare, ...) without issues. It's not like this just
    works out of the box, but it should be feasible.

    Helmut
