• Bug#1068890: diffoscope: --hard-timeout option

    From Holger Levsen@21:1/5 to All on Sat Apr 13 00:50:01 2024
    Package: diffoscope
    Version: 264
    Severity: wishlist

    Dear Maintainer,

    currently diffoscope has a --timeout option:

    --timeout SECONDS
    Best-effort attempt at a global timeout in seconds. If enabled, diffoscope will not recurse into any further sub-archives
    after X seconds of total execution time. (default: no timeout) [experimental]

    however, this doesn't give any guarantee about how long diffoscope will run, so far we haven't used it for the RB CI tests, mostly because I'm not sure
    what would be a good inner timeout (=for diffoscope) and what would be a good outer timeout (=for killing diffoscope from the outside no matter what).
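This is a toy model in Python of the "best-effort" wording. It is not diffoscope's actual code, and `Node` and `walk` are hypothetical names. The deadline is checked only before descending into the next sub-archive, so work already in progress (for example one slow external command) is never interrupted. That is why --timeout alone cannot bound the total runtime.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a file or (nested) archive being compared."""
    name: str
    children: list["Node"] = field(default_factory=list)


def walk(node, deadline, visited, clock=time.monotonic):
    """Record `node`, then recurse into its children while time remains.

    The deadline is checked only *before* descending, never during the
    work done for a node, which is what makes this best-effort.
    """
    visited.append(node.name)
    for child in node.children:
        if clock() >= deadline:
            return  # out of time: skip this and all remaining children
        walk(child, deadline, visited, clock)
```

Consider a fake clock that advances one "second" per check, with a deadline of 2. A package containing `control.tar`, `data.tar` (with two members) and `late.tar` then only gets as far as `data.tar` itself. Its members and `late.tar` are skipped.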

    Currently we use 2h as the outer timeout, but have no inner timeout. Maybe we should
    use --timeout 3600 (i.e. 1h)?

    Anyhow, about my --hard-timeout option idea:

    my idea of "--hard-timeout $time" is that diffoscope terminates itself after $time, no matter what, *and* then restarts itself with "--max-container-depth 3"
    (or whatever is useful to get a glimpse of which files in a Debian package
    differ), probably also with another hard timeout set, so as to guarantee that it always produces meaningful output (especially HTML output if specified with --html).

    What do you think?

    Else we could also extend the current code for tests.r-b.o/debian, which currently
    just kills diffoscope after 2h, to then run diffoscope --max-container-depth 3 :)

    https://tests.reproducible-builds.org/debian/index_breakages.html lists
    251 pkg/suite/arch combinations where diffoscope runs into a timeout...


    & many thanks for rocking diffoscope airlines..! \o/

    --
    cheers,
    Holger

    ⢀⣴⠾⠻⢶⣦⠀
    ⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
    ⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
    ⠈⠳⣄

    Bottled water companies don't produce water, they produce plastic bottles.

  • From Chris Lamb@21:1/5 to Holger Levsen on Tue Apr 16 18:10:01 2024
    Holger Levsen wrote:

    > Anyhow, about my --hard-timeout option idea:
    >
    > my idea of "--hard-timeout $time" is that diffoscope terminates itself
    > after $time, no matter what *and* then re-starts itself with "--max-container-depth 3"

    Just to say that I am totally on board with the idea of ensuring we
    get _something_ out of diffoscope on tests.reproducible-builds.org.
    Way better than 250 timeouts.

    However, I think this first iteration of --hard-timeout time has a few
    things that would need ironing out first, and potentially make it not
    worth implementing:

    (1) You suggest it should start again with "--max-container-depth 3",
    but it would surely need some syntax (or another option?) to control
    that "3" (but for the second time only).

    (2) In fact, it's easy to imagine that one would want to restart with
    other restrictions as well: not just --max-container-depth. For
    instance, excluding external commands like readelf and objdump that
    you know to be slow.

    (3) The output might need some comment saying "this was re-run with restrictions as we hit a timeout".

    (4) My gut feeling is that it would not be all that great to rely on CPython
    to really properly clear up child processes after a certain amount of
    time. I believe the most reliable top-level approach to
    this kind of thing inside CPython is to start a watchdog thread that
    sleeps until the timeout and then tries to kill everything, but my
    experience of doing anything like this within Python itself is not
    great, and it essentially always needed something at the process level
    outside of it to be reliable. A container would be even more
    effective, I'm sure.
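Point (4) argues for killing from the process level, outside the job. This POSIX-only Python sketch shows one way to do that. The job is started in its own session, so it and any helpers it spawns (readelf, objdump, ...) share one process group. On timeout, the supervisor signals the whole group rather than relying on the job to tidy up after itself. `run_with_hard_timeout` is a hypothetical helper, not part of diffoscope or of the tests.r-b.o scripts. A child that calls setsid() itself would still escape the group, which is where a container or cgroup does better.

```python
import os
import signal
import subprocess


def _signal_group(pgid, sig):
    """Send `sig` to every process in group `pgid`; ignore an empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_hard_timeout(argv, timeout, grace=10.0):
    """Run `argv`; after `timeout` seconds, kill it together with its children.

    Returns the exit status, or None if the hard timeout fired.
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its PID doubles as the group ID for os.killpg().
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass
    # Ask politely first (SIGTERM) and give the group `grace` seconds ...
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    # ... then SIGKILL whatever is left, including stray grandchildren.
    _signal_group(proc.pid, signal.SIGKILL)
    proc.wait()
    return None
```

For example: `run_with_hard_timeout(["diffoscope", "--html", "out.html", "a.deb", "b.deb"], timeout=2 * 3600)`.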

    In other words, I think the best way of achieving the result we want
    is, alas, by doing it outside of diffoscope at the level of the
    Jenkins. As in, exactly what you describe here:

    > Else we could also extend the current code for tests.r-b.o/debian,
    > which currently just kills diffoscope after 2h, to then run diffoscope
    > --max-container-depth 3 :)

    Is that a massive faff? :/


    Best wishes,

    --
    o
    ⬋ ⬊ Chris Lamb
    o o reproducible-builds.org 💠
    ⬊ ⬋
    o

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Holger Levsen@21:1/5 to Chris Lamb on Tue Apr 16 23:00:01 2024
    On Tue, Apr 16, 2024 at 04:51:09PM +0100, Chris Lamb wrote:
    > Just to say that I am totally on board with the idea of ensuring we
    > get _something_ out of diffoscope on tests.reproducible-builds.org.

    :) great!

    > Way better than 250 timeouts.

    https://tests.reproducible-builds.org/debian/stats_breakages.png
    shows that over the last 3-4 years there has been steady progress on that! \o/

    > However, I think this first iteration of --hard-timeout time has a few
    > things that would need ironing out first, and potentially make it not
    > worth implementing:
    >
    > (1) You suggest it should start again with "--max-container-depth 3",
    > but it would surely need some syntax (or another option?) to control
    > that "3" (but for the second time only).

    another option, --second-pass-max-container-depth or some such

    > (2) In fact, it's easy to imagine that one would want to restart with
    > other restrictions as well: not just --max-container-depth. For
    > instance, excluding external commands like readelf and objdump that
    > you know to be slow.

    yes, that's a good idea and IMO should be automatically implied for the
    2nd pass or round or try.

    > (3) The output might need some comment saying "this was re-run with restrictions as we hit a timeout".

    absolutely.

    > (4) My gut feeling is that it would not be all that great to rely on CPython
    > to really properly clear up child processes after a certain amount of
    > time. I believe the most reliable top-level approach to
    > this kind of thing inside CPython is to start a watchdog thread that
    > sleeps until the timeout and then tries to kill everything, but my
    > experience of doing anything like this within Python itself is not
    > great, and it essentially always needed something at the process level
    > outside of it to be reliable. A container would be even more effective, I'm sure.

    hmmm.

    > In other words, I think the best way of achieving the result we want
    > is, alas, by doing it outside of diffoscope at the level of the
    > Jenkins. As in, exactly what you describe here:
    >
    >> Else we could also extend the current code for tests.r-b.o/debian,
    >> which currently just kills diffoscope after 2h, to then run diffoscope
    >> --max-container-depth 3 :)
    >
    > Is that a massive faff? :/

    not really, I guess it would be rather simple even, I just thought
    (or think?) that it would be a nice feature for diffoscope proper.


    --
    cheers,
    Holger


    The purpose of propaganda isn't to make you believe something. It's to make you believe nothing. So that you do nothing. (@DarthPutinKGB)

  • From Vagrant Cascadian@21:1/5 to Chris Lamb on Tue Apr 16 23:30:01 2024
    On 2024-04-16, Chris Lamb wrote:
    > However, I think this first iteration of --hard-timeout time has a few
    > things that would need ironing out first, and potentially make it not
    > worth implementing:
    >
    > (1) You suggest it should start again with "--max-container-depth 3",
    > but it would surely need some syntax (or another option?) to control
    > that "3" (but for the second time only).

    What about going the other direction ... starting with a very small
    value for max-container-depth, and incrementally increasing it,
    generating a report (or at least storing sufficient data to generate
    one) in between each increment, so you always get some information, but essentially incrementally increase the resolution?

    Or would that approach just be too inefficient?


    > (2) In fact, it's easy to imagine that one would want to restart with
    > other restrictions as well: not just --max-container-depth. For
    > instance, excluding external commands like readelf and objdump that
    > you know to be slow.

    Ah, yes, knowing the common time sinks would be tremendously helpful!
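The incremental approach suggested above for (1) can be approximated outside diffoscope with a simple iterative-deepening loop. It runs with --max-container-depth 1, 2, 3, ... under a shared time budget and keeps the deepest report that finished. This is a Python sketch; `iterative_deepening` and `make_argv` are hypothetical names. Each pass starts from scratch, which is precisely the inefficiency being asked about. A built-in version could reuse the shallower levels instead.

```python
import subprocess
import time


def iterative_deepening(make_argv, budget, max_depth=10, run=subprocess.run):
    """Return the deepest --max-container-depth whose run finished in time.

    make_argv(depth) must return the diffoscope command line (without the
    depth option) writing to a depth-specific output file, so a pass cut
    short by the timeout cannot clobber the last complete report.
    Returns None if not even depth 1 finished.
    """
    deadline = time.monotonic() + budget
    best = None
    for depth in range(1, max_depth + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        argv = [*make_argv(depth), "--max-container-depth", str(depth)]
        try:
            # Note: on timeout, subprocess.run kills only the direct child.
            result = run(argv, timeout=remaining)
        except subprocess.TimeoutExpired:
            break
        # diffoscope exits 0 (no differences), 1 (differences) or 2 (error).
        if result.returncode not in (0, 1):
            break
        best = depth
    return best
```

A CI job could call `iterative_deepening(lambda d: ["diffoscope", "--html", f"depth{d}.html", "a.deb", "b.deb"], budget=7200)` and then publish the `depth{best}.html` it returns.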


    live well,
    vagrant

  • From Chris Lamb@21:1/5 to Vagrant Cascadian on Thu Apr 18 13:50:01 2024
    Vagrant Cascadian wrote:

    > On 2024-04-16, Chris Lamb wrote:
    >> However, I think this first iteration of --hard-timeout time has a few
    >> things that would need ironing out first, and potentially make it not
    >> worth implementing:
    >>
    >> (1) You suggest it should start again with "--max-container-depth 3",
    >> but it would surely need some syntax (or another option?) to control
    >> that "3" (but for the second time only).
    >
    > What about going the other direction ... starting with a very small
    > value for max-container-depth, and incrementally increasing it,
    > generating a report (or at least storing sufficient data to generate
    > one) in between each increment, so you always get some information, but essentially incrementally increase the resolution?
    >
    > Or would that approach just be too inefficient?

    This is probably a separate requirement, best suited to another issue at
    this point, but I do like the idea of being able to incrementally
    increase the resolution over time. Depending on how it worked in practice, there should not be significant overhead in managing this
    if, say, the commands that could not be run "in time" had token placeholders internally that rendered as text in the output rather
    than as non-trivial/expensive binary diffs.
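One way to picture the "token placeholder" idea is as a hypothetical data model; this is not how diffoscope represents its diff tree today. A command that did not fit in the time budget yields a cheap placeholder that renders as a single line of text. A later, more generous pass can then fill in just those placeholders. `Diff`, `Placeholder`, `compare`, `render` and `refine` are illustrative names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Diff:
    """Output of a command that actually ran."""
    command: str
    text: str


@dataclass
class Placeholder:
    """Token for a command skipped because the time budget ran out."""
    command: str

    @property
    def text(self) -> str:
        return f"[{self.command}: not run within the time limit]"


Section = Union[Diff, Placeholder]


def compare(command: str, run: Callable[[str], str], deadline: float,
            clock: Callable[[], float] = time.monotonic) -> Section:
    """Run `command` only if there is still time; otherwise return a placeholder."""
    if clock() >= deadline:
        return Placeholder(command)
    return Diff(command, run(command))


def render(sections: list[Section]) -> str:
    """Plain-text report; a placeholder costs no more than a single line."""
    return "\n".join(f"--- {s.command}\n{s.text}" for s in sections)


def refine(sections: list[Section], run: Callable[[str], str]) -> list[Section]:
    """A later pass: fill in placeholders, keep finished diffs untouched."""
    return [Diff(s.command, run(s.command)) if isinstance(s, Placeholder) else s
            for s in sections]
```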

    On the negative side though, I think this would still require a robust
    way of killing long-running processes as outlined previously. Moreover,
    it would require a HUGE reworking of how diffoscope handles containers and recurses into nested structures in its tree-like style.
    Indeed, thinking about it, this change would pretty much be exactly
    the same work needed to make diffoscope run in parallel (!), which hopefully communicates both the scope of the changes that would be
    needed to achieve this and the fact that making diffoscope run in parallel
    also has other benefits. Anyway, mini brain dump over.


    Regards,


  • From Chris Lamb@21:1/5 to Holger Levsen on Thu Apr 18 13:40:01 2024
    Holger Levsen wrote:

    >> (1) You suggest it should start again with "--max-container-depth 3",
    >> but it would surely need some syntax (or another option?) to control
    >> that "3" (but for the second time only).
    >
    > another option, --second-pass-max-container-depth or some such
    >
    >> (2) In fact, it's easy to imagine that one would want to restart with
    >> other restrictions as well: not just --max-container-depth. For
    >> instance, excluding external commands like readelf and objdump that
    >> you know to be slow.
    >
    > yes, that's a good idea and IMO should be automatically implied for the
    > 2nd pass or round or try.

    It's definitely a "good idea" in the sense that I can definitely see
    someone wanting to achieve that as an end result :)

    Yet… upon thinking about it a bit, I don't think it is a good idea at
    all for diffoscope to grow a bunch of new options or hardcoded
    defaults for a second run. What (1) and (2) show here is that as soon
    as a user would like to adjust these second pass options in any way,
    then the whole interface becomes very unwieldy. Not only that, but
    from the user's point of view it's neither flexible nor transparent
    either, especially when compared to "just" running diffoscope twice with different options. There's no "magic" there, if you see what I mean.

    Can we implement running diffoscope twice on tests.r-b.org manually
    first and see how that goes? I'm not 100% against the idea of implementing this in diffoscope eventually, but it would make a lot of
    sense to try out the "manual" version first and gain some real-world experience.
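The "manual" version could look something like this small Python wrapper, of the sort a Jenkins job might call. It does one unrestricted run under the existing 2h outer timeout. Only if that run times out does it start a second run with explicitly chosen, restricted options. The function name, the 1h limit for the second pass and the `run` hook are assumptions for illustration. The flags used (--html, --max-container-depth) are existing diffoscope options. The wrapper also returns whether the second pass was needed, so the calling job can add the "this was re-run with restrictions" note from point (3).

```python
import subprocess

# Hypothetical default for the second pass; adjust per job, no hidden magic.
DEFAULT_SECOND_PASS = ("--max-container-depth", "3")


def two_pass(files, output, outer_timeout=2 * 3600, second_timeout=3600,
             second_pass_options=DEFAULT_SECOND_PASS, run=subprocess.run):
    """Run diffoscope; if it exceeds outer_timeout, rerun it with restrictions.

    Returns (exit_status, restricted), where `restricted` is True when the
    report came from the second, restricted pass. If the second pass also
    times out, subprocess.TimeoutExpired propagates to the caller.
    Caveat: subprocess.run's timeout only kills the direct child process.
    """
    first = ["diffoscope", "--html", output, *files]
    try:
        return run(first, timeout=outer_timeout).returncode, False
    except subprocess.TimeoutExpired:
        pass
    # The second-pass options are spelled out explicitly by the caller.
    second = ["diffoscope", *second_pass_options, "--html", output, *files]
    return run(second, timeout=second_timeout).returncode, True
```

Other restrictions from point (2) would go into `second_pass_options`, e.g. `("--max-container-depth", "3", "--exclude-command", "^readelf")`. This assumes your diffoscope version provides --exclude-command, so check `diffoscope --help`.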


    Regards,

