Forum: >>> Magnum BBS <<<

Bug#1068890: diffoscope: --hard-timeout option

From Holger Levsen@21:1/5 to All on Sat Apr 13 00:50:01 2024

Package: diffoscope
Version: 264
Severity: wishlist

Dear Maintainer,

currently diffoscope has a --timeout option:

--timeout SECONDS
Best-effort attempt at a global timeout in seconds. If enabled, diffoscope will not recurse into any further sub-archives
after X seconds of total execution time. (default: no timeout) [experimental]

however, this doesn't give any guarantees about how long diffoscope will be
running, so we haven't used it for the RB CI tests so far, mostly because I'm
not sure what would be a good inner timeout (= for diffoscope) and what would
be a good outer timeout (= for killing diffoscope from the outside no matter what).

Currently we use 2h as outer timeout, but have no inner timeout. Maybe we should
use --timeout 1h?

Anyhow, about my --hard-timeout option idea:

my idea of "--hard-timeout $time" is that diffoscope terminates itself after
$time, no matter what, *and* then restarts itself with "--max-container-depth 3"
(or whatever is useful to get a glimpse of which files in a Debian package
are different), probably with another hard timeout set, so as to guarantee
that it always produces meaningful output (especially HTML output if
specified with --html).

What do you think?

Else we could also extend the current code for tests.r-b.o/debian, which currently
just kills diffoscope after 2h, to then run diffoscope --max-container-depth 3 :)
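
A minimal sketch of that outer-timeout-then-shallow-rerun approach, assuming a POSIX system and a wrapper living outside diffoscope (e.g. in the CI job). The function names and the `diffoscope` parameter are hypothetical helpers for illustration; only the `--max-container-depth` option comes from the discussion above. This is not the actual tests.r-b.o code.

```python
import os
import signal
import subprocess


def run_with_hard_timeout(cmd, timeout):
    """Run cmd in its own session; kill its whole process group on timeout.

    Returns the exit code, or None if the timeout was hit.
    """
    # start_new_session=True makes the child a process group leader, so
    # helpers it spawns (readelf, objdump, ...) share its process group.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child
        return None


def compare_with_fallback(args, timeout, fallback_depth=3,
                          diffoscope=("diffoscope",)):
    """Try a full run first; on timeout, rerun with a limited container depth.

    Returns (exit_code, degraded), where degraded is True if the shallow
    rerun was needed. exit_code is None if the rerun timed out as well.
    """
    code = run_with_hard_timeout([*diffoscope, *args], timeout)
    if code is not None:
        return code, False
    shallow = [*diffoscope, "--max-container-depth", str(fallback_depth), *args]
    return run_with_hard_timeout(shallow, timeout), True
```

Something like `compare_with_fallback(["--html", "out.html", "a.deb", "b.deb"], timeout=2 * 60 * 60)` would then mirror the current 2h limit. Note that a child which starts its own session escapes the group kill, which is one reason a container or cgroup may be more robust.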

https://tests.reproducible-builds.org/debian/index_breakages.html lists
251 pkg/suite/arch combinations where diffoscope runs into a timeout...

& many thanks for rocking diffoscope airlines..! \o/

--
cheers,
Holger

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
⠈⠳⣄

Bottled water companies don't produce water, they produce plastic bottles.


From Chris Lamb@21:1/5 to Holger Levsen on Tue Apr 16 18:10:01 2024

Holger Levsen wrote:

> Anyhow, about my --hard-timeout option idea:
>
> my idea of "--hard-timeout $time" is that diffoscope terminates itself
> after $time, no matter what *and* then re-starts itself with
> "--max-container-depth 3"

Just to say that I am totally on board with the idea of ensuring we
get _something_ out of diffoscope on tests.reproducible-builds.org.
Way better than 250 timeouts.

However, I think this first iteration of --hard-timeout time has a few
things that would need ironing out first, and potentially make it not
worth implementing:

(1) You suggest it should start again with "--max-container-depth 3",
but it would surely need some syntax (or another option?) to control
that "3" (but for the second time only).

(2) In fact, it's easy to imagine that one would want to restart with
other restrictions as well: not just --max-container-depth. For
instance, excluding external commands like readelf and objdump that
you know to be slow.

(3) The output might need some comment saying "this was re-run with restrictions as we hit a timeout".

(4) My gut feel is that it would not be all that great to rely on CPython
to really properly clear up child processes after a certain amount of
time. I believe the most reliable top-level way to do this kind of
thing inside CPython is to start a watchdog thread that sleeps until
the timeout and then tries to kill everything, but my experience of
doing anything like this within Python itself is not great, and it has
essentially always needed something at the process level outside of it
to be reliable. A container would be even more effective, I'm sure.
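
For reference, the in-process watchdog described in (4) could look roughly like the following sketch (POSIX only). The names `start_watchdog` and `kill_own_process_group` are hypothetical and nothing like this is claimed to exist in diffoscope. It also shows the weak spots: it only works cleanly if the process leads its own process group, and SIGKILL leaves no chance to write a report.

```python
import os
import signal
import threading


def kill_own_process_group():
    """Kill this process and every process sharing its process group.

    Assumption: the program made itself a group leader at startup (for
    example with os.setpgrp()); otherwise this would also kill unrelated
    processes in the same group, such as the invoking shell pipeline.
    SIGKILL cannot be caught, so no cleanup or output happens afterwards.
    """
    os.killpg(os.getpgrp(), signal.SIGKILL)


def start_watchdog(timeout, on_expire=kill_own_process_group):
    """Start a daemon timer thread that calls on_expire after timeout seconds.

    Returns the timer so the caller can cancel() it once the real work
    finishes in time.
    """
    timer = threading.Timer(timeout, on_expire)
    timer.daemon = True  # do not keep the interpreter alive just for this
    timer.start()
    return timer
```

Children that move to another session or process group are not reached by this, which matches the experience that something at the process level outside of Python is usually still needed.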

In other words, I think the best way of achieving the result we want
is, alas, by doing it outside of diffoscope at the level of
Jenkins. As in, exactly what you describe here:

> Else we could also extend the current code for tests.r-b.o/debian,
> which currently just kills diffoscope after 2h, to then run diffoscope
> --max-container-depth 3 :)

Is that a massive faff? :/

Best wishes,

--
o
⬋ ⬊ Chris Lamb
o o reproducible-builds.org 💠
⬊ ⬋
o

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Holger Levsen@21:1/5 to Chris Lamb on Tue Apr 16 23:00:01 2024

On Tue, Apr 16, 2024 at 04:51:09PM +0100, Chris Lamb wrote:

> Just to say that I am totally on board with the idea of ensuring we
> get _something_ out of diffoscope on tests.reproducible-builds.org.

:) great!

> Way better than 250 timeouts.

https://tests.reproducible-builds.org/debian/stats_breakages.png
showed that in the last 3-4 years there was constant progress on that! \o/

> However, I think this first iteration of --hard-timeout time has a few
> things that would need ironing out first, and potentially make it not
> worth implementing:
>
> (1) You suggest it should start again with "--max-container-depth 3",
> but it would surely need some syntax (or another option?) to control
> that "3" (but for the second time only).

another option, --second-pass-max-container-depth or some such

> (2) In fact, it's easy to imagine that one would want to restart with
> other restrictions as well: not just --max-container-depth. For
> instance, excluding external commands like readelf and objdump that
> you know to be slow.

yes, that's a good idea and IMO should be automatically implied for the
2nd pass or round or try.

> (3) The output might need some comment saying "this was re-run with
> restrictions as we hit a timeout".

absolutely.

> (4) My gut feel is that it would not be all that great to rely on CPython
> to really properly clear up child processes after a certain amount of
> time. I believe the most reliable top-level way to do this kind of
> thing inside CPython is to start a watchdog thread that sleeps until
> the timeout and then tries to kill everything, but my experience of
> doing anything like this within Python itself is not great, and it has
> essentially always needed something at the process level outside of it
> to be reliable. A container would be even more effective, I'm sure.

hmmm.

> In other words, I think the best way of achieving the result we want
> is, alas, by doing it outside of diffoscope at the level of
> Jenkins. As in, exactly what you describe here:
>
>> Else we could also extend the current code for tests.r-b.o/debian,
>> which currently just kills diffoscope after 2h, to then run diffoscope
>> --max-container-depth 3 :)
>
> Is that a massive faff? :/

not really, I guess it would be rather simple even, I just thought
(or think?) that it would be a nice feature for diffoscope proper.

--
cheers,
Holger

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
⠈⠳⣄

The purpose of propaganda isn't to make you believe something. It's to make you believe nothing. So that you do nothing. (@DarthPutinKGB)


From Vagrant Cascadian@21:1/5 to Chris Lamb on Tue Apr 16 23:30:01 2024

On 2024-04-16, Chris Lamb wrote:

> However, I think this first iteration of --hard-timeout time has a few
> things that would need ironing out first, and potentially make it not
> worth implementing:
>
> (1) You suggest it should start again with "--max-container-depth 3",
> but it would surely need some syntax (or another option?) to control
> that "3" (but for the second time only).

What about going the other direction ... starting with a very small
value for max-container-depth, and incrementally increasing it,
generating a report (or at least storing sufficient data to generate
one) in between each increment, so you always get some information, but
essentially incrementally increase the resolution?

Or would that approach just be too inefficient?
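
A rough sketch of what such an incremental loop could look like when driven from outside diffoscope. Everything here is hypothetical: `run_pass(depth, timeout)` stands for "run diffoscope with --max-container-depth set to depth, write a report, and return True if it finished within timeout seconds".

```python
import time


def deepen_until_deadline(run_pass, budget, max_depth=10, clock=time.monotonic):
    """Run passes at depth 1, 2, 3, ... until the time budget is used up.

    run_pass(depth, timeout) must return True if the pass completed and
    False if it was cut off. Returns the deepest depth that completed,
    or None if not even the first pass finished.
    """
    deadline = clock() + budget
    best = None
    for depth in range(1, max_depth + 1):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        # Give each pass whatever is left of the overall budget.
        if not run_pass(depth, remaining):
            break
        best = depth
    return best
```

In this naive form every pass starts from scratch and repeats all the work of the shallower passes, which is exactly the inefficiency in question. Avoiding that would mean reusing intermediate results inside diffoscope itself.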

> (2) In fact, it's easy to imagine that one would want to restart with
> other restrictions as well: not just --max-container-depth. For
> instance, excluding external commands like readelf and objdump that
> you know to be slow.

Ah, yes, knowing the common time sinks would be tremendously helpful!

live well,
vagrant



From Chris Lamb@21:1/5 to Vagrant Cascadian on Thu Apr 18 13:50:01 2024

Vagrant Cascadian wrote:

> On 2024-04-16, Chris Lamb wrote:
>
>> However, I think this first iteration of --hard-timeout time has a few
>> things that would need ironing out first, and potentially make it not
>> worth implementing:
>>
>> (1) You suggest it should start again with "--max-container-depth 3",
>> but it would surely need some syntax (or another option?) to control
>> that "3" (but for the second time only).
>
> What about going the other direction ... starting with a very small
> value for max-container-depth, and incrementally increasing it,
> generating a report (or at least storing sufficient data to generate
> one) in between each increment, so you always get some information, but
> essentially incrementally increase the resolution?
>
> Or would that approach just be too inefficient?

This is probably a separate request best suited to another issue at
this point, but I do like the idea of being able to incrementally
increase the resolution over time. Depending on how it worked in
practice, there should not be significant overhead in managing this
if, say, the commands that could not be run "in time" had token
placeholders internally that rendered to text in the output rather
than non-trivial/expensive binary diffs.

On the negative side though, I think this would still require a robust
way of killing long-running processes as outlined previously. Moreover,
it would require a HUGE reworking of how diffoscope handles containers
and recurses into nested structures in its tree-like style. Indeed,
thinking about it, this change would pretty much be exactly the same
work needed to make diffoscope run in parallel (!), which hopefully
communicates both the scope of the changes that would be needed to
achieve this, and that making diffoscope run in parallel also has
other benefits. Anyway, mini brain dump over.

Regards,

--
o
⬋ ⬊ Chris Lamb
o o reproducible-builds.org 💠
⬊ ⬋
o


From Chris Lamb@21:1/5 to Holger Levsen on Thu Apr 18 13:40:01 2024

Holger Levsen wrote:

>> (1) You suggest it should start again with "--max-container-depth 3",
>> but it would surely need some syntax (or another option?) to control
>> that "3" (but for the second time only).
>
> another option, --second-pass-max-container-depth or some such
>
>> (2) In fact, it's easy to imagine that one would want to restart with
>> other restrictions as well: not just --max-container-depth. For
>> instance, excluding external commands like readelf and objdump that
>> you know to be slow.
>
> yes, that's a good idea and IMO should be automatically implied for the
> 2nd pass or round or try.

It's definitely a "good idea" in the sense that I can definitely see
someone wanting to achieve that as an end result :)

Yet… upon thinking about it a bit, I don't think it is a good idea at
all for diffoscope to grow a bunch of new options or hardcoded
defaults for a second run. What (1) and (2) show here is that as soon
as a user would like to adjust these second pass options in any way,
then the whole interface becomes very unwieldy. Not only that, but
from the user's point of view it's neither flexible nor transparent,
especially when compared to "just" running diffoscope twice with
different options. There's no "magic" there, if you see what I mean.

Can we implement running diffoscope twice on tests.r-b.org manually
first and see how that goes? I'm not 100% against the idea of
implementing this in diffoscope eventually, but it would make a lot of
sense to try out the "manual" version and gain some real-world
experience first.
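
To make the "manual" version a bit more concrete, here is a small sketch of how a wrapper might build the two invocations and the note from point (3). The helper names, the default depth and the regex patterns are assumptions for illustration. `--max-container-depth`, `--exclude-command` and `--html` are existing diffoscope options, but which commands are actually worth excluding would need measuring.

```python
def first_pass_args(a, b, html):
    """Command line for the unrestricted first run."""
    return ["diffoscope", "--html", html, a, b]


def second_pass_args(a, b, html, depth=3,
                     slow_commands=("^readelf", "^objdump")):
    """Command line for the restricted rerun after the first run timed out."""
    args = ["diffoscope", "--max-container-depth", str(depth)]
    for pattern in slow_commands:
        # --exclude-command takes a regular expression matched against the
        # external command line; these patterns are illustrative guesses.
        args += ["--exclude-command", pattern]
    return args + ["--html", html, a, b]


def restriction_note(depth, slow_commands):
    """Human-readable note to publish next to the second-pass report."""
    excluded = ", ".join(slow_commands) or "none"
    return (
        "This was re-run with restrictions as we hit a timeout: "
        f"max container depth {depth}, excluded commands: {excluded}."
    )
```

Keeping both command lines visible in the job script means it is always obvious which options produced a given report, with no hidden second-pass defaults.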

Regards,

--
o
⬋ ⬊ Chris Lamb
o o reproducible-builds.org 💠
⬊ ⬋
o

