• Re: Mapping Reproducibility Bug Reports to Commits

    From Andrey Rahmatullin@21:1/5 to Muhammad Hassan on Sun Nov 14 20:40:02 2021
    On Sun, Nov 14, 2021 at 05:53:24PM +0000, Muhammad Hassan wrote:
    Hi all,

    I am a researcher at the University of Waterloo, conducting a project to study reproducibility issues in Debian packages.

    The first step for me is to link each Reproducibility-related bug at this link: https://bugs.debian.org/cgi-bin/pkgreport.cgi?usertag=reproducible-builds@lists.alioth.debian.org to the corresponding commit that fixed the bug.
    This task implies that the packaging for all Debian packages is stored in some
    VCS, that those repos are public, that specific commits are marked as fixing
    specific bugs and probably, depending on the specifics of the task, that the
    commits are granular. None of this is true. The best you can do is find
    which uploads closed those bugs, using their "fixed in version" data.


    --
    WBR, wRAR


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From peter green@21:1/5 to All on Mon Nov 15 05:40:01 2021

    I am a researcher at the University of Waterloo, conducting a project to study reproducibility issues in Debian packages.

    The first step for me is to link each Reproducibility-related bug at this link: https://bugs.debian.org/cgi-bin/pkgreport.cgi?usertag=reproducible-builds@lists.alioth.debian.org to the corresponding commit that fixed the bug.

    However, I am unable to find an explicit way of doing so programmatically. Please assist.

    There is no explicit link.

    Most (but not all) Debian packages are maintained in a VCS, and there are fields in the source package
    that identify the location and type of the VCS (almost certainly git nowadays). However, multiple
    different workflows are in use: git-buildpackage is the most common and normally uses a "patches-unapplied"
    git tree, but there is also dgit, which uses a "patches-applied" git tree. Git trees may or may not
    contain the upstream source. At least one language community uses a system where the git tree stores
    files that are used to generate the Debian packaging rather than the final Debian packaging itself.
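    As a rough illustration of the "fields in the source package" point, here is a minimal sketch that pulls the Vcs-* fields out of the source stanza of a debian/control file. The function name `vcs_fields` and the sample package are made up for the example, and the matching is deliberately simple (it assumes the usual `Vcs-Git` capitalisation and single-line values).

```python
import re

# Matches fields such as "Vcs-Git:" or "Vcs-Browser:" at the start of a line.
# Assumption: the usual capitalisation is used and values fit on one line.
VCS_FIELD = re.compile(r"^(Vcs-[A-Za-z]+):[ \t]*(.*)$", re.MULTILINE)


def vcs_fields(control_text):
    """Return the Vcs-* fields of the first (source) stanza as a dict."""
    # Stanzas in debian/control are separated by blank lines; the first
    # stanza describes the source package.
    source_stanza = re.split(r"\n[ \t]*\n", control_text.strip(), maxsplit=1)[0]
    return {name: value.strip() for name, value in VCS_FIELD.findall(source_stanza)}


# Hypothetical control file, for illustration only.
SAMPLE_CONTROL = """\
Source: hello-example
Maintainer: Jane Doe <jane@example.org>
Vcs-Git: https://salsa.debian.org/example/hello-example.git
Vcs-Browser: https://salsa.debian.org/example/hello-example

Package: hello-example
Architecture: any
"""

if __name__ == "__main__":
    print(vcs_fields(SAMPLE_CONTROL))
```

    An empty result would be the "no VCS" case. Note that a Vcs-Git value can carry extra information after the URL (for example a branch), which this sketch returns unparsed.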

    Maintainer practices for structuring commits also vary: some maintainers update the changelog at the same
    time as making the actual changes, while others update the changelog in a batch later.

    Sometimes bugs aren't closed from the changelog at all but are instead closed by the maintainer
    after the upload, particularly if the maintainer is not sure whether a change will fix the bug.

    With all that said, it's probably doable to develop heuristics that map bug numbers to commits in most
    cases. An outline might be:

    * Check if the package has a VCS and whether the relevant changelog can be found in said VCS. If there is no VCS, give up and refer the bug for human attention.
    * Map the bug number to a changelog line. If there is no such mapping, give up and refer the bug for human attention.
    * Determine which commit added the changelog line (e.g. with git blame) and see if there are actual code changes in that commit.
    * If so, take it as the probable commit. If not, search backwards a bit for a commit message that matches
    the changelog line.
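    The "map the bug number to a changelog line" step could be sketched roughly as below. This is only a sketch: `closes_by_bug`, the sample package and the bug numbers are invented for the example, the `Closes:` pattern follows the convention used in Debian changelogs, and a `Closes:` that sits on a continuation line is reported with that line only rather than the whole entry.

```python
import re

# First line of a changelog entry: "package (version) distribution; urgency=..."
HEADER = re.compile(r"^(?P<source>[a-z0-9][a-z0-9+.-]*) \((?P<version>[^)]+)\) ")
# "Closes: #123", "closes: bug#123, #456" and similar spellings.
CLOSES = re.compile(r"closes:\s*(?:bug)?#?\s?\d+(?:,\s*(?:bug)?#?\s?\d+)*", re.IGNORECASE)


def closes_by_bug(changelog_text):
    """Map bug number -> list of (version, changelog line) that claim to close it."""
    result = {}
    version = None
    for line in changelog_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Remember which upload the following lines belong to.
            version = header.group("version")
            continue
        for match in CLOSES.finditer(line):
            for bug in re.findall(r"\d+", match.group(0)):
                result.setdefault(int(bug), []).append((version, line.strip()))
    return result


# Hypothetical changelog, for illustration only.
SAMPLE_CHANGELOG = """\
hello-example (1.2-3) unstable; urgency=medium

  * Sort file lists to make the build reproducible. (Closes: #100001)
  * Update Standards-Version.

 -- Jane Doe <jane@example.org>  Mon, 15 Nov 2021 10:00:00 +0000

hello-example (1.2-2) unstable; urgency=medium

  * Fix typo in description. (Closes: #100002, #100003)

 -- Jane Doe <jane@example.org>  Mon, 01 Nov 2021 10:00:00 +0000
"""

if __name__ == "__main__":
    for bug, hits in sorted(closes_by_bug(SAMPLE_CHANGELOG).items()):
        print(bug, hits)
```

    With the changelog line in hand, something like `git blame debian/changelog` or `git log -S'#100001' -- debian/changelog` in the packaging repo would be one way to look for the commit that introduced it. Whether that commit also contains the actual fix depends on the maintainer's workflow, as described above.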

    Another option, having guessed a range of commits from the changelog and/or from comparing the VCS to the
    source packages, may be to run a bisection. This would likely require some effort to detect which workflow
    is in use, though.

  • From Johannes Schauer Marin Rodrigues@21:1/5 to All on Mon Nov 15 10:40:02 2021
    Hi Muhammad,

    Others already explained how packaging VCSes are (sadly) basically a free-for-all in Debian and that you will probably not get anything better than some heuristics. I wanted to add some more ideas to the ones that were already presented. So, in addition to what was already said, you can also try any of the following:

    1. If the packaging is on salsa and the commit contains a "closes: XXXX" line,
    then the bug will contain a message like this one which will let you
    directly identify the commit that fixed the bug:
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=907352#39

    2. If the changelog entry only closes the reproducible bug and nothing else,
    then you can use snapshot.d.o and then debdiff the version that closed the
    bug with the version before that. This method will work even for packages
    that are not using any VCS.

    3. If the changelog closes multiple bugs but also points out *who* closed the
    reproducible bug and that person changed nothing else according to
    d/changelog then it's also easy to find the commit. This of course only
    works if the package does use a VCS and if your tools can detect and
    understand the specific packaging style that was used.

    4. There were GSoC projects involving reproducible builds. For example, Maria
    Valentina Marin Rodrigues contributed back in 2015, and if you find a commit
    of hers in packaging repos it will be fixing a reproducible builds bug. There
    might be more GSoC students for whom you can apply a similar approach.
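    Ideas 1 and 2 above lend themselves to small helpers. The sketch below is only that: `salsa_commits` scans the text of a bug log for salsa commit URLs (assuming the notification message contains a plain `https://salsa.debian.org/.../commit/<sha>` link), and `debdiff_command` builds the `debdiff` invocation for the fixing upload and the one before it, assuming both .dsc files have already been fetched (e.g. from snapshot.d.o) and that the version list comes from the changelog, newest first. All function names and sample data are made up.

```python
import re

# Commit links on salsa, with or without the "/-/" path segment.
SALSA_COMMIT = re.compile(
    r"https://salsa\.debian\.org/[\w./+-]+?/commit/[0-9a-f]{7,40}"
)


def salsa_commits(bug_log_text):
    """Return salsa commit URLs found in a bug log, in order of first appearance."""
    seen = []
    for url in SALSA_COMMIT.findall(bug_log_text):
        if url not in seen:
            seen.append(url)
    return seen


def dsc_name(source, version):
    """File name of the .dsc for a version; the epoch is not part of file names."""
    return f"{source}_{version.split(':', 1)[-1]}.dsc"


def debdiff_command(source, versions_newest_first, fixed_version):
    """argv for debdiff between the fixing upload and the upload before it."""
    index = versions_newest_first.index(fixed_version)
    if index + 1 >= len(versions_newest_first):
        raise ValueError("no earlier upload to compare against")
    previous = versions_newest_first[index + 1]
    return ["debdiff", dsc_name(source, previous), dsc_name(source, fixed_version)]


# Hypothetical bug log excerpt, for illustration only.
SAMPLE_LOG = """\
You can check the diff of the fix at:

https://salsa.debian.org/example/hello-example/-/commit/0123456789abcdef0123456789abcdef01234567
"""

if __name__ == "__main__":
    print(salsa_commits(SAMPLE_LOG))
    print(debdiff_command("hello-example", ["1:1.2-3", "1:1.2-2"], "1:1.2-3"))
```

    The debdiff route only isolates the fix when the upload closes nothing but the reproducibility bug, as noted in idea 2. Otherwise the diff will mix in unrelated changes.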

    Just my 2c.

    Thanks!

    cheers, josch
