• cruft(-ng) and dh-cruft: handling and registering of dynamic files

    From Alexandre Detiste@21:1/5 to All on Sun Oct 23 01:10:01 2022
    Hi,

    I have been working on the cruft/cruft-ng package since 2014;
    there were a few setbacks along the years,
    like the mlocate -> plocate & UsrMerge transitions,
    but it's alive and kicking, helping to find random
    lost files left behind by other packages
    and to file bugs against those from time to time
    to get these glitches resolved.



    Recently I've been working a lot on it because I realized
    it would be the perfect solution to audit the disk space
    usage problems I'm facing at work.

    So I somewhat whipped up what I remembered from my own proposal
    https://wiki.debian.org/Cruft/purge and now have for myself a working
    "dh-cruft" that I can use to register dynamic files
    owned by some private .deb. Here "dh-cruft" is a must, I don't want to
    pollute Debian with some random external data from downstream.

    This DebHelper works this way:
    * the "debian/cruft" list merely registers the glob patterns,
    * and the "debian/purge" list also adds an "rm -rf" stanza to postrm/purge.
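
    To illustrate the idea of registering glob patterns, here is a minimal
    Python sketch. It is not dh-cruft's or cruft-ng's actual code: the
    patterns and the helper names (PATTERNS, is_registered, unexplained)
    are made up for the example, and Python's fnmatch is only a stand-in
    for whatever matcher cruft-ng really uses.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, in the spirit of a "debian/cruft" list.
# The real file format and glob dialect may differ.
PATTERNS = [
    "/var/lib/myapp/*.db",
    "/var/cache/myapp/*",
]


def is_registered(path, patterns=PATTERNS):
    """Return True if `path` matches at least one registered pattern.

    Note: fnmatch's "*" also matches "/" characters, which may not be
    how cruft-ng's own matcher behaves.
    """
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def unexplained(paths, patterns=PATTERNS):
    """Return the paths that no registered pattern accounts for."""
    return [p for p in paths if not is_registered(p, patterns)]


if __name__ == "__main__":
    found = ["/var/lib/myapp/state.db", "/var/lib/myapp/core.12345"]
    for path in unexplained(found):
        print("cruft:", path)
```

    Files matched by a pattern count as owned by the package; whatever is
    left over is what a cruft report would flag.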

    As a bonus there's now also a new "cpigs" command, working akin to
    "dpigs" from Debian Goodies to list the biggest volatile data producers.


    The plan now is to have a new option that dumps the whole
    matching result database as .json with individual file size
    for jq consumption or in my case Jupyter;
    this instead of implementing older requests (#291823 #487458 #527285).


    I know it's a very old unresolved subject that has been lurking forever
    here, but maybe it's the right time to look at it with a fresh eye.

    My proposal for next steps:
    * gather your comments here
    * some review of dh-cruft (I don't know Perl)
    * get it in the NEW queue soon
    * have interested packages take part;
    for now cruft-ng ships its own homegrown fallback database
    * (later): merge dh-cruft into DebHelper when it's basically "done"
    * (much much later): migrate some logic from DH to dpkg itself,
    with a more declarative packaging style;
    cruft-ng is already linked with the static library libdpkg
    and is bound to progress at the same pace.

    * there is still a performance problem in cruft-ng that I wish to fix.
    Basic profiling can be done by setting the ELAPSED=1 environment variable.

    Greetings,

    Alexandre Detiste


    ./cpigs 30
    496720816 apt
    68957680 npm
    61846660 linux-image-5.19.0-1-amd64 (the initrd)
    61787431 linux-image-5.19.0-2-amd64
    53131401 dlocate
    36229735 aptitude
    19621198 dpkg
    17896745 plocate
    13559874 jupyter-nbextension-jupyter-js-widgets
    11982526 udev
    11870208 openjdk-11-jre-headless
    7257544 debconf
    5704857 smartmontools
    5685370 ttf-mscorefonts-installer
    5086033 linux-image-5.18.0-4-amd64 -> rc state
    4933502 grub-common
    3550208 qgis
    3523931 fontconfig
    3421312 ucf
    3231839 shared-mime-info
    3063016 locales
    2266947 libreoffice-common (files seen from explain/ucf)
    1901483 grub-pc-bin
    1565651 logrotate
    1258042 man-db
    1107968 ALTERNATIVES (I thought these were only symlinks ?)
    783313 popularity-contest
    763776 unattended-upgrades (du -b /var/log/unattended-upgrades/760422)
    657496 breeze-icon-theme
    625345 PYEXCEL (some pip3 automation)


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Wise@21:1/5 to Alexandre Detiste on Sun Oct 23 04:30:01 2022
    On Sun, 2022-10-23 at 01:08 +0200, Alexandre Detiste wrote:

    > This DebHelper works this way:
    > * the "debian/cruft" list merely registers the glob patterns,
    > * and the "debian/purge" list also adds an "rm -rf" stanza to postrm/purge.
    >
    > As a bonus there's now also a new "cpigs" command, working akin to
    > "dpigs" from Debian Goodies to list the biggest volatile data producers.

    Thank you for your work on this, being able to register files generated
    at install time by maintainer scripts or even at runtime by system
    maintenance tools to particular packages is a very useful feature for
    keeping all the files on a system more easily managed.

    Potentially it could also prompt users before removing packages that
    have registered data that won't be removed on purge, for example if a
    package creates at the sysadmin's request a dir in /srv to host a
    website, removing the package could warn about the directory. Or
    removing postgres with databases present could warn about those.

    I do worry about users removing files that they don't understand, based
    on feedback by cpigs/cruft-ng, but they do that already so... :)

    > The plan now is to have a new option that dumps the whole
    > matching result database as .json with individual file size
    > for jq consumption or in my case Jupyter;
    > this instead of implementing older requests (#291823 #487458 #527285).

    An ncdu or mc style interface (or plugins for those) to view cruft on a
    system sounds very useful in addition to the data export.

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise


  • From Alexandre Detiste@21:1/5 to All on Sat Nov 5 12:00:01 2022
    Hi,

    On Sun, 23 Oct 2022 at 04:24, Paul Wise <pabs@debian.org> wrote:
    > Thank you for your work on this, being able to register files generated
    > at install time by maintainer scripts or even at runtime by system
    > maintenance tools to particular packages is a very useful feature for
    > keeping all the files on a system more easily managed.

    The "cpigs" command now has a new "-C" command line switch
    to output the ownership of all system files (static+volatile) in a single .csv.

    I think this is something quite basic that can fill so many needs,
    but it simply did not exist before.

    "apt-file" could be adapted to also transparently cache this information.
    End-users of this tool would get better results without
    having to change their habits.

    $ apt-file search /etc/subgid
    [nothing]
    $ cpigs -c | grep subgid
    /etc/subgid;base-passwd;f;1;19
    /etc/subgid-;base-passwd;f;1;0
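
    As a sketch of how this output could be consumed downstream, the
    following Python snippet parses the two lines shown above. The column
    meanings are assumptions read off the sample, not a documented
    format: field 1 is taken to be the path, field 2 the owning package,
    field 3 a file type, and the last field a size in bytes; the fourth
    field is left uninterpreted. The function names are made up for the
    example.

```python
import csv
import io
from collections import defaultdict

# The two lines from the "cpigs -c" example above.
SAMPLE = """/etc/subgid;base-passwd;f;1;19
/etc/subgid-;base-passwd;f;1;0
"""


def parse(text):
    """Yield (path, package, size) tuples from ';'-separated output.

    Assumption: the last of the five fields is a size in bytes.
    """
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        path, package, _kind, _unknown, size = row[:5]
        yield path, package, int(size)


def size_by_package(text):
    """Sum the assumed size column per owning package."""
    totals = defaultdict(int)
    for _path, package, size in parse(text):
        totals[package] += size
    return dict(totals)


def owner_of(text, wanted):
    """Return the package registered for `wanted`, or None."""
    for path, package, _size in parse(text):
        if path == wanted:
            return package
    return None


if __name__ == "__main__":
    print(owner_of(SAMPLE, "/etc/subgid"))
    print(size_by_package(SAMPLE))
```

    The same lookup is what a cache-aware "apt-file search" would need:
    a path goes in, the owning package comes out.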

    The plan is to keep this .csv output stable,
    whatever changes in the upstream dataflow,
    which is now a mix of dpkg & diversions + alternatives
    + custom fallback scripts that know and replicate how
    UCF, logrotate, initramfs, grub, systemd, sysvinit
    manage volatile files inside their postinst/postrm.

    > I do worry about users removing files that they don't understand, based
    > on feedback by cpigs/cruft-ng, but they do that already so... :)

    I have seen some complaints about this online, and I agree...
    the original "cruft" tool looks more like an unfinished QA tool
    akin to piuparts than an end-user tool to me.

    > An ncdu or mc style interface (or plugins for those) to view cruft on a
    > system sounds very useful in addition to the data export.

    It's implemented, but the ncdu data model does not allow
    inserting the matched package name for the volatile files.

    It's still nice to use if you need to quickly identify
    where the big volatile files are piling up and take action.
    I have already done this in real life.

    Greetings
