• Standard parsable format for profiles/package.mask file

    Hi all

    I want to suggest a standard format for profiles/package.mask, for
    multiple reasons:

    1. It becomes easier to write mask and last-rites entries that are
    simple to understand. When all entries share a similar format, the
    reader knows where to find the important information, and the writer
    can more easily convey everything that is needed.

    2. We can teach tools to parse the file and render it nicely, or to
    help you fill it in. For example, I've tried to implement a parser for
    packages.gentoo.org so that it shows the message as nicely as
    possible, see [1] for an example. On the other hand, `pkgdev mask` [2]
    can help you fill in the message (including the bug number, last-rite
    removal date, and author & email line). Both of them mostly work, but
    when someone "breaks" the unofficial syntax, the tools fail sadly.

    This is why I want to recommend we create a mostly standard syntax, so
    we can all expect the same thing and have nice things.
    Also please note that for now I want to formalize the format only for
    the profiles/package.mask file, and not the ones inside all the
    different profiles. If you think we should apply it to all of them, we
    can discuss that separately please :)

    The current format is mostly acceptable, but let's tighten it. I will
    implement a pkgcheck check that will validate the format and error out
    if invalid.

    [1] https://packages.gentoo.org/packages/sys-fs/eudev
    [2] https://pkgcore.github.io/pkgdev/man/pkgdev/mask.html

    ===== "Formal" format =====

    Each entry is composed of 2 parts: a "#"-prefixed explanation block and
    a list of "${CATEGORY}/${PN}" packages. A new entry begins when a new
    explanation block starts (meaning the first "#"-prefixed line after a
    packages list). You may add blank lines between packages in the
    packages list.
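
    To make the splitting rule concrete, here is a minimal Python sketch
    (not the actual pkgcheck or packages.gentoo.org code; the name
    `split_entries` is made up for illustration). It assumes that a blank
    line never ends an entry, so only a "#" line after a package list does.

```python
def split_entries(text):
    """Split package.mask text into (comment_lines, packages) tuples."""
    entries = []
    comments, packages = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank lines are allowed between packages and never end an entry.
            continue
        if line.startswith("#"):
            if packages:
                # First "#" line after a package list: a new entry starts.
                entries.append((comments, packages))
                comments, packages = [], []
            comments.append(line)
        else:
            packages.append(line)
    if comments or packages:
        entries.append((comments, packages))
    return entries
```

    Note that with this simple rule the file's own header comment would be
    glued to the first entry, so a real tool would have to skip it first.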

    The first line of the "#"-prefixed explanation block must be of the
    format "${AUTHOR_NAME} <${EMAIL}> (${SINGLE_DATE})", where the date is
    in YYYY-MM-DD format, in the UTC timezone.
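
    As a rough sketch, the first line could be checked like this in
    Python. The names `HEADER_RE` and `parse_header` are hypothetical, and
    the exact regex (a single space between the parts, no "<", ">" or
    whitespace inside the email) is my assumption, not part of the
    proposal.

```python
import re
from datetime import date

# Assumed shape: "# Name <email> (YYYY-MM-DD)" with single spaces between parts.
HEADER_RE = re.compile(
    r"^# (?P<name>.+?) <(?P<email>[^<>\s]+)> \((?P<date>\d{4}-\d{2}-\d{2})\)$"
)


def parse_header(line):
    """Return (author_name, email, date) or raise ValueError if malformed."""
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError(f"invalid mask header line: {line!r}")
    # date.fromisoformat also raises ValueError for impossible dates.
    return m["name"], m["email"], date.fromisoformat(m["date"])
```

    A validator built on this would reject both a wrongly shaped line and
    a well-shaped line carrying an impossible date such as 2023-13-01.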

    If this is a last-rite message, the last line must list the last-rite
    removal date and the last-rite bug number. You can also list other
    bugs relevant to the last-rite. So I suggest the format
    "Removal on ${REMOVAL_DATE}. Bug #NNNNNN, #NNNNNN.", where the bug list
    is comma and space separated, there is at least one space (" +" regex)
    between the removal date and the bug list, and the date is in
    YYYY-MM-DD format. I prefer that this line is separate (and not a
    continuation of the preceding message text).
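
    A possible way to match that last line, again only as a sketch:
    `LAST_RITE_RE` and `parse_last_rite` are hypothetical names, and I am
    assuming the trailing "." is required and that extra spaces after a
    comma are tolerated.

```python
import re
from datetime import date

# "Removal on YYYY-MM-DD." then one or more spaces, then "Bug #N, #N, ...".
LAST_RITE_RE = re.compile(
    r"^# Removal on (?P<date>\d{4}-\d{2}-\d{2})\. +Bug (?P<bugs>#\d+(?:, +#\d+)*)\.$"
)


def parse_last_rite(line):
    """Return (removal_date, [bug numbers]) or None if not a last-rite line."""
    m = LAST_RITE_RE.match(line)
    if m is None:
        return None
    bugs = [int(n) for n in re.findall(r"#(\d+)", m["bugs"])]
    return date.fromisoformat(m["date"]), bugs
```

    Returning None instead of raising lets the same function be used to
    decide whether an entry is a last-rite at all.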

    The explanation block itself can reference bugs, by matching the regex
    "[Bb]ugs? #\d+(, +#\d+)*" (for example: "bug #713106, #753134"). I think
    this is quite a simple one, but powerful enough for most cases.
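
    For illustration, this is how a tool could collect the bug numbers
    with that regex in Python. The helper name `find_bugs` is made up; the
    pattern is the one above, only with a non-capturing group.

```python
import re

# Same pattern as in the proposal, with a non-capturing group.
BUG_REF_RE = re.compile(r"[Bb]ugs? #\d+(?:, +#\d+)*")


def find_bugs(text):
    """Return all bug numbers referenced in text, in order of appearance."""
    bugs = []
    for ref in BUG_REF_RE.finditer(text):
        # Each match may list several bugs: "bug #1, #2".
        bugs.extend(int(n) for n in re.findall(r"#(\d+)", ref.group()))
    return bugs
```

    A bare "#NNNNNN" without the word "bug" in front is deliberately not
    picked up, which keeps false positives out of the rendered message.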

    Lines with a single newline between them (so no blank line between
    them) are considered a single continuous paragraph. If you want to
    start a new paragraph, leave a blank line (still prefixed with #),
    similar to markdown. A line matching the last-rite line is always its
    own paragraph.
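
    The paragraph rule could be sketched as follows. `split_paragraphs`
    and `REMOVAL_RE` are hypothetical names, the input is assumed to be
    the "#"-prefixed lines after the author line, and the last-rite line
    is detected here only by its "Removal on YYYY-MM-DD." prefix.

```python
import re

# Simplified detection of the last-rite line (prefix only, an assumption).
REMOVAL_RE = re.compile(r"^Removal on \d{4}-\d{2}-\d{2}\.")


def split_paragraphs(comment_lines):
    """Join '#'-prefixed lines into paragraphs, split on blank '#' lines."""
    paragraphs, current = [], []

    def flush():
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for raw in comment_lines:
        text = raw.lstrip("#").strip()
        if not text:
            flush()  # a bare "#" line ends the current paragraph
        elif REMOVAL_RE.match(text):
            flush()  # the last-rite line is always its own paragraph
            paragraphs.append(text)
        else:
            current.append(text)
    flush()
    return paragraphs
```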

    ========= Example =========

    After all of that rambling, here is an example (it will result in 3
    paragraphs: 2 explanation paragraphs and 1 last-rite line at the end):

    # Arthur Zamarin <arthurzam@gentoo.org> (2023-09-21)
    # Very broken, no idea why packaged, need to drop ASAP. The project
    # is done with supporting this package. See for history bug #667889.
    #
    # As a better plan, you should migrate to dev-lang/perl, which has
    # better compatibility with dev-lang/ruby when used with dev-lang/lua
    # bindings.
    # Removal on 2023-10-21. Bug #667687, #667689.

    ==== Call for comments ====

    So how does it sound? I know it is tempting for me to limit the syntax
    (since I'll need to implement parsing of it), but I think the format
    above matches most of the currently used ones, and the one created by
    `pkgdev mask`. But if needed, I'm open to improving it based on
    comments.

    Should it be a GLEP? I don't think so, but I'm unsure about it. We do
    need to document it (for example in the header of that exact file).

    Arthur Zamarin
    Gentoo Linux developer (Python, pkgcore stack, Arch Teams, GURU)



