• Bug#1067092: man-db: “man -K --regex[...]

    From Manny@21:1/5 to All on Mon Mar 18 12:00:01 2024
    Package: man-db
    Version: 2.9.4-2
    Severity: normal
    Tags: upstream
    X-Debbugs-Cc: debbug.man-db@sideload.33mail.com

    Searching the whole DB for a whole word requires using --regex and
    then using word boundaries. So to find pages that reference the TZ
    environment variable, this *should* work (in principle):

    $ man -aK --regex '\<TZ\>'

    It appears to work because it finds many pages. But it misses the
    “tree” package (/usr/share/man/man1/tree.1.gz).

    $ zgrep 'TZ' /usr/share/man/man1/tree.1.gz
    \fBTZ\fP Timezone for timefmt output, see \fBstrftime\fP(3).

    As you can see, the nroff language interferes with matching the
    regular expression as “TZ” is surrounded by code. Users of man-db
    obviously do not intend to have their regex matched against nroff
    code. Thus operations are being performed in the wrong order. The
    regular expression matching needs to happen on nroff-decoded text.

    $ zcat /usr/share/man/man1/tree.1.gz | nroff -man | grep '\<TZ\>'
    TZ Timezone for timefmt output, see strftime(3).
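
    The mismatch can be reproduced without the tree package installed. The
    sketch below feeds the source line shown above to grep twice: once raw,
    and once after removing roff font escapes. strip_font_escapes is a
    hypothetical helper written for this illustration, not part of man-db,
    and it handles font escapes only, not the rest of the roff language.

```shell
# Sample source line copied from the zgrep output above.
src='\fBTZ\fP Timezone for timefmt output, see \fBstrftime\fP(3).'

# Hypothetical helper: drop roff font escapes such as \fB, \fP, \f(CW, \f[B].
strip_font_escapes() {
    sed -E 's/\\f(\[[^]]*\]|\(..|.)//g'
}

# Raw source: "B" directly precedes "TZ", so \< finds no word boundary.
# grep -c exits non-zero on a zero count, hence the "|| true".
raw_hits=$(printf '%s\n' "$src" | grep -c '\<TZ\>' || true)

# Escapes removed: "TZ" now starts the line and is followed by a space.
clean_hits=$(printf '%s\n' "$src" | strip_font_escapes | grep -c '\<TZ\>' || true)

echo "raw=$raw_hits clean=$clean_hits"   # prints "raw=0 clean=1"
```

    The word-boundary anchor fails on the raw line because the "B" of \fB is
    itself a word character sitting directly in front of "TZ".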

    * Workaround *

    One approach:

    $ find /usr/share/man/ -iname \*gz -exec zcat {} + | nroff -man | grep '\<'"$whole_word"'\>'
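
    The one-liner above concatenates every page before rendering, so a hit
    cannot be traced back to its file. A rough per-page variant is sketched
    below. search_rendered is a hypothetical name; the sketch assumes gzip,
    nroff, col and grep are installed, and col -b is included on the
    assumption that nroff marks bold text with backspace overstrikes, which
    would otherwise split words. Rendering every page this way is slow.

```shell
# Hypothetical helper: list man pages under a directory whose rendered
# text contains a given whole word.
search_rendered() {
    word=$1
    dir=${2:-/usr/share/man}
    find "$dir" -type f -name '*.gz' | while IFS= read -r page; do
        # Render one page at a time so a match can be attributed to its file.
        if gzip -dc -- "$page" | nroff -man 2>/dev/null | col -b |
            grep -q -- "\\<$word\\>"; then
            printf '%s\n' "$page"
        fi
    done
}

# Example invocation (commented out because it can take a long time):
# search_rendered TZ /usr/share/man/man1
```

    Only gzip-compressed pages are considered, matching the workaround
    above; file names containing newlines are not handled.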

    In rare situations such as environment variable searches, case
    sensitivity can be leveraged:

    $ man -aKI --regex TZ

    -- System Information:
    Debian Release: 11.5
    APT prefers oldstable-updates
    APT policy: (990, 'oldstable-updates'), (990, 'oldstable-security'), (990, 'testing'), (990, 'oldstable')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 5.10.0-19-amd64 (SMP w/2 CPU threads)
    Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
    Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
    Shell: /bin/sh linked to /bin/dash
    Init: systemd (via /run/systemd/system)
    LSM: AppArmor: enabled

    Versions of packages man-db depends on:
    ii bsdextrautils 2.36.1-8+deb11u1
    ii debconf [debconf-2.0] 1.5.77
    ii dpkg 1.20.12
    ii groff-base 1.22.4-6
    ii libc6 2.31-13+deb11u5
    ii libgdbm6 1.19-2
    ii libpipeline1 1.5.3-1
    ii libseccomp2 2.5.1-1+deb11u1
    ii zlib1g 1:1.2.11.dfsg-2+deb11u2

    man-db recommends no packages.

    Versions of packages man-db suggests:
    ii apparmor 2.13.6-10
    ii elinks [www-browser] 0.13.2-1+b1
    ii firefox-esr [www-browser] 102.6.0esr-1~deb11u1
    pn groff <none>
    ii less 551-2
    ii lynx [www-browser] 2.9.0dev.6-3~deb11u1
    ii ungoogled-chromium [www-browser] 90.0.4430.212-1.sid1
    ii w3m [www-browser] 0.5.3+git20210102-6

    -- debconf information:
    man-db/install-setuid: false
    man-db/auto-update: true

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Colin Watson@21:1/5 to Manny on Mon Mar 18 12:10:01 2024
    On Mon, Mar 18, 2024 at 11:54:42AM +0100, Manny wrote:
    > Searching the whole DB for a whole word requires using --regex and
    > then using word boundaries. So to find pages that reference the TZ
    > environment variable, this *should* work (in principle):
    >
    > $ man -aK --regex '\<TZ\>'
    >
    > It appears to work because it finds many pages. But it misses the
    > “tree” package (/usr/share/man/man1/tree.1.gz).
    >
    > $ zgrep 'TZ' /usr/share/man/man1/tree.1.gz
    > \fBTZ\fP Timezone for timefmt output, see \fBstrftime\fP(3).
    >
    > As you can see, the nroff language interferes with matching the
    > regular expression as “TZ” is surrounded by code. Users of man-db
    > obviously do not intend to have their regex matched against nroff
    > code. Thus operations are being performed in the wrong order. The
    > regular expression matching needs to happen on nroff-decoded text.

    In principle I certainly agree that this would be more usable, but I've
    considered this in the past and given up as making it perform well would
    have been very difficult. There's a note about this in man(1), under
    the description of -K:

    Note that this searches the sources of the manual pages, not the
    rendered text, and so may include false positives due to things
    like comments in source files, or false negatives due to things
    like hyphens being written as "\-" in source files. Searching
    the rendered text would be much slower.
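
    The hyphen case from that note can be shown with grep alone. In the
    sketch below, src is a hypothetical source line in the style man pages
    use for option names, and the tolerant pattern is just one possible way
    to allow for an optional backslash before each hyphen. Whether
    man -K --regex accepts the same syntax is not verified here.

```shell
# Hypothetical source line: an option name with roff-escaped hyphens.
src='.B \-\-regex'

# Literal search for the rendered spelling: no match against the source.
# grep -c exits non-zero on a zero count, hence the "|| true".
plain_hits=$(printf '%s\n' "$src" | grep -c -e '--regex' || true)

# Tolerant pattern (ERE): each hyphen may be preceded by a backslash.
tolerant_hits=$(printf '%s\n' "$src" | grep -c -E -e '\\?-\\?-regex' || true)

echo "plain=$plain_hits tolerant=$tolerant_hits"   # prints "plain=0 tolerant=1"
```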

    --
    Colin Watson (he/him) [cjwatson@debian.org]
