• Bug#1068891: coreutils: is join -t '' just comm -12?

    From наб@21:1/5 to All on Sat Apr 13 01:10:01 2024
    Package: coreutils
    Version: 9.1-1
    Version: 9.4-3
    Severity: normal

    Dear Maintainer,

    POSIX.1-202x/D3:
    −t char
    Use character char as a separator, for both input and output. Every appearance of
    char in a line shall be significant. When this option is specified, the collating
    sequence shall be the same as sort without the −b option.
    so obviously allowing -t '' is an extension.

    Manual:
    -t CHAR
    use CHAR as input and output field separator

    Important: FILE1 and FILE2 must be sorted on the join fields. E.g.,
    use "sort -k 1b,1" if 'join' has no options, or use "join -t ''" if
    'sort' has no options. Note, comparisons honor the rules specified by
    'LC_COLLATE'. If the input is not sorted and some lines cannot be
    joined, a warning message will be given.
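    The two pairings in that manual paragraph can be sketched as below. This is a minimal example, not taken from the manual. It assumes GNU coreutils. The input files x and y are hypothetical, and LC_ALL=C is an added assumption so that sort and join use the same byte-order collation.

```shell
#!/bin/sh
# Sketch of the two sort/join pairings from the manual, on made-up unsorted input.
# Assumes GNU coreutils; LC_ALL=C keeps sort and join on the same collation.
set -eu
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'b 2\na 1\nc 3\n' > "$tmp/x"   # hypothetical input file x
printf 'c 3\na one\n' > "$tmp/y"      # hypothetical input file y

# Pairing 1: sort on the first blank-separated field, then plain join.
sort -k 1b,1 "$tmp/x" > "$tmp/x.k"
sort -k 1b,1 "$tmp/y" > "$tmp/y.k"
by_field=$(join "$tmp/x.k" "$tmp/y.k")

# Pairing 2: plain sort (whole-line order), then join -t '' (whole-line key).
sort "$tmp/x" > "$tmp/x.s"
sort "$tmp/y" > "$tmp/y.s"
by_line=$(join -t '' "$tmp/x.s" "$tmp/y.s")

printf 'by field:\n%s\n' "$by_field"
printf 'by line:\n%s\n' "$by_line"
```

    With the whole-line pairing, only "c 3" is joined because it is the one line that appears verbatim in both files. The field pairing also matches the key "a".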

    So given
    $ cat f1
    row1 f1 1
    urow1 f1 2
    $ cat f2
    row1 f2 1
    urow2 f2 2
    which are stable against both sort and sort -k 1b,1
    $ join f?
    row1 f1 1 f2 1
    $ join f? -t ' '
    row1 f1 1 f2 1
    is all as expected.

    But
    $ join f? -t ''
    returns empty. What would empty -t mean, anyway?
    The empty string can either be found at every position
    (clearly not the case here, otherwise this'd be joined on r and u)
    or at no positions, so
    $ cat g1
    row1
    urow1
    $ cat g2
    row1
    urow2
    $ join g? -t ''
    row1
    which is, well
    $ comm g? -12
    row1
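    The comparison above can be reproduced with the following minimal sketch. It assumes GNU coreutils and the C locale. LC_ALL=C is an addition and is not part of the report. The script recreates g1 and g2 in a temporary directory.

```shell
#!/bin/sh
# Sketch: compare join -t '' with comm -12 on the g1/g2 files from the report.
# Assumes GNU coreutils; LC_ALL=C pins the collation so sort order, join and comm agree.
set -eu
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'row1\nurow1\n' > "$tmp/g1"
printf 'row1\nurow2\n' > "$tmp/g2"

join_out=$(join -t '' "$tmp/g1" "$tmp/g2")   # whole line used as the join key
comm_out=$(comm -12 "$tmp/g1" "$tmp/g2")     # lines common to both files

printf 'join -t "": %s\n' "$join_out"
printf 'comm -12:   %s\n' "$comm_out"
if [ "$join_out" = "$comm_out" ]; then echo "same output"; fi
```

    The outputs are equal here only because neither file repeats a line. join pairs every matching line in one file with every matching line in the other, while comm -12 matches lines one to one.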

    Somehow I don't feel like this is a good recommendation?

    Best,
    наб

    -- System Information:
    Debian Release: 12.4
    APT prefers stable-updates
    APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable-debug'), (500, 'stable')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 6.1.0-12-amd64 (SMP w/24 CPU threads; PREEMPT)
    Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_FIRMWARE_WORKAROUND, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
    Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
    Shell: /bin/sh linked to /usr/bin/dash
    Init: systemd (via /run/systemd/system)
    LSM: AppArmor: enabled

    Versions of packages coreutils depends on:
    ii libacl1 2.3.1-3
    ii libattr1 1:2.5.1-4
    ii libc6 2.36-9+deb12u4
    ii libgmp10 2:6.2.1+dfsg1-1.1
    ii libselinux1 3.4-1+b6

    coreutils recommends no packages.

    coreutils suggests no packages.

    -- no debconf information


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Pádraig Brady@21:1/5 to All on Sat Apr 13 02:00:01 2024
    On 12/04/2024 23:59, наб wrote:
    > Somehow I don't feel like this is a good recommendation?

    Well, sort with no options operates on the whole line,
    so the corresponding join -t '' also operates on the whole line.
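    The sketch below illustrates the whole-line reading of -t ''. It assumes GNU coreutils and LC_ALL=C, neither of which the thread specifies. The script rebuilds f1 and f2 and runs join both ways.

```shell
#!/bin/sh
# Sketch: with -t '' GNU join treats each whole line as the join field,
# so f1 and f2 from the report share no key (their lines differ after "row1").
# Assumes GNU coreutils; LC_ALL=C is an added assumption for a fixed collation.
set -eu
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'row1 f1 1\nurow1 f1 2\n' > "$tmp/f1"
printf 'row1 f2 1\nurow2 f2 2\n' > "$tmp/f2"

field_out=$(join "$tmp/f1" "$tmp/f2")        # key = first blank-separated field
line_out=$(join -t '' "$tmp/f1" "$tmp/f2")   # key = entire line

printf 'default join: [%s]\n' "$field_out"
printf 'join -t "":   [%s]\n' "$line_out"
```

    This matches the empty output reported for join f? -t '', because no line of f1 appears verbatim in f2.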

    cheers,
    Pádraig
