• Bug#1064343: tput sgr0 adds uncalled-for codes

    From Adam Borowski@21:1/5 to Sven Joachim on Wed Feb 21 07:50:01 2024
    On Tue, Feb 20, 2024 at 07:41:42PM +0100, Sven Joachim wrote:
    On Tue, Feb 20, 2024 at 04:15:30PM +0800, Paul Wise wrote:
    $ tput sgr0 | hd
    00000000 1b 28 42 1b 5b 6d |.(B.[m|

    Here's the culprit. The code you asked for is "\e[0m" -- shortenable to non-canonical but valid "\e[m", which is the second half of tput's output.

    What you did not ask for is "\e(B", which is not allowed in UTF-8 mode and, in a non-Unicode world, would switch to an ancient "US" charset.
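
    As an illustration of the hex dump above, the six bytes can be split into the two sequences being discussed. This is only a sketch: the function name split_sequences is made up for this example, and it recognises just the "ESC ( F" and "ESC [ ... final" shapes rather than being a general escape-sequence parser.

```python
# Split a byte string into the two kinds of escape sequence seen in the dump.
# Hypothetical helper for illustration; not part of tput or ncurses.

ESC = 0x1B


def split_sequences(data: bytes) -> list[bytes]:
    out = []
    i = 0
    while i < len(data):
        if data[i] != ESC or i + 1 >= len(data):
            # Plain byte (or a trailing lone ESC): pass through one at a time.
            out.append(data[i:i + 1])
            i += 1
        elif data[i + 1] == ord("("):
            # ESC ( F: designate a character set as G0 (3 bytes in total).
            out.append(data[i:i + 3])
            i += 3
        elif data[i + 1] == ord("["):
            # CSI: skip parameter/intermediate bytes (0x20-0x3F),
            # then include the final byte that ends the sequence.
            j = i + 2
            while j < len(data) and 0x20 <= data[j] <= 0x3F:
                j += 1
            out.append(data[i:j + 1])
            i = j + 1
        else:
            # Any other two-byte escape: keep ESC plus the following byte.
            out.append(data[i:i + 2])
            i += 2
    return out


# The bytes shown by "tput sgr0 | hd" in the report.
tput_output = bytes.fromhex("1b 28 42 1b 5b 6d")
parts = split_sequences(tput_output)
print(parts)
```

    The first element is the charset designation "\e(B", the second is the attribute reset "\e[m".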

    Maybe that is true for the Linux console, but we are talking about xterm here.

    It's not a property of the terminal, but of ECMA-35.

    And what "xterm" are you talking about? tput has no way to know the
    terminal on the other side, as the string TERM=xterm (and
    TERM=xterm-256color) applies to over a hundred different terminals using
    tens of different code bases. And you can't even blame their authors, as:
    * most Unices stopped maintaining their terminfo databases (eg. Solaris
    still hasn't learned about TERM=linux. Solaris is no longer relevant now
    but was relevant for most of that time frame.)
    * even if the databases were maintained, a new terminal would become useful
    only several years after it gets released (as the terminfo entry would
    need to be deployed on every box you might possibly ssh into, with
    failure mode being complete breakage of any terminfo-using program)
    Thus, putting aside historic terminals, there are only three TERM values:
    * linux
    * rxvt (used by its derivatives like aterm)
    * xterm (everything else)
    (Skipping decorations like -256color, which most programs hard-code anyway.
    I thus had to implement 256-color fallbacks in the kernel in 2016; it seems
    that eg. 24-bit color is moving the same way.)

    Putting aside arguments about whether this code is allowed (eg. the author of PuTTY has strong feelings on the matter), it's very clearly not what you asked for, thus the real bug is on tput's side.

    Thus:
    "tput sgr0" should produce sgr0, not setusg0 sgr0.

    It does of course produce sgr0, i.e. it emits whatever escape sequence $TERM's terminfo entry declares as sgr0. In the case of xterm-256color, sgr0=\E(B\E[m.

    And it's that entry that's wrong. sgr0 means "\e[0m" (or "\e[m"); see
    eg. the docs for real xterm: https://www.xfree86.org/current/ctlseqs.html

    The reason for including \E(B here is that sgr0 should cancel the
    effects of a previous smacs (start alternate character set) sequence and
    thus includes the rmacs (end alternate character set) escape sequence.

    Then it combines two completely different concepts in one label. SGR is
    for character attributes, G0/G1 are for encoding.

    People are relying on this behavior, see #595484 for instance.

    Seems like an XKCD 1172 case.

    Closing the bug, because everything works as intended.

    ...

    Well, I'm not going to fight a BTS war, but I don't agree with your
    decision.

    I'll work around this misbehaviour (as it's no extra work for me: I need
    to handle legitimately occurring G0/G1 changes anyway). Still, it is a bug
    even if its severity is negligible: thanks to the stubbornness of PuTTY's
    author, no maintained software uses G0/G1 anymore.

    Thus, the only real fallout is bloated terminal output. Output is still too
    slow on serial links or inferior terminals (I felt bad about Scaleway's
    web console just hours ago); saving three bytes per sgr0 is not much, but
    it is a very frequently used sequence.
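
    To put a number on the three bytes per sgr0, here is a rough sketch that strips "\e(B" from a captured output stream and reports the saving. The helper name strip_charset_reset and the sample stream are invented for this example, and dropping the sequence is only safe under the assumption that nothing in the stream ever switches to the alternate character set.

```python
# Count how many bytes the extra charset reset costs in a sample stream.
# Assumption: the stream never uses smacs, so removing ESC ( B is harmless.

RMACS = b"\x1b(B"          # the charset reset that precedes ESC [ m in sgr0
SGR0 = b"\x1b(B\x1b[m"     # sgr0 as declared for xterm-256color in the thread


def strip_charset_reset(stream: bytes) -> tuple[bytes, int]:
    """Remove every ESC ( B and return (cleaned stream, bytes saved)."""
    cleaned = stream.replace(RMACS, b"")
    return cleaned, len(stream) - len(cleaned)


# Made-up sample: bold text, reset, red text, reset.
sample = b"\x1b[1mbold" + SGR0 + b" plain \x1b[31mred" + SGR0
cleaned, saved = strip_charset_reset(sample)
print(saved, "bytes saved over", sample.count(SGR0), "resets")
```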


    Meow!
    --
    ⢀⣴⠾⠻⢶⣦⠀
    ⣾⠁⢠⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
    ⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
    ⠈⠳⣄⠀⠀⠀⠀

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Sven Joachim@21:1/5 to Adam Borowski on Thu Feb 22 18:30:02 2024
    I would like to add a few more points here, not to prolong the
    discussion but rather for future reference and for myself.

    On 2024-02-21 07:06 +0100, Adam Borowski wrote:

    On Tue, Feb 20, 2024 at 07:41:42PM +0100, Sven Joachim wrote:
    The reason for including \E(B here is that sgr0 should cancel the
    effects of a previous smacs (start alternate character set) sequence and
    thus includes the rmacs (end alternate character set) escape sequence.

    This has been the case from the early days of ncurses when ESR started
    to maintain the terminfo collection. On archive.debian.org I found a
    copy of ncurses 1.9.4 with terminfo.src version 9.8 [1].

    The ncurses manpages do not seem to explicitly mention this detail, but implicitly it appears a few times, for instance in termcap(3ncurses):

    ,----
    | termcap has nothing analogous to terminfo's set_attributes (sgr)
    | capability. One consequence is that termcap applications assume that
    | “me” (equivalent to terminfo's exit_attribute_mode (sgr0) capability)
    | does not reset the alternate character set. ncurses checks for, and
    | modifies the data shared with, the termcap interface to accommodate the
    | latter's limitation in this respect.
    `----
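
    The manpage excerpt above describes an adjustment for termcap clients. A minimal sketch of the idea, not the actual ncurses code: given a terminfo sgr0 string that embeds rmacs, a termcap-style "me" value can be obtained by removing that embedded rmacs. The function name termcap_me_from_sgr0 is hypothetical, and the sample strings are the xterm-256color values quoted earlier in the thread.

```python
# Illustration only: derive a termcap-style "me" from a terminfo sgr0
# by removing the embedded rmacs. Hypothetical helper, not an ncurses API.


def termcap_me_from_sgr0(sgr0: str, rmacs: str) -> str:
    """Return sgr0 without the first embedded rmacs, if there is one."""
    if rmacs and rmacs in sgr0:
        return sgr0.replace(rmacs, "", 1)
    # Nothing to remove: sgr0 already leaves the character set alone.
    return sgr0


sgr0 = "\x1b(B\x1b[m"   # sgr0=\E(B\E[m, as quoted for xterm-256color
rmacs = "\x1b(B"        # the \E(B part, identified in the thread as rmacs
me = termcap_me_from_sgr0(sgr0, rmacs)
print(repr(me))
```

    With these inputs the result is the bare attribute reset "\e[m", which no longer touches the character set.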

    Then it combines two completely different concepts in one label. SGR is
    for character attributes, G0/G1 are for encoding.

    You might think of it that way, but in (n)curses A_ALTCHARSET is just
    another video attribute; the concepts are not that different.

    Closing the bug, because everything works as intended.

    Well, I'm not going to fight a BTS war, but I don't agree with your
    decision.

    If you want to see changes, please propose them upstream. If Thomas
    follows your reasoning, great for you. Otherwise nothing is ever going
    to happen anyway, because there is no way I am going to deviate from
    upstream here (and patch sgr0 in a gazillion terminfo entries).

    Cheers,
    Sven


    1. https://archive.debian.org/debian/dists/Debian-0.93R6/source/devel/ncurses-1.9.4-0.tar.gz
