• Bug#874160: src:glibc: default locale to C.UTF-8

    From Aurelien Jarno@21:1/5 to Adam Borowski on Mon Sep 4 01:00:01 2017
    XPost: linux.debian.bugs.dist

    On 2017-09-03 18:49, Adam Borowski wrote:
    > Package: src:glibc
    > Version: 2.24-17
    > Severity: wishlist
    > Tags: patch
    >
    > Hi!
    > Here's a simple patch set to change the default of setlocale(…, "") to
    > C.UTF-8. This is a drastically smaller change than altering the meaning of
    > "C" to mean "C.UTF-8" that upstream is mulling over -- it affects only
    > programs that already have locale support, when the user fails to set any.

    Even if the change is small, it might still change the behavior of
    some programs, so I am not sure we want to diverge from upstream and
    other distributions here.

    One example comes to mind: initializing a PostgreSQL database
    cluster. When the --locale option is not used, this would cause the
    database to use a non-C locale, which has a significant performance
    impact.

    Aurelien

    --
    Aurelien Jarno GPG: 4096R/1DDD8C9B aurelien@aurel32.net http://www.aurel32.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
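A minimal sketch of how a locale-aware program could protect itself against the kind of change described in this reply: take the environment's locale for everything, then pin collation to "C" explicitly so that string comparison stays byte-wise whatever the default turns out to be. The helper names `init_locale_with_c_collation` and `compare_keys` are hypothetical, and this is not how PostgreSQL initializes a cluster; only `setlocale` and `strcoll` are standard C.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: adopt the environment's locale, then force
 * byte-wise collation. Returns 0 on success, -1 if the "C" collation
 * could not be selected. */
static int init_locale_with_c_collation(void)
{
    /* "" asks the C library to pick the locale from the environment.
     * A NULL return means the requested locale is unavailable; the
     * program then simply stays in its current locale. */
    (void) setlocale(LC_ALL, "");

    /* Override a single category. "C" is always available. */
    return setlocale(LC_COLLATE, "C") != NULL ? 0 : -1;
}

/* Hypothetical helper: compare two keys under the current collation.
 * With LC_COLLATE pinned to "C" the ordering matches strcmp(). */
static int compare_keys(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b);
}
```

A program that calls `init_locale_with_c_collation()` at startup keeps the same ordering from `compare_keys()` whether the implementation's default is "C" or "C.UTF-8", at the cost of ignoring the user's collation preference.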
  • From Adam Borowski@21:1/5 to All on Sun Sep 3 20:00:01 2017
    XPost: linux.debian.bugs.dist

    This is a multi-part MIME message sent by reportbug.


    Package: src:glibc
    Version: 2.24-17
    Severity: wishlist
    Tags: patch

    Hi!
    Here's a simple patch set to change the default of setlocale(…, "") to
    C.UTF-8.  This is a drastically smaller change than altering the meaning of
    "C" to mean "C.UTF-8" that upstream is mulling over -- it affects only
    programs that already have locale support, when the user fails to set any.

    If none of LC_ALL, LANG nor LC_CTYPE are set, instead of taking this to mean
    "C" we assume "C.UTF-8".  This is explicitely allowed by POSIX (an
    "implementation-defined default locale").  setlocale(…, "C") or not calling
    it at all retain the old meaning[1].

    This is the approach already taken by musl.

    I'm not submitting this upstream first as C.UTF-8 is still a Debian-specific
    thing.

    The improvement would be: if for any reason the user fails to set the
    locale, a daemon's startup script is too eager clearing its environment,
    a build chroot fails to inherit env vars, etc -- in all of these cases we'll
    fall back to an UTF-8 locale.  Making a locale-aware program use "C" is
    still fully possible via setting LC_ALL=C but we won't suffer from non-UTF8
    by omission.


    This is mostly an one-line patch (1/3), the other two update the testsuite
    (2/3) and alter hard-coded output of /usr/bin/locale (3/3).


    Meow!

    [1]. Making "C" behave like "C.UTF-8" would be, according to my reading,
    compliant with both POSIX-2008@2016 and C11 except for a minor iswblank()
    weirdness, but this is not a part of this change.
    -- System Information:
    Debian Release: buster/sid
      APT prefers unstable-debug
      APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'testing'), (150, 'experimental')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 4.13.0-rc7-debug-ubsan-00220-g92222baeac7d (SMP w/6 CPU cores)
    Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 (charmap=UTF-8)
    Shell: /bin/sh linked to /bin/dash
    Init: sysvinit (via /sbin/init)

    From 92d9938c6ba813afaf854d7bc12a9dc0c71371c3 Mon Sep 17 00:00:00 2001
    From: Adam Borowski <kilobyte@angband.pl>
    Date: Sun, 3 Sep 2017 00:26:47 +0200
    Subject: [PATCH 1/3] Default to C.UTF-8 on setlocale(..., "") if no env vars
    are set.

    This doesn't affect programs that are not prepared to handle arbitrary
    locales, as those either don't call setlocale() at all or use
    setlocale(..., "C"); it affects only programs which would have used a
    proper locale had the user set it up.

    This provides a decent default when env var configuration is missing, in a
    way that's more robust than mucking with login defs and daemon startup
    scripts.

    A default locale other than "C" is allowed by POSIX; also at least musl
    uses an equivalent of C.UTF-8 already.
    ---
    locale/findlocale.c | 6 +++++-
    1 file changed, 5 insertions(+), 1 deletion(-)

    diff --git a/locale/findlocale.c b/locale/findlocale.c
    index 4cb9d5ea8a..2a12b4e808 100644
    --- a/locale/findlocale.c
    +++ b/locale/findlocale.c
    @@ -123,8 +123,12 @@ _nl_find_locale (const char *locale_path, size_t locale_path_len,
                                   + _nl_category_name_idxs[category]);
           if (!name_present (cloc_name))
             cloc_name = getenv ("LANG");
    +      /* If no env vars are set, we're free to choose an
    +         "implementation-defined default locale":
    +         http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
    +      */
           if (!name_present (cloc_name))
    -        cloc_name = _nl_C_name;
    +        cloc_name = "C.UTF-8";
         }

       /* We used to fall back to the C locale if the name contains a slash
    --
    2.14.1
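
The lookup order that patch 1/3 touches can be sketched outside glibc as follows. This is an illustration under stated assumptions, not the glibc implementation: `name_is_present`, `choose_locale_name` and `locale_name_from_env` are hypothetical names, and the real `_nl_find_locale` does considerably more (path handling, alias expansion, loading the locale data). The built-in default is passed as a parameter so that both the old ("C") and the proposed ("C.UTF-8") behavior can be tried.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a value counts only if it is set and non-empty. */
static int name_is_present(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Pick a locale name in the order LC_ALL, per-category variable, LANG,
 * and finally the built-in default. */
static const char *choose_locale_name(const char *lc_all,
                                      const char *lc_category,
                                      const char *lang,
                                      const char *builtin_default)
{
    assert(builtin_default != NULL);
    if (name_is_present(lc_all))
        return lc_all;
    if (name_is_present(lc_category))
        return lc_category;
    if (name_is_present(lang))
        return lang;
    return builtin_default;
}

/* Same decision, reading the values from the process environment.
 * category_var is the variable name, for example "LC_CTYPE". */
static const char *locale_name_from_env(const char *category_var,
                                        const char *builtin_default)
{
    return choose_locale_name(getenv("LC_ALL"), getenv(category_var),
                              getenv("LANG"), builtin_default);
}
```

Calling `locale_name_from_env("LC_CTYPE", "C")` models the unpatched fallback, while passing "C.UTF-8" as the second argument models the patched one; the two differ only when all three variables are unset or empty.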


    From 612dc7f67f93882b7acb2f035b1cc200ceb2e153 Mon Sep 17 00:00:00 2001
    From: Adam Borowski <kilobyte@angband.pl>
    Date: Sun, 3 Sep 2017 03:43:10 +0200
    Subject: [PATCH 2/3] Adj
  • From Adam Borowski@21:1/5 to Aurelien Jarno on Mon Sep 4 01:20:02 2017
    XPost: linux.debian.bugs.dist

    On Sun, Sep 03, 2017 at 11:54:03PM +0200, Aurelien Jarno wrote:
    > On 2017-09-03 18:49, Adam Borowski wrote:
    > > Package: src:glibc
    > > Version: 2.24-17
    > > Severity: wishlist
    > > Tags: patch
    > >
    > > Hi!
    > > Here's a simple patch set to change the default of setlocale(…, "") to
    > > C.UTF-8. This is a drastically smaller change than altering the meaning of
    > > "C" to mean "C.UTF-8" that upstream is mulling over -- it affects only
    > > programs that already have locale support, when the user fails to set any.
    >
    > Even if the change is small, it might still change the behavior of
    > some programs, so I am not sure we want to diverge from upstream and
    > other distributions here.
    >
    > One example comes to mind: initializing a PostgreSQL database
    > cluster. When the --locale option is not used, this would cause the
    > database to use a non-C locale, which has a significant performance
    > impact.

    In this case, this will change anyway when (if?) upstream goes forward
    with their version -- which sounds as if postgresql wants an explicit
    LC_ALL=C. Doesn't pg_createcluster inherit locale settings from the user
    who's invoking it (thus usually en_US.UTF-8 or whatever)? Thus, in the vast
    majority of uses, there's no change; merely one particular way of forcing
    the C locale (unset LC_ALL LANG LC_CTYPE LC_...) won't work anymore.

    As for diverging from upstream, lemme ask the guy behind the upstream
    proposal wiki page what's the inclusion status. You probably have a wee bit
    better idea than me about upstream's workings, though.


    Meow!
    --
    ⢀⣴⠾⠻⢶⣦⠀
    ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
    ⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
    ⠈⠳⣄⠀⠀⠀⠀

  • From Aurelien Jarno@21:1/5 to Adam Borowski on Mon Sep 4 01:40:01 2017
    XPost: linux.debian.bugs.dist

    On 2017-09-04 00:13, Adam Borowski wrote:
    > On Sun, Sep 03, 2017 at 11:54:03PM +0200, Aurelien Jarno wrote:
    > > On 2017-09-03 18:49, Adam Borowski wrote:
    > > > Package: src:glibc
    > > > Version: 2.24-17
    > > > Severity: wishlist
    > > > Tags: patch
    > > >
    > > > Hi!
    > > > Here's a simple patch set to change the default of setlocale(…, "") to
    > > > C.UTF-8. This is a drastically smaller change than altering the meaning of
    > > > "C" to mean "C.UTF-8" that upstream is mulling over -- it affects only
    > > > programs that already have locale support, when the user fails to set any.
    > >
    > > Even if the change is small, it might still change the behavior of
    > > some programs, so I am not sure we want to diverge from upstream and
    > > other distributions here.
    > >
    > > One example comes to mind: initializing a PostgreSQL database
    > > cluster. When the --locale option is not used, this would cause the
    > > database to use a non-C locale, which has a significant performance
    > > impact.
    >
    > In this case, this will change anyway when (if?) upstream goes forward
    > with their version -- which sounds as if postgresql wants an explicit
    > LC_ALL=C. Doesn't pg_createcluster inherit locale settings from the user
    > who's invoking it (thus usually en_US.UTF-8 or whatever)? Thus, in the vast

    Or the root account, when installing postgresql using apt-get.

    > majority of uses, there's no change, merely a certain way to force the C
    > locale (unset LC_ALL LANG LC_CTYPE LC_...) won't work anymore.

    That's correct. My point there is that the usual unset command will work
    on all distributions except Debian.

    Also, that's just one example of a behavior change; I am sure there are
    many more.

    > As for diverging from upstream, lemme ask the guy behind the upstream
    > proposal wiki page what's the inclusion status. You probably have a wee bit
    > better idea than me about upstream's workings, though.

    As I said in my first message, it's about diverging from upstream *and*
    from other distributions. If a few other distributions use your patch as
    the default, I am fine with that.

    Aurelien

    --
    Aurelien Jarno GPG: 4096R/1DDD8C9B aurelien@aurel32.net http://www.aurel32.net
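
The behavior difference discussed in this thread can be observed from inside a program. The sketch below is only an illustration: `ctype_locale_with_empty_env` and `default_is_plain_c` are hypothetical names, and the claim about which name each C library reports is an assumption drawn from the thread rather than something the code guarantees. It clears the three variables named in the bug report and asks the C library which LC_CTYPE locale it then selects; an unpatched glibc is expected to report "C", while a library with the proposed default would report "C.UTF-8".

```c
#define _POSIX_C_SOURCE 200809L  /* for unsetenv() */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: clear LC_ALL, LC_CTYPE and LANG, then return the
 * LC_CTYPE locale name chosen by setlocale(LC_CTYPE, ""). The result is
 * NULL if the default locale is not installed, and is only valid until
 * the next setlocale() call. Note that this changes the environment and
 * the locale of the calling process. */
static const char *ctype_locale_with_empty_env(void)
{
    unsetenv("LC_ALL");
    unsetenv("LC_CTYPE");
    unsetenv("LANG");
    assert(getenv("LC_ALL") == NULL && getenv("LC_CTYPE") == NULL
           && getenv("LANG") == NULL);
    return setlocale(LC_CTYPE, "");
}

/* Hypothetical helper: 1 if clearing the variables still yields plain "C",
 * 0 otherwise (including when no default locale could be selected). */
static int default_is_plain_c(void)
{
    const char *name = ctype_locale_with_empty_env();
    return name != NULL && strcmp(name, "C") == 0;
}
```

A startup script or test suite that relies on unsetting the variables to get the C locale could call `default_is_plain_c()` and fall back to an explicit `setlocale(LC_ALL, "C")` when it returns 0.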
