• locale/LC_CTYPE vs strcasecmp?

    From Winston@21:1/5 to All on Tue Mar 26 06:24:31 2024
    In FreeBSD 14.0-RELEASE:

    The man page says strcasecmp_l() takes an explicit locale.
    The implication is that strcasecmp() uses the current locale
    (presumably as set by setlocale()).

    After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
    strcasecmp() is not, in fact, case-independently matching non-ASCII
    UTF-8 strings: it's case sensitive (the ASCII equivalent in this
    case being that "Abc" isn't matching "abc").

    Is that a bug, does strcasecmp not, in fact, use the current
    locale, or am I missing something?

    TIA,
    -WBE

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christian Weisgerber@21:1/5 to Winston on Tue Mar 26 19:47:03 2024
    On 2024-03-26, Winston <wbe@UBEBLOCK.psr.com.invalid> wrote:

    > The man page says strcasecmp_l() takes an explicit locale.
    > The implication is that strcasecmp() uses the current locale
    > (presumably as set by setlocale()).

    Yes.
    src/lib/libc/string/strcasecmp.c:

    int
    strcasecmp(const char *s1, const char *s2)
    {
            return strcasecmp_l(s1, s2, __get_locale());
    }
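
    For comparison, the explicit-locale variant can be called directly.
    This is only a minimal sketch: casecmp_in_locale() is a hypothetical
    helper, not part of libc, and it assumes a POSIX.1-2008 system where
    newlocale() is declared in <locale.h> and strcasecmp_l() in
    <strings.h> (some systems want <xlocale.h> as well).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <strings.h>

/*
 * Hypothetical helper: compare s1 and s2 case-insensitively using the
 * LC_CTYPE rules of the named locale, without touching the global
 * locale. Returns 0 and stores the comparison in *result, or returns
 * -1 if the locale cannot be created.
 */
int
casecmp_in_locale(const char *name, const char *s1, const char *s2,
    int *result)
{
	locale_t loc;

	assert(name != NULL && s1 != NULL && s2 != NULL && result != NULL);

	loc = newlocale(LC_CTYPE_MASK, name, (locale_t)0);
	if (loc == (locale_t)0)
		return -1;

	/* Still a byte-by-byte comparison; only the case table changes. */
	*result = strcasecmp_l(s1, s2, loc);
	freelocale(loc);
	return 0;
}
```

    Passing "C" as the locale name should always succeed; whether a name
    like "uk_UA.UTF-8" works depends on which locales are installed.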

    > After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
    > strcasecmp() is not, in fact, case-independently matching non-ASCII
    > UTF-8 strings: it's case sensitive (the ASCII equivalent in this
    > case being that "Abc" isn't matching "abc").

    UTF-8 characters are multibyte. You need to convert the strings
    to wide characters and use wcscasecmp().
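
    A minimal sketch of that conversion, assuming the program has already
    called setlocale() with a UTF-8 locale so that mbstowcs() decodes
    UTF-8. mb_casecmp() is a hypothetical helper name, not a libc
    function; it relies on the POSIX behaviour that mbstowcs() with a
    NULL destination returns the required length.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert two multibyte strings to wide characters
 * under the current locale and compare them with wcscasecmp().
 * Returns 0 and stores the comparison in *result, or returns -1 if a
 * string is not valid in the current locale or memory runs out.
 */
int
mb_casecmp(const char *s1, const char *s2, int *result)
{
	size_t n1, n2;
	wchar_t *w1, *w2;
	int rc = -1;

	assert(s1 != NULL && s2 != NULL && result != NULL);

	/* With a NULL destination, mbstowcs() only counts characters. */
	n1 = mbstowcs(NULL, s1, 0);
	n2 = mbstowcs(NULL, s2, 0);
	if (n1 == (size_t)-1 || n2 == (size_t)-1)
		return -1;

	w1 = malloc((n1 + 1) * sizeof(*w1));
	w2 = malloc((n2 + 1) * sizeof(*w2));
	if (w1 != NULL && w2 != NULL) {
		mbstowcs(w1, s1, n1 + 1);
		mbstowcs(w2, s2, n2 + 1);
		*result = wcscasecmp(w1, w2);
		rc = 0;
	}
	free(w1);
	free(w2);
	return rc;
}
```

    In the default "C" locale this still handles plain ASCII; non-ASCII
    input needs the setlocale() call first, otherwise the conversion may
    fail and the helper returns -1.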

    --
    Christian "naddy" Weisgerber naddy@mips.inka.de

  • From Winston@21:1/5 to Christian Weisgerber on Wed Mar 27 11:16:40 2024
    I originally posted:
    > The man page says strcasecmp_l() takes an explicit locale.
    > The implication is that strcasecmp() uses the current locale
    > (presumably as set by setlocale()).

    to which Christian Weisgerber <naddy@mips.inka.de> kindly replied:
    > Yes.
    > src/lib/libc/string/strcasecmp.c:
    >
    > int
    > strcasecmp(const char *s1, const char *s2)
    > {
    >         return strcasecmp_l(s1, s2, __get_locale());
    > }

    :-)

    > After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
    > strcasecmp() is not, in fact, case-independently matching non-ASCII
    > UTF-8 strings: it's case sensitive (the ASCII equivalent in this
    > case being that "Abc" isn't matching "abc").
    >
    > UTF-8 characters are multibyte. You need to convert the strings
    > to wide characters and use wcscasecmp().

    As one would expect, and perfectly reasonable. Something (I forget
    what now) led me to think that if strcasecmp() accepted UTF-8
    locales, maybe it *would* be willing to fold case for multibyte
    characters too, just working from the byte encoding instead of from
    wide characters.
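
    A small illustration of why the byte-at-a-time comparison misses
    this, as a sketch only: the Cyrillic letter pair and the function
    name bytewise_casecmp_be() are illustrative choices. The two letters
    differ only in their second UTF-8 byte, and neither byte value is an
    upper/lower pair on its own.

```c
#include <assert.h>
#include <strings.h>

/* U+0411 CYRILLIC CAPITAL LETTER BE, encoded in UTF-8 as D0 91 */
static const char upper_be[] = "\xD0\x91";
/* U+0431 CYRILLIC SMALL LETTER BE, encoded in UTF-8 as D0 B1 */
static const char lower_be[] = "\xD0\xB1";

/*
 * Illustrative only: strcasecmp() lowercases each byte separately, so
 * it compares 0x91 against 0xB1 and never sees the pair as one letter.
 */
int
bytewise_casecmp_be(void)
{
	/* Each array holds two UTF-8 bytes plus the terminating NUL. */
	assert(sizeof(upper_be) == 3 && sizeof(lower_be) == 3);
	return strcasecmp(upper_be, lower_be);
}
```

    In the default "C" locale this returns nonzero, and per the behaviour
    reported in this thread the same holds under a UTF-8 locale.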

    Thanks for confirming that, Christian. Onward to upgrading this
    code that should have been doing that already ...
    -WBE
