Forum: >>> Magnum BBS <<<

locale/LC_CTYPE vs strcasecmp?

From Winston@21:1/5 to All on Tue Mar 26 06:24:31 2024

In FreeBSD 14.0-RELEASE:

The man page says strcasecmp_l() takes an explicit locale.
The implication is that strcasecmp() uses the current locale
(presumably as set by setlocale()).

After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
strcasecmp() is not, in fact, case-independently matching non-ASCII
UTF-8 strings: it's case sensitive (the ASCII equivalent in this
case being that "Abc" isn't matching "abc").

Is that a bug, does strcasecmp not, in fact, use the current
locale, or am I missing something?

TIA,
-WBE

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christian Weisgerber@21:1/5 to Winston on Tue Mar 26 19:47:03 2024

On 2024-03-26, Winston <wbe@UBEBLOCK.psr.com.invalid> wrote:

> The man page says strcasecmp_l() takes an explicit locale.
> The implication is that strcasecmp() uses the current locale
> (presumably as set by setlocale()).

Yes.
src/lib/libc/string/strcasecmp.c:

    int
    strcasecmp(const char *s1, const char *s2)
    {
            return strcasecmp_l(s1, s2, __get_locale());
    }

> After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
> strcasecmp() is not, in fact, case-independently matching non-ASCII
> UTF-8 strings: it's case sensitive (the ASCII equivalent in this
> case being that "Abc" isn't matching "abc").

UTF-8 characters are multibyte. You need to convert the strings
to wide characters and use wcscasecmp().
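
A minimal sketch of that conversion, assuming the strings are in the
encoding of the locale already selected with setlocale(). The names
mb_to_wide() and mb_casecmp() are made up for this example and are not
part of libc; only mbstowcs() and wcscasecmp() are real library calls.

```c
#define _POSIX_C_SOURCE 200809L   /* for wcscasecmp() on strict compilers */
#include <locale.h>               /* callers need setlocale() */
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: decode a multibyte string in the current locale
 * into a freshly allocated wide string. Returns NULL on an invalid
 * multibyte sequence or on allocation failure. Caller frees. */
static wchar_t *mb_to_wide(const char *s)
{
    size_t n = mbstowcs(NULL, s, 0);        /* count wide chars only */
    if (n == (size_t)-1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    if (mbstowcs(w, s, n + 1) == (size_t)-1) {
        free(w);
        return NULL;
    }
    return w;
}

/* Hypothetical helper: case-insensitive comparison of two multibyte
 * strings. On success returns 0 and stores the wcscasecmp() result
 * (<0, 0, >0) in *result; returns -1 if either string cannot be
 * converted, leaving *result untouched. */
int mb_casecmp(const char *s1, const char *s2, int *result)
{
    wchar_t *w1 = mb_to_wide(s1);
    wchar_t *w2 = mb_to_wide(s2);
    int rc = -1;

    if (w1 != NULL && w2 != NULL) {
        *result = wcscasecmp(w1, w2);
        rc = 0;
    }
    free(w1);                               /* free(NULL) is a no-op */
    free(w2);
    return rc;
}
```

After setlocale(LC_ALL, "uk_UA.UTF-8"), a call such as
mb_casecmp("Їжак", "їжак", &r) should leave r equal to 0, provided the
locale's case tables cover those letters. The separate status return is
one possible design; it keeps conversion errors distinct from a
legitimate nonzero comparison result.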

--
Christian "naddy" Weisgerber naddy@mips.inka.de


From Winston@21:1/5 to kindly on Wed Mar 27 11:16:40 2024

I originally posted:

>> The man page says strcasecmp_l() takes an explicit locale.
>> The implication is that strcasecmp() uses the current locale
>> (presumably as set by setlocale()).

to which Christian Weisgerber <naddy@mips.inka.de> kindly replied:

> Yes.
> src/lib/libc/string/strcasecmp.c:
>
>     int
>     strcasecmp(const char *s1, const char *s2)
>     {
>             return strcasecmp_l(s1, s2, __get_locale());
>     }

:-)

>> After calling setlocale(LC_ALL, "uk_UA.UTF-8"), I'm seeing that
>> strcasecmp() is not, in fact, case-independently matching non-ASCII
>> UTF-8 strings: it's case sensitive (the ASCII equivalent in this
>> case being that "Abc" isn't matching "abc").

> UTF-8 characters are multibyte. You need to convert the strings
> to wide characters and use wcscasecmp().

That is as one would expect and perfectly reasonable, but something (I
forget what now) led me to think that if strcasecmp() accepted UTF-8
locales, maybe it *would* be willing to fold case in them, just
operating one byte at a time instead of two.
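
For illustration, here is a small sketch of why a byte-at-a-time pass has
nothing to fold: the upper and lower case forms of a non-ASCII letter are
different multibyte sequences, and no single byte in them is a letter on
its own. The helper name hex_bytes() is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of s into out as space-separated
 * uppercase hex, truncating if out is too small. */
static void hex_bytes(const char *s, char *out, size_t outsz)
{
    size_t len = strlen(s);
    size_t used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';

    /* Each byte needs at most 3 characters plus the terminating NUL. */
    for (size_t i = 0; i < len && used + 4 <= outsz; i++) {
        used += (size_t)snprintf(out + used, outsz - used, "%s%02X",
                                 used ? " " : "", (unsigned char)s[i]);
    }
}
```

Given the UTF-8 encodings of "Ї" (U+0407) and "ї" (U+0457), this yields
"D0 87" and "D1 97": both bytes differ, so matching them means decoding
the whole character first.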

Thanks for confirming that, Christian. Onward to upgrading this
code that should have been doing that already ...
-WBE

