• Bug#868654: Combining Unicode Mark-Nonspacing are classified as [:punct

    From Santiago R.R.@21:1/5 to All on Mon Jul 17 12:30:02 2017
    XPost: linux.debian.bugs.dist

    Source: glibc
    Version: 2.24-12
    Severity: minor
    Control: block 662629 by -1

    Hi,

    glibc appears to misclassify characters in the Unicode Mark, Nonspacing
    (Mn) category: they are treated as [[:punct:]], whereas [[:alpha:]] would
    arguably be more appropriate. The problem was identified through this bug
    reported against grep:
    https://bugs.debian.org/662629

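    You can test it with U+0301 (COMBINING ACUTE ACCENT), writing the first
    letter as "a" followed by the combining character rather than as the
    precomposed U+00E1: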

    $ echo árbol | grep -o '[[:alpha:]]*'
    a
    rbol

    grep's upstream shares this view:

    "Surely this is a glibc bug, not a grep bug. Grep is just following the
    character classification of glibc. I can reproduce the problem by
    compiling and running the attached program, which uses only glibc (not
    grep). This program exits with status 1, whereas you want it to exit
    with status 0. So I suggest filing a glibc bug report."

    combining.c is attached to this mail.
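
    For readers without the attachment, here is a minimal sketch of how the
    classification can be checked with glibc alone. It is not the attached
    combining.c: the function names classify_in_locale and
    report_combining_acute are hypothetical, and the code assumes a platform
    such as glibc where wchar_t values are Unicode code points.

```c
#include <locale.h>
#include <stdio.h>
#include <wctype.h>

/* Hypothetical helper, not taken from combining.c.
   Switches the process locale to locale_name and classifies wc.
   Returns -1 if the locale cannot be set; otherwise a bitmask where
   bit 0 means iswalpha(wc) is true and bit 1 means iswpunct(wc) is true.
   Assumes wide characters hold Unicode code points, as on glibc. */
int classify_in_locale(const char *locale_name, wint_t wc)
{
    int mask = 0;

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    if (iswalpha(wc))
        mask |= 1;
    if (iswpunct(wc))
        mask |= 2;
    return mask;
}

/* Prints how U+0301 (COMBINING ACUTE ACCENT) is classified. */
void report_combining_acute(const char *locale_name)
{
    int mask = classify_in_locale(locale_name, 0x0301);

    if (mask < 0) {
        printf("%s: locale not available\n", locale_name);
        return;
    }
    printf("%s: U+0301 alpha=%d punct=%d\n",
           locale_name, (mask & 1) != 0, (mask & 2) != 0);
}
```

    Calling report_combining_acute("en_US.UTF-8") from main prints one line,
    provided that locale is installed (the locale name is an assumption). The
    result depends on the glibc version and its locale data, so no expected
    output is shown here.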

    Cheers,

    -- Santiago
