• utf-8 string literal

    From Sascha Schwarz@21:1/5 to All on Mon Mar 14 06:49:01 2016
    { edited to shorten lines to ~70 characters. -mod }

    Hello all.

    Recently we were discussing whether the following snippet is guaranteed
    to compile on all conforming platforms.

    int main() {
        // wikipedia's example from https://en.wikipedia.org/wiki/UTF-8
        constexpr const char euro[] = u8"\u20ac";
        static_assert(
            sizeof euro == 4
            && euro[0] == static_cast<const char>(0b11100010)
            && euro[1] == static_cast<const char>(0b10000010)
            && euro[2] == static_cast<const char>(0b10101100),
            "Not utf-8.");
    }

    Looking at 2.3 (Basic charset) and 2.14.5 (String literals) we _think_
    so, but are not sure.
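
    For reference, the three byte values in the static_assert follow from
    the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx applied to
    U+20AC. A minimal sketch of that arithmetic, assuming C++11 or later;
    the helper names utf8_3_lead, utf8_3_mid and utf8_3_last are
    hypothetical and only cover code points in U+0800..U+FFFF:

```cpp
#include <cstdint>

// Three-byte UTF-8 form: 1110xxxx 10xxxxxx 10xxxxxx.
// Hypothetical helpers; only meaningful for U+0800..U+FFFF.
constexpr unsigned char utf8_3_lead(std::uint32_t cp) {
    return static_cast<unsigned char>(0xE0u | (cp >> 12));          // top 4 bits
}
constexpr unsigned char utf8_3_mid(std::uint32_t cp) {
    return static_cast<unsigned char>(0x80u | ((cp >> 6) & 0x3Fu)); // middle 6 bits
}
constexpr unsigned char utf8_3_last(std::uint32_t cp) {
    return static_cast<unsigned char>(0x80u | (cp & 0x3Fu));        // low 6 bits
}

// U+20AC (EURO SIGN) encodes as E2 82 AC, i.e. the binary
// values 11100010, 10000010, 10101100 used in the snippet.
static_assert(utf8_3_lead(0x20AC) == 0xE2, "lead byte");
static_assert(utf8_3_mid(0x20AC)  == 0x82, "middle byte");
static_assert(utf8_3_last(0x20AC) == 0xAC, "last byte");
```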

    This came up whilst implementing Adobe's glyphlist in C++.
    See https://github.com/adobe-type-tools/agl-aglfn


    --
    [ See http://www.gotw.ca/resources/clcm.htm for info about ]
    [ comp.lang.c++.moderated. First time posters: Do this! ]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Öö Tiib@21:1/5 to Sascha Schwarz on Mon Mar 14 17:10:29 2016
    On Monday, 14 March 2016 13:50:14 UTC+2, Sascha Schwarz wrote:
    Recently we were discussing whether the following snippet is guaranteed
    to compile on all conforming platforms.

    int main() {
        // wikipedia's example from https://en.wikipedia.org/wiki/UTF-8
        constexpr const char euro[] = u8"\u20ac";
        static_assert(
            sizeof euro == 4
            && euro[0] == static_cast<const char>(0b11100010)
            && euro[1] == static_cast<const char>(0b10000010)
            && euro[2] == static_cast<const char>(0b10101100),
            "Not utf-8.");
    }

    Looking at 2.3 (Basic charset) and 2.14.5 (String literals) we _think_
    so, but are not sure.

    Can you elaborate what makes you unsure?


  • From Sascha Schwarz@21:1/5 to All on Tue Mar 15 08:45:35 2016
    On Monday, 14 March 2016 23:20:14 UTC+1, Öö Tiib wrote:

    Can you elaborate what makes you unsure?

    It comes down to the difference between "\u20ac" and u8"\u20ac".

    My understanding is that, whilst there is no guarantee about the
    encoding of the former, the latter is encoded as UTF-8, so the
    static_assert() holds.
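
    To make the distinction concrete, here is a minimal sketch, assuming
    C++11 or later. The helper name is_euro_utf8 is hypothetical and not
    taken from the standard or from the snippet above. It compares through
    unsigned char so the check does not depend on whether plain char is
    signed, and it is templated on the element type because from C++20 a
    u8 literal has element type char8_t rather than char (so the original
    initialisation of a const char array from u8"..." would no longer
    compile there).

```cpp
#include <cstddef>

// Hypothetical helper: true if the array holds the UTF-8 bytes of
// U+20AC followed by the terminating null. Works for char and,
// from C++20, char8_t element types.
template <typename CharT, std::size_t N>
constexpr bool is_euro_utf8(const CharT (&s)[N]) {
    return N == 4
        && static_cast<unsigned char>(s[0]) == 0xE2
        && static_cast<unsigned char>(s[1]) == 0x82
        && static_cast<unsigned char>(s[2]) == 0xAC
        && static_cast<unsigned char>(s[3]) == 0x00;
}

// The u8 prefix fixes the encoding to UTF-8, so this is expected to
// hold on every conforming implementation.
static_assert(is_euro_utf8(u8"\u20ac"), "u8 literal is not UTF-8");

// The ordinary literal uses the execution character set, so this value
// may be true or false depending on the implementation and its settings.
constexpr bool plain_literal_is_utf8 = is_euro_utf8("\u20ac");
```

    Only the u8 case is asserted; the plain-literal result is recorded
    rather than asserted because nothing guarantees its value.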

