• a UTF-8 complaint

    From Rainer Weikusat@21:1/5 to All on Fri Dec 17 19:16:07 2021
    perl absolutely urgently needs a "stop fucking with my binary data,
    unicode morons, I need it like it is" pragma.

    Background: UNIX filenames are arbitrary sequences of bytes whose values
    are neither 47 nor 0. And - coincidentally - UNIX filenames sometimes
    need to be handled in applications. Imagine that.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From E. Choroba@21:1/5 to Rainer Weikusat on Sun Dec 19 14:42:12 2021
    On Friday, December 17, 2021 at 8:16:13 PM UTC+1, Rainer Weikusat wrote:
    > perl absolutely urgently needs a "stop fucking with my binary data,
    > unicode morons, I need it like it is" pragma.
    >
    > Background: UNIX filenames are arbitrary sequences of bytes whose values
    > are neither 47 nor 0. And - coincidentally - UNIX filenames sometimes
    > need to be handled in applications. Imagine that.

    Can you provide an example? Perl doesn't change from bytes to unicode
    without being told to do so.

  • From Rainer Weikusat@21:1/5 to E. Choroba on Mon Dec 20 15:17:55 2021
    "E. Choroba" <choroba@matfyz.cz> writes:
    > On Friday, December 17, 2021 at 8:16:13 PM UTC+1, Rainer Weikusat wrote:
    >> perl absolutely urgently needs a "stop fucking with my binary data,
    >> unicode morons, I need it like it is" pragma.
    >>
    >> Background: UNIX filenames are arbitrary sequences of bytes whose values
    >> are neither 47 nor 0. And - coincidentally - UNIX filenames sometimes
    >> need to be handled in applications. Imagine that.
    >
    > Can you provide an example? Perl doesn't change from bytes to unicode
    > without being told to do so.

    Bug in my code in this case. That's why I cancelled the posting.
    Nevertheless, a way to mark something as "leave alone under all
    circumstances" would be helpful. I've had cases of silent conversion of
    binary data to UTF-8 because a string 'touched' another string with the
    utf8 flag set but without bytes outside of 0 - 127 in the past.

  • From Rainer Weikusat@21:1/5 to Rainer Weikusat on Mon Dec 20 15:33:22 2021
    Rainer Weikusat <rweikusat@talktalk.net> writes:
    > "E. Choroba" <choroba@matfyz.cz> writes:
    >> On Friday, December 17, 2021 at 8:16:13 PM UTC+1, Rainer Weikusat wrote:
    >>> perl absolutely urgently needs a "stop fucking with my binary data,
    >>> unicode morons, I need it like it is" pragma.
    >>>
    >>> Background: UNIX filenames are arbitrary sequences of bytes whose values
    >>> are neither 47 nor 0. And - coincidentally - UNIX filenames sometimes
    >>> need to be handled in applications. Imagine that.
    >>
    >> Can you provide an example? Perl doesn't change from bytes to unicode
    >> without being told to do so.
    >
    > Bug in my code in this case. That's why I cancelled the posting.
    > Nevertheless, a way to mark something as "leave alone under all
    > circumstances" would be helpful. I've had cases of silent conversion of
    > binary data to UTF-8 because a string 'touched' another string with the
    > utf8 flag set but without bytes outside of 0 - 127 in the past.

    Contrived example showing that:

    --------
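    my $bin = "\x80\x98";    # two bytes of binary data, utf8 flag off

    my $un = "\N{U+16a2}";   # character string, utf8 flag on
    $un =~ s/^./a/;          # now only "a", but the utf8 flag stays set

    use Devel::Peek;

    Dump($bin);              # before: plain byte string

    $bin .= $un; # should be possible to get a warning or error here
    Dump($bin);              # after: utf8 flag set, buffer re-encoded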
    --------

    In the actual code where this bit me, one of the strings had the utf8
    flag set because it came (via DBD::Pg) from a postgres table.
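
One way to get an error at the `.=` line is to route appends through a small wrapper. What follows is a minimal sketch, not something posted in the thread: `append_bytes` is a hypothetical helper name, and it assumes that dying on code points above 0xFF is the wanted behaviour. It relies only on the core `utf8::downgrade` and `utf8::is_utf8` functions.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): append $src to the byte
# string referenced by $dst_ref without letting perl upgrade it.
sub append_bytes {
    my ($dst_ref, $src) = @_;

    # Work on a copy so the caller's string is left untouched.
    my $copy = $src;

    # With a true second argument, utf8::downgrade returns false instead
    # of dying when the string holds code points above 0xFF.
    utf8::downgrade($copy, 1)
        or die "append_bytes: wide character in appended string\n";

    # Both operands now have the utf8 flag off, so the result does too.
    $$dst_ref .= $copy;
    return;
}

# Same strings as in the contrived example.
my $bin = "\x80\x98";

my $un = "\N{U+16a2}";
$un =~ s/^./a/;

append_bytes(\$bin, $un);
printf "flagged: %d, length: %d\n",
    utf8::is_utf8($bin) ? 1 : 0, length $bin;
```

With these strings the script prints `flagged: 0, length: 3`, so `$bin` stays a plain byte string. Passing a string that still contains U+16A2 makes `append_bytes` die instead of silently upgrading the target. This only covers appends that go through the wrapper, so it is a workaround for specific call sites and not the pragma asked for above.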
