• Invalid UTF-8 test

    From Rob Swindell@1:103/705 to All on Sun Jul 7 15:45:49 2019
    UTF-8 decoder capability and stress test
    ----------------------------------------

    Markus Kuhn <mkuhn@acm.org> - 1999-04-28

    This test text examines how UTF-8 decoders handle various types of
    corrupted or otherwise interesting UTF-8 sequences. According to ISO
    10646-1, sections R.7 and 2.3c, a device receiving UTF-8 shall
    interpret a "malformed sequence in the same way that it interprets a
    character that is outside the adopted subset".

    Test sequences (all enclosed in ""):

    Correct UTF-8 text (Greek word 'kosme'): "κόσμε"
    Correct 2-byte sequence (U+00000080): "€"
    Correct 3-byte sequence (U+00000800): "ࠀ"
    Correct 4-byte sequence (U+00010000): "𐀀"
    Correct 5-byte sequence (U+00200000): ""
    Correct 6-byte sequence (U+04000000): ""
    Correct 2-byte sequence (U+000007ff): "߿"
    Correct 3-byte sequence (U+0000ffff): "￿"
    Correct 4-byte sequence (U+001fffff): ""
    Correct 5-byte sequence (U+03ffffff): ""
    Correct 6-byte sequence (U+7fffffff): ""
    Overlong 2-byte sequence (U+0000): ""
    Overlong 3-byte sequence (U+0000): ""
    Overlong 4-byte sequence (U+0000): ""
    Overlong 5-byte sequence (U+0000): ""
    Overlong 6-byte sequence (U+0000): ""
    Unexpected continuation byte (10000000): ""
    Another lonely continuation byte (10111111): ""
    Sequence of 2 unexpected continuation bytes: ""
    Sequence of 3 unexpected continuation bytes: ""
    Sequence of 4 unexpected continuation bytes: ""
    Sequence of 5 unexpected continuation bytes: ""
    Sequence of 6 unexpected continuation bytes: ""
    Sequence of 7 unexpected continuation bytes: ""
    Sequence of all 64 possible continuation bytes (10000000-10111111): " "
    Sequence of all 32 first bytes of 2-byte sequences (11000000-11011111),
    each followed by a space character: " "
    Sequence of all 16 first bytes of 3-byte sequences (11100000-11101111),
    each followed by a space character: " "
    Sequence of all 8 first bytes of 4-byte sequences (11110000-11110111),
    each followed by a space character: " "
    Sequence of all 4 first bytes of 5-byte sequences (11111000-11111011),
    each followed by a space character: " "
    Sequence of all 2 first bytes of 6-byte sequences (11111100-11111101),
    each followed by a space character: " "
    Impossible byte (11111110): ""
    Impossible byte (11111111): ""
    2-byte sequence with last byte missing: ""
    3-byte sequence with last byte missing: ""
    4-byte sequence with last byte missing: ""
    5-byte sequence with last byte missing: ""
    6-byte sequence with last byte missing: ""
    All these 5 sequences with last byte missing concatenated: ""
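    The "Correct N-byte sequence" lines sit at the lower and upper bounds of each sequence length. As a minimal sketch, assuming the original 31-bit UTF-8 of ISO 10646-1 / RFC 2279 (which still allowed 5- and 6-byte forms, unlike RFC 3629), the following Python regenerates those boundary bytes. `encode_legacy_utf8` is a hypothetical helper, not a library function.

```python
def encode_legacy_utf8(cp: int) -> bytes:
    """Encode cp (0..0x7FFFFFFF) in the original 1- to 6-byte UTF-8 form.

    Hypothetical helper: follows the 31-bit scheme of ISO 10646-1 /
    RFC 2279, not the 4-byte, U+10FFFF limit of RFC 3629.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit UCS range")
    if cp < 0x80:
        return bytes([cp])  # 1 byte: 0xxxxxxx
    # (exclusive upper bound, sequence length, lead-byte prefix)
    for limit, length, prefix in (
        (0x800, 2, 0xC0),        # 110xxxxx
        (0x10000, 3, 0xE0),      # 1110xxxx
        (0x200000, 4, 0xF0),     # 11110xxx
        (0x4000000, 5, 0xF8),    # 111110xx
        (0x80000000, 6, 0xFC),   # 1111110x
    ):
        if cp < limit:
            break
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    # The bits left in cp now fit in the lead byte's payload.
    return bytes([prefix | cp] + tail[::-1])


# Boundary code points named in the test list above.
for cp in (0x80, 0x800, 0x10000, 0x200000, 0x4000000,
           0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF):
    print(f"U+{cp:08X}: {encode_legacy_utf8(cp).hex(' ')}")
```

    Run as-is, it prints `U+00000080: c2 80` on the first line and `U+7FFFFFFF: fd bf bf bf bf bf` on the last. Packing a value into more bytes than this function picks (for example `c0 80` for U+0000) produces the overlong forms that the U+0000 test lines exercise, which a conforming decoder must reject.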
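    The test text asks that a malformed sequence be treated like a character outside the adopted subset. One way to apply that rule is to reject the input, or to substitute U+FFFD REPLACEMENT CHARACTER for each bad sequence. A minimal Python sketch of that behavior using the standard library's built-in `utf-8` codec; the `CASES` table and `check` helper are hypothetical names, and the byte values are chosen to match the bit patterns in the test labels.

```python
from __future__ import annotations

# Hypothetical test table. Byte values follow the bit patterns in the labels
# above; where a label does not pin down exact bytes (overlong NUL forms,
# truncated sequences) the choice here is illustrative.
CASES = {
    "Correct UTF-8 text (Greek word 'kosme')": "κόσμε".encode("utf-8"),
    "2-byte U+0000 (overlong)": b"\xc0\x80",
    "3-byte U+0000 (overlong)": b"\xe0\x80\x80",
    "4-byte U+0000 (overlong)": b"\xf0\x80\x80\x80",
    "Unexpected continuation byte (10000000)": b"\x80",
    "Another lonely continuation byte (10111111)": b"\xbf",
    "Impossible byte (11111110)": b"\xfe",
    "Impossible byte (11111111)": b"\xff",
    "2-byte sequence with last byte missing": b"\xdf",
    "3-byte sequence with last byte missing": b"\xef\xbf",
    "5-byte sequence (U+00200000), 1999 rules only": b"\xf8\x88\x80\x80\x80",
}


def check(data: bytes) -> tuple[bool, str]:
    """Classify data with Python's strict UTF-8 codec (RFC 3629 rules).

    Valid input is returned decoded; malformed input is re-decoded with
    errors="replace", so each bad sequence shows up as U+FFFD.
    """
    try:
        return True, data.decode("utf-8")
    except UnicodeDecodeError:
        return False, data.decode("utf-8", errors="replace")


for label, data in CASES.items():
    ok, text = check(data)
    status = "valid  " if ok else "INVALID"
    # !a keeps the output ASCII-only, so it prints on any terminal.
    print(f"{status} {data.hex(' '):<15} {text!a}  {label}")
```

    Python's codec follows RFC 3629, which limits UTF-8 to U+10FFFF and at most 4 bytes, so it also rejects the 5- and 6-byte forms that the 1999 ISO 10646-1 text still listed as correct. A decoder written against the older rules would accept them; which behavior a program wants depends on the standard it targets.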
    digital man

    This Is Spinal Tap quote #17:
    David St. Hubbins: It's such a fine line between stupid, and uh... and clever.
    Norco, CA WX: 79.0F, 54.0% humidity, 14 mph ESE wind, 0.00 inches rain/24hrs
    --- SBBSecho 3.07-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)