• How to convert a raw string r'\xdd' to '\xdd' more gracefully?

    From Jach Feng@21:1/5 to All on Tue Dec 6 18:23:07 2022
    s0 = r'\x0a'
    At the moment I do it like this:

    import re

    def to1byte(matchobj):
        return chr(int('0x' + matchobj.group(1), 16))

    s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

    But is it really this difficult to do such a simple thing?

    --Jach

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MRAB@21:1/5 to Jach Feng on Wed Dec 7 03:01:12 2022
    You could try this:

    >>> import ast
    >>> s0 = r'\x0a'
    >>> ast.literal_eval('"%s"' % s0)
    '\n'
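
A minimal sketch of the ast.literal_eval() idea wrapped in a function. The name unescape_via_literal is hypothetical, and the sketch assumes the input contains no unescaped double quote or line break, since either would stop the wrapped text from being a valid string literal.

```python
import ast

def unescape_via_literal(text):
    # Wrap the text in double quotes so that it parses as a string literal.
    # Assumption: text contains no unescaped double quote or line break.
    return ast.literal_eval('"%s"' % text)

print(repr(unescape_via_literal(r'\x0a')))    # '\n'
print(repr(unescape_via_literal(r'a\x09b')))  # 'a\tb'
```

Because the whole text is parsed as a literal, every escape that Python understands is converted, not only \xdd.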

  • From Jach Feng@21:1/5 to All on Tue Dec 6 19:37:54 2022
    MRAB's ast.literal_eval() suggestion does not work on my system :-(

    Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:08:11) [MSC v.1928 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> s0 = r'\x0a'
    >>> import ast
    >>> ast.literal_eval("%s" % s0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 59, in literal_eval
        node_or_string = parse(node_or_string, mode='eval')
      File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 47, in parse
        return compile(source, filename, mode, flags,
      File "<unknown>", line 1
        \x0a
        ^
    SyntaxError: unexpected character after line continuation character

  • From Thomas Passin@21:1/5 to Jach Feng on Tue Dec 6 22:55:52 2022
    I'm not totally clear on what you are trying to do here. But:

    s1 = r'\xdd' # s1[2:] = 'dd'
    n1 = int(s1[2:], 16) # = 221 decimal or 0xdd in hex
    # So
    chr(n1) == 'Ý' # True
    # and
    '\xdd' == 'Ý' # True

    So the conversion you want seems to be chr(int(s1[2:], 16)).

    Of course, this will only work if the input string is exactly four
    characters long, and the first two characters are r'\x', and the
    remaining two characters are going to be a hex string representation of
    a number small enough to fit into a byte.

    If you know for sure that will be the case, then the conversion above
    seems to be about as simple as it could be. If those conditions may not
    always be met, then you need to work out exactly what strings you may
    need to convert, and what they should be converted to.
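
The conditions listed above can be checked explicitly before converting. This is only a sketch; the name convert_single and the choice to raise ValueError on a mismatch are assumptions, not something proposed in the thread.

```python
import re

# Exactly: backslash, 'x', two hex digits, and nothing else.
_SINGLE_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def convert_single(text):
    m = _SINGLE_ESCAPE.fullmatch(text)
    if m is None:
        raise ValueError('not a single \\xdd escape: %r' % (text,))
    # Two hex digits always fit into one byte (0..255).
    return chr(int(m.group(1), 16))

print(repr(convert_single(r'\x0a')))  # '\n'
```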

  • From Jach Feng@21:1/5 to All on Tue Dec 6 23:40:12 2022
    Thank you for reminding me that the '0x' + in the to1byte() definition is redundant :-)

    I am just not sure whether there is a better way to do it than chr(int(...)).
    And yes, for this specific case, slicing is much simpler than re.sub().

  • From Roel Schroeven@21:1/5 to All on Wed Dec 7 09:42:18 2022
    You missed a pair of quotes. They are easily overlooked but very
    important. The point is to wrap your string in another pair of quotes so
    it becomes a valid Python string literal in a Python string which can
    then be passed to ast.literal_eval(). Works for me:

    In [7]: s0 = r'\x0a'

    In [8]: import ast

    In [9]: ast.literal_eval('"%s"' % s0)
    Out[9]: '\n'
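
The difference between the two format strings can be made visible by building both source strings and looking at what literal_eval() receives. This is only an illustrative sketch; the variable names are arbitrary.

```python
import ast

s0 = r'\x0a'

unquoted = "%s" % s0    # the 4 characters \x0a : not a valid expression
quoted = '"%s"' % s0    # the 6 characters "\x0a" : a string literal

try:
    ast.literal_eval(unquoted)
    unquoted_ok = True
except SyntaxError:
    unquoted_ok = False

result = ast.literal_eval(quoted)
print(len(unquoted), len(quoted), unquoted_ok, repr(result))
```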

    --
    "Experience is that marvelous thing that enables you to recognize a
    mistake when you make it again."
    -- Franklin P. Jones

  • From Stefan Ram@21:1/5 to Jach Feng on Wed Dec 7 10:49:04 2022
    bytes.fromhex( s0[ 2: ])

    This does not have the same type as the result of "chr",
    but maybe "bytes" is even more appropriate for a byte.
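
A short sketch of how the bytes result relates to the chr() result. Decoding with Latin-1 is an addition here, not part of the post; it is used because Latin-1 maps byte value n to code point n, which is what chr(n) returns.

```python
s0 = r'\x0a'

as_bytes = bytes.fromhex(s0[2:])      # a bytes object of length 1
as_str = as_bytes.decode('latin-1')   # the same value as a one-character str

print(repr(as_bytes), repr(as_str))   # b'\n' '\n'
```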

  • From Peter Otten@21:1/5 to Jach Feng on Wed Dec 7 22:12:27 2022
    >>> import codecs
    >>> codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")
    'hello\n'

  • From Jach Feng@21:1/5 to All on Wed Dec 7 17:16:09 2022
    Thank you for pointing that out. I did notice those ''' in MRAB's post, but I did not figure out what they were at the time :-(

  • From Jach Feng@21:1/5 to All on Wed Dec 7 17:17:22 2022
    Thank you. What I really want is to handle any r'\xdd'; r'\x0a' was just an example. Sorry, I didn't describe it clearly :-)

  • From Jach Feng@21:1/5 to All on Thu Dec 8 00:56:42 2022
    I found another answer on the web:

    >>> s0 = r'\x0a'
    >>> s0.encode('Latin-1').decode('unicode-escape')
    '\n'
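
A sketch of where the Latin-1 round trip works and where it stops. The function names are hypothetical. The second variant uses the standard 'backslashreplace' error handler, which is an addition here rather than something from the thread: it turns characters outside Latin-1 into \uXXXX escapes that the unicode-escape decoder then restores.

```python
def unescape_latin1(text):
    # Works for any text whose characters all fit into Latin-1 (U+0000..U+00FF).
    return text.encode('latin-1').decode('unicode-escape')

def unescape_any(text):
    # Characters outside Latin-1 become \uXXXX / \UXXXXXXXX escapes first,
    # and the decoder turns them back into the original characters.
    return text.encode('latin-1', 'backslashreplace').decode('unicode-escape')

print(repr(unescape_latin1(r'\x0a')))

try:
    unescape_latin1('\u20ac' + r'\x0a')   # the euro sign is outside Latin-1
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True

print(latin1_failed, repr(unescape_any('\u20ac' + r'\x0a')))
```

Both variants interpret every backslash escape in the text, not only \xdd.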

  • From Weatherby,Gerard@21:1/5 to All on Thu Dec 8 13:23:42 2022
    I'm not understanding the task. The sample code given is converting the input r'\x0a' to a newline, it appears.


    import re


    def exam(z):
        print(f"examine {type(z)} {z}")
        for c in z:
            print(f"{ord(c)} {c}")

    s0 = r'\x0a'

    def to1byte(matchobj):
        return chr(int('0x' + matchobj.group(1), 16))

    s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)
    exam(s0)
    exam(s1)

    ---
    examine <class 'str'> \x0a
    92 \
    120 x
    48 0
    97 a
    examine <class 'str'>

    10

  • From Thomas Passin@21:1/5 to Gerard on Thu Dec 8 09:12:19 2022
    The original post started out with r'\x0a' but then talked about '\xdd'.
    I assumed that there was a pattern here, a raw string containing "\x"
    and two more characters, and made a suggestion for converting any string
    with that pattern. But the OP was very unclear what the task really
    was, so here we all are, making a variety of guesses.

  • From Peter Otten@21:1/5 to Jach Feng on Thu Dec 8 17:20:27 2022
    Hm, codecs.decode() does work for arbitrary escapes. It will produce the
    same result for r"\xdd"-type raw strings where d is in the range 0...F.
    It will also convert other escapes like

    >>> codecs.decode(r"\t", "unicode-escape")
    '\t'
    >>> codecs.decode(r"\u5728", "unicode-escape")
    '在'

  • From moi@21:1/5 to All on Thu Dec 8 11:12:30 2022
    PS C:\humour> py38 sysargwithliteral.py abc\x80œ cp1252
    abc€œ
    PS C:\humour> py38 sysargwithliteral.py abc\xe1œ€ cp1253
    abcαœ€
    PS C:\humour> py38 sysargwithliteral.py abc\xe1\xe2\xe3z cp1253
    abcαβγz
    PS C:\humour> py38 sysargwithliteral.py abc\xe1\xe2\xe3z cp437
    abcßΓπz
    PS C:\humour> py38 sysargwithliteral.py abc\xe1\xe2\xe3z cp850
    abcßÔÒz
    PS C:\humour> py38 sysargwithliteral.py abc\u03b1\u03b2\u03b3z unicode
    abcαβγz
    PS C:\humour> py38 sysargwithliteral.py abc\u03b1\u03b2\u03b3z unicode
    abcαβγz
    PS C:\humour> py38 sysargwithliteral.py abc\\ cp1252
    abc\

    Anyway, interpreting a command line like this may lead to nonsense.
    Ditto for piping.

    PS C:\humour> py38 sysargwithliteral.py x:\xffb.html cp1252
    x:ÿb.html
    PS C:\humour> py38 sysargwithliteral.py x:\\xffb.html cp1252
    x:\xffb.html
    PS C:\humour>
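
The script sysargwithliteral.py was never posted, so its behaviour can only be guessed from the transcripts above. One reading, offered purely as an assumption, is that each run of \xdd escapes is taken as raw bytes and decoded with the codec named in the second argument. The function name decode_hex_runs is hypothetical, and the \u escapes with the "unicode" argument are not covered by this sketch.

```python
import re

# One or more consecutive \xdd escapes, so multi-byte sequences stay together.
_HEX_RUN = re.compile(r'(?:\\x[0-9a-fA-F]{2})+')

def decode_hex_runs(text, encoding):
    def repl(m):
        raw = bytes.fromhex(m.group(0).replace('\\x', ''))
        return raw.decode(encoding)
    return _HEX_RUN.sub(repl, text)

print(decode_hex_runs(r'abc\xe1\xe2\xe3z', 'cp437'))
print(decode_hex_runs(r'\xc5\x81uckasz', 'utf-8'))
```

Under this reading the same escapes give different characters for different code pages, which is consistent with the transcripts.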

  • From moi@21:1/5 to All on Thu Dec 8 13:08:39 2022
    On Wednesday, December 7, 2022 at 22:17:59 UTC+1, Peter Otten wrote:

    import codecs
    codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")
    'hello\n'

    Rejected.

    It works correctly only by chance, because you are using ASCII.
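
A sketch of the non-ASCII problem. When codecs.decode() is given a str with the unicode-escape codec, the text is first turned into UTF-8 bytes and each byte is then read as one Latin-1 character, so non-ASCII characters come back mangled. Restricting the decoding to the escape tokens, which are pure ASCII, is one possible way around it; the names below are arbitrary.

```python
import codecs
import re

text = '\xe9' + r'\x0a'   # e-acute followed by the four characters \x0a

# Decoding the whole string mangles the non-ASCII character.
naive = codecs.decode(text, 'unicode-escape')

# Decoding only the ASCII escape tokens leaves other characters untouched.
_HEX_ESCAPE = re.compile(r'\\x[0-9a-fA-F]{2}')
careful = _HEX_ESCAPE.sub(
    lambda m: codecs.decode(m.group(0), 'unicode-escape'), text)

print(repr(naive), repr(careful))
```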

  • From Jach Feng@21:1/5 to All on Thu Dec 8 18:05:02 2022
    The whole story is this:

    I have a script that accepts an argparse positional argument, and I would like this argument to be able to contain embedded control characters when required. So I posted "How to enter escape character in a positional string argument from the command line?" on Dec 5.
    But there was no response. I assume that there is no way of doing it, and that I have to convert the string after I get it from the command line.

    I did this conversion using the chr(int(...)) method but was not satisfied with it. That is why I made this post.

    At the moment the conversion is done almost the same way as Peter's codecs.decode() method, but without needing to import the codecs module :-)

    def to1byte(matchobj):
        return matchobj.group(0).encode().decode("unicode-escape")
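
A sketch of how this to1byte() could be hooked into argparse so that the positional argument arrives already converted. Passing the converter through type= is an assumption about how the script might be organised; unescape_arg and build_parser are hypothetical names.

```python
import argparse
import re

_HEX_ESCAPE = re.compile(r'\\x[0-9a-fA-F]{2}')

def to1byte(matchobj):
    # group(0) is a pure-ASCII token such as \x0a, so encode() is safe here.
    return matchobj.group(0).encode().decode('unicode-escape')

def unescape_arg(text):
    # Only \xdd tokens are converted; everything else is left as typed.
    return _HEX_ESCAPE.sub(to1byte, text)

def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('data', type=unescape_arg)
    return parser

# Simulate the command line:  script.py a\x0ab\x09c
args = build_parser().parse_args([r'a\x0ab\x09c'])
print(repr(args.data))   # 'a\nb\tc'
```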

  • From Weatherby,Gerard@21:1/5 to All on Fri Dec 9 13:29:06 2022
    That's actually more of a shell question than a Python question. How you pass certain control characters is going to depend on the shell, operating system, and possibly the keyboard you're using. (e.g. https://www.alt-codes.net).

    Here's a sample program. The dashes are to help show the boundaries of the string.

    #!/usr/bin/env python3
    import argparse
    import logging


    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('data')
    args = parser.parse_args()
    print(f'Input\n: -{args.data}- length {len(args.data)}')
    for c in args.data:
        print(f'{ord(c)} ',end='')
    print()


    Using bash on Linux:

    ./cl.py '^M
    '
    Input
    -
    - length 3
    13 32 10


  • From moi@21:1/5 to All on Fri Dec 9 07:41:06 2022
    PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252
    a
    b c
    €uro
    z

    PS C:\humour> $a = py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252

    PS C:\humour> licp($a)
    a U+0061
    b U+0062
    U+0009
    c U+0063
    € U+20AC
    u U+0075
    r U+0072
    o U+006F
    x U+0078
    U+0008
    z U+007A

    PS C:\humour>

    PS C:\humour> py38 sysargwithliteral.py a\u000ab\u0009c\u000a\u20acuro\u000ax\u0008z\u000aend\U0001f60a unicode
    a
    b c
    €uro
    z
    end😊

    PS C:\humour>

    PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252 | py38 -c "import sys; s = sys.stdin.read(); print(s.rstrip())"
    a
    b c
    €uro
    z

    PS C:\humour>
    Note: In a terminal "\t" is correct.

  • From Jach Feng@21:1/5 to All on Fri Dec 9 18:06:54 2022
    Weatherby,Gerard wrote on Friday, December 9, 2022 at 9:36:18 PM [UTC+8]:
    That’s actually more of a shell question than a Python question. How you pass certain control characters is going to depend on the shell, operating system, and possibly the keyboard you’re using. (e.g. https://www.alt-codes.net).

    You are right; that is why I later found it easier to enter it using a pattern of my own choosing. But there is one case, as moi mentioned in his previous post, that causes a failure: when something of the form \xdd happens to appear in a Windows path inside the string :-(
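
One possible way around the Windows path problem, sketched under the assumption that the user is willing to double a backslash that must stay literal (the convention moi's transcript shows with x:\\xffb.html). The function name unescape is hypothetical.

```python
import re

# Try the doubled backslash first, then a \xdd escape.
_TOKEN = re.compile(r'\\\\|\\x([0-9a-fA-F]{2})')

def unescape(text):
    def repl(m):
        if m.group(1) is None:
            return '\\'                    # doubled backslash -> one literal backslash
        return chr(int(m.group(1), 16))    # \xdd -> the character with that code
    return _TOKEN.sub(repl, text)

print(unescape(r'x:\xffb.html'))    # \xff is converted
print(unescape(r'x:\\xffb.html'))   # the path survives as x:\xffb.html
```

The regular expression scans from left to right, so a doubled backslash is consumed before the x that follows it can be read as the start of an escape.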

  • From Jach Feng@21:1/5 to All on Fri Dec 9 18:12:41 2022
    Where is the sysargwithliteral.py?

  • From moi@21:1/5 to All on Sun Dec 11 06:05:14 2022
    I have limited PowerShell experience, and I did something wrong in licp().

    PS C:\humour> $a =py38x sysargwithliteral.py '\xc5\x81uckasz\x20pays\x0ain\x20\xe2\x82\xacuro' utf8
    PS C:\humour> $a
    Łuckasz pays
    in €uro
    PS C:\humour> licp2 $a
    Ł U+0141
    u U+0075
    c U+0063
    k U+006B
    a U+0061
    s U+0073
    z U+007A
    U+0020
    p U+0070
    a U+0061
    y U+0079
    s U+0073
    U+000D

    U+000A
    i U+0069
    n U+006E
    U+0020
    € U+20AC
    u U+0075
    r U+0072
    o U+006F
    PS C:\humour> $b =py38x sysargwithliteral.py Łuckasz\x20pays\x0ain\x20€uro iso-8859-2
    PS C:\humour> $b
    Łuckasz pays
    in €uro
    PS C:\humour> licp2 $b
    Ł U+0141
    u U+0075
    c U+0063
    k U+006B
    a U+0061
    s U+0073
    z U+007A
    U+0020
    p U+0070
    a U+0061
    y U+0079
    s U+0073
    U+000D

    U+000A
    i U+0069
    n U+006E
    U+0020
    € U+20AC
    u U+0075
    r U+0072
    o U+006F
    PS C:\humour> $aa = $a | out-string
    PS C:\humour> $bb = $b | out-string
    PS C:\humour> $aa -eq $bb
    True
    PS C:\humour>

    -----

    In

    PS C:\humour> $a = py38 -c "print('a\nb')"

    $a is not a string !

    PS C:\humour> $a.gettype()

    IsPublic IsSerial Name         BaseType
    -------- -------- ----         --------
    True     True     Object[]     System.Array

  • From moi@21:1/5 to All on Mon Dec 12 01:38:38 2022
    ast.literal_eval("r'\x7a'") == ast.literal_eval("r'z'")
    True
    ast.literal_eval("r'\xe0'") == ast.literal_eval("r'à'")
    True
    ast.literal_eval("r'\x9c'") == ast.literal_eval("r'œ'")
    False
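
A short sketch of why the third comparison is False. In a Python str, \xdd denotes the code point U+00dd, which agrees with Latin-1; the character œ is U+0153 and sits at byte 0x9c only in the cp1252 code page. The variable names are arbitrary.

```python
# \x7a and \xe0 are the code points of z and a-grave, so these agree.
same_z = '\x7a' == 'z'
same_a_grave = '\xe0' == '\u00e0'

# \x9c is the control character U+009C, not the oe ligature (U+0153).
same_oe = '\x9c' == '\u0153'

# The ligature is byte 0x9c only when the byte is decoded as cp1252.
from_cp1252 = bytes([0x9c]).decode('cp1252')
from_latin1 = bytes([0x9c]).decode('latin-1')

print(same_z, same_a_grave, same_oe, hex(ord(from_cp1252)), hex(ord(from_latin1)))
```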

    ---------


    print(codecs.decode(r'z', 'unicode-escape'))
    z
    print(codecs.decode(r'g\hz', 'unicode-escape'))
    g\hz
    print(codecs.decode(r'g\az', 'unicode-escape'))
    g\u0007z
    print(codecs.decode(r'g\nz', 'unicode-escape'))
    g
    z

    print(codecs.decode(r'abcü', 'unicode-escape'))
    abcü


  • From Jach Feng@21:1/5 to All on Mon Dec 12 03:03:48 2022
    I get a different result :-)

    >>> print(codecs.decode(r'g\hz', 'unicode-escape'))
    <stdin>:1: DeprecationWarning: invalid escape sequence '\h'
    g\hz
    >>> print(codecs.decode(r'g\az', 'unicode-escape'))
    gz # accompanied by a bell

  • From moi@21:1/5 to All on Mon Dec 12 04:26:42 2022
    Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)] on win32
    coq runs coqzero.py...
    ...coqzero has been executed
    import unicodedata
    import codecs
    print('a\u0000b\bcd\x1fend')
    a\u0000b\u0008cd\u001fend

    unicodedata.normalize('NFKD', 'aéböc')
    'ae\u0301bo\u0308c'

    print(codecs.decode(r'ö', 'unicode-escape'))
    ö
    codecs.decode(r'ö', 'unicode-escape')
    'ö'



    "official py38" :
    print(codecs.decode(r'ö', 'unicode-escape'))
    ö


  • From moi@21:1/5 to All on Mon Dec 12 04:29:33 2022
    A part was missing from my previous e-mail:

    Sorry, I used *my own* interactive interpreter. I took the liberty of displaying characters a little bit differently.

  • From moi@21:1/5 to All on Fri Dec 23 00:27:31 2022
    Jach Feng asked:
    Where is the sysargwithliteral.py?

    -------

    Deleted.
    It works. It is, however, nonsense.
