Forum: >>> Magnum BBS <<<

How to convert a raw string r'\xdd' to '\xdd' more gracefully?

From Jach Feng@21:1/5 to All on Tue Dec 6 18:23:07 2022

s0 = r'\x0a'
At this moment it was done by

import re

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But is it really that difficult to do this simple thing?

--Jach

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MRAB@21:1/5 to Jach Feng on Wed Dec 7 03:01:12 2022

On 2022-12-07 02:23, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

You could try this:

import ast
s0 = r'\x0a'
ast.literal_eval('"%s"' % s0)

'\n'

From Jach Feng@21:1/5 to All on Tue Dec 6 19:37:54 2022

MRAB wrote on Wednesday, December 7, 2022 at 11:04:43 AM [UTC+8]:

On 2022-12-07 02:23, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

You could try this:

s0 = r'\x0a'
ast.literal_eval('"%s"' % s0)

'\n'

It doesn't work on my system :-(

Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:08:11) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

s0 = r'\x0a'
import ast
ast.literal_eval("%s" % s0)

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 59, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
\x0a
^
SyntaxError: unexpected character after line continuation character

From Thomas Passin@21:1/5 to Jach Feng on Tue Dec 6 22:55:52 2022

On 12/6/2022 9:23 PM, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

--Jach

I'm not totally clear on what you are trying to do here. But:

s1 = r'\xdd' # s1[2:] = 'dd'
n1 = int(s1[2:], 16) # = 221 decimal or 0xdd in hex
# So
chr(n1) == 'Ý' # True
# and
'\xdd' == 'Ý' # True

So the conversion you want seems to be chr(int(s1[2:], 16)).

Of course, this will only work if the input string is exactly four
characters long, and the first two characters are r'\x', and the
remaining two characters are going to be a hex string representation of
a number small enough to fit into a byte.

If you know for sure that will be the case, then the conversion above
seems to be about as simple as it could be. If those conditions may not
always be met, then you need to work out exactly what strings you may
need to convert, and what they should be converted to.
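The conditions listed above can be checked explicitly before converting. The following is only a minimal sketch, assuming the input is expected to be exactly one backslash-x escape; the helper name single_escape_to_char is made up for illustration and does not come from the thread.

```python
import re

# Exactly one backslash, an 'x', and two hex digits; nothing before or after.
_SINGLE_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def single_escape_to_char(s):
    # Hypothetical helper: turn a 4-character string such as backslash-x-d-d
    # into the single character chr(0xdd), rejecting anything else.
    m = _SINGLE_HEX_ESCAPE.fullmatch(s)
    if m is None:
        raise ValueError(f"not a single \\xdd escape: {s!r}")
    return chr(int(m.group(1), 16))

print(ascii(single_escape_to_char(r'\xdd')))
print(ascii(single_escape_to_char(r'\x0a')))
```

Raising ValueError on anything else keeps the slicing approach from silently accepting longer or malformed input.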

From Jach Feng@21:1/5 to All on Tue Dec 6 23:40:12 2022

Thomas Passin wrote on Wednesday, December 7, 2022 at 12:51:32 PM [UTC+8]:

On 12/6/2022 9:23 PM, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

--Jach

I'm not totally clear on what you are trying to do here. But:

s1 = r'\xdd' # s1[2:] = 'dd'
n1 = int(s1[2:], 16) # = 221 decimal or 0xdd in hex
# So
chr(n1) == 'Ý' # True
# and
'\xdd' == 'Ý' # True

So the conversion you want seems to be chr(int(s1[2:], 16)).

Of course, this will only work if the input string is exactly four characters long, and the first two characters are r'\x', and the
remaining two characters are going to be a hex string representation of
a number small enough to fit into a byte.

If you know for sure that will be the case, then the conversion above
seems to be about as simple as it could be. If those conditions may not always be met, then you need to work out exactly what strings you may
need to convert, and what they should be converted to.

Thank you for reminding me that the '0x' + in the to1byte() definition is redundant :-)

I'm just not sure if there is a better way than using chr(int(...)) to do it.
Yes, for this specific case, slicing is much simpler than re.sub().

From Roel Schroeven@21:1/5 to All on Wed Dec 7 09:42:18 2022

On 7/12/2022 at 4:37, Jach Feng wrote:

MRAB wrote on Wednesday, December 7, 2022 at 11:04:43 AM [UTC+8]:

On 2022-12-07 02:23, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

You could try this:

s0 = r'\x0a'
ast.literal_eval('"%s"' % s0)

'\n'

Not work in my system:-(

Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:08:11) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

s0 = r'\x0a'
import ast
ast.literal_eval("%s" % s0)

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 59, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
\x0a
^
SyntaxError: unexpected character after line continuation character

You missed a pair of quotes. They are easily overlooked but very
important. The point is to wrap your string in another pair of quotes so
it becomes a valid Python string literal in a Python string which can
then be passed to ast.literal_eval(). Works for me:

In [7]: s0 = r'\x0a'

In [8]: import ast

In [9]: ast.literal_eval('"%s"' % s0)
Out[9]: '\n'
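To make the role of the extra quotes visible, here is a small sketch that compares the text handed to ast.literal_eval() with and without them. The last case, an input that itself contains a double quote, is an assumption about a likely pitfall and is not something raised in the thread.

```python
import ast

s0 = r'\x0a'              # four characters: backslash, x, 0, a
source = '"%s"' % s0      # six characters: the same text wrapped in double quotes

decoded = ast.literal_eval(source)
print(len(s0), len(source), ascii(decoded))

# Without the wrapping quotes the parser sees a bare backslash, not a literal.
try:
    ast.literal_eval("%s" % s0)
    unquoted_ok = True
except SyntaxError:
    unquoted_ok = False

# Assumed pitfall: a double quote inside the input ends the literal early.
try:
    ast.literal_eval('"%s"' % 'say "hi"')
    embedded_quote_ok = True
except SyntaxError:
    embedded_quote_ok = False

print(unquoted_ok, embedded_quote_ok)
```

If the input may contain quote characters, this wrapping trick probably needs extra escaping or a different approach.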

--
"Experience is that marvelous thing that enables you to recognize a
mistake when you make it again."
-- Franklin P. Jones

From Stefan Ram@21:1/5 to Jach Feng on Wed Dec 7 10:49:04 2022

Jach Feng <jfong@ms4.hinet.net> writes:

s0 = r'\x0a'
At this moment it was done by
def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)
But, is it that difficult on doing this simple thing?

bytes.fromhex( s0[ 2: ])

This does not have the same type as the result of "chr",
but maybe "bytes" is even more appropriate for a byte.
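The type difference can be seen side by side in the short sketch below. The Latin-1 decoding step at the end is an addition for illustration, not part of the suggestion above.

```python
s0 = r'\xdd'

as_bytes = bytes.fromhex(s0[2:])      # a bytes object of length 1
as_str = chr(int(s0[2:], 16))         # a str of length 1

print(type(as_bytes).__name__, as_bytes)
print(type(as_str).__name__, ascii(as_str))

# Latin-1 maps byte values 0..255 onto code points U+0000..U+00FF one to one,
# so decoding with it reproduces the chr() result.
print(as_bytes.decode('latin-1') == as_str)
```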

From Peter Otten@21:1/5 to Jach Feng on Wed Dec 7 22:12:27 2022

On 07/12/2022 03:23, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

import codecs
codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")

'hello\n'

From Jach Feng@21:1/5 to All on Wed Dec 7 17:16:09 2022

Roel Schroeven wrote on Wednesday, December 7, 2022 at 4:42:48 PM [UTC+8]:

You missed a pair of quotes. They are easily overlooked but very
important. The point is to wrap your string in another pair of quotes so
it becomes a valid Python string literal in a Python string which can
then be passed to ast.literal_eval(). Works for me:

In [7]: s0 = r'\x0a'

In [8]: import ast

In [9]: ast.literal_eval('"%s"' % s0)
Out[9]: '\n'

Thank you for letting me know. I did notice those ''' in MRAB's post, but didn't figure out what they were at the time :-(

From Jach Feng@21:1/5 to All on Wed Dec 7 17:17:22 2022

Peter Otten wrote on Thursday, December 8, 2022 at 5:17:59 AM [UTC+8]:

On 07/12/2022 03:23, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

import codecs
codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")

'hello\n'

Thank you. What I really want is to handle any r'\xdd'. The r'\x0a' is just an example. Sorry, I didn't describe it clearly :-)

From Jach Feng@21:1/5 to All on Thu Dec 8 00:56:42 2022

Jach Feng wrote on Wednesday, December 7, 2022 at 10:23:20 AM [UTC+8]:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

--Jach

I found another answer on the web.

s0 = r'\x0a'
s0.encode('Latin-1').decode('unicode-escape')

'\n'

From Weatherby,Gerard@21:1/5 to All on Thu Dec 8 13:23:42 2022

I’m not understanding the task. The sample code given is converting the input r'\x0a' to a newline, it appears.

import re

def exam(z):
    print(f"examine {type(z)} {z}")
    for c in z:
        print(f"{ord(c)} {c}")

s0 = r'\x0a'

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)
exam(s0)
exam(s1)

---
examine <class 'str'> \x0a
92 \
120 x
48 0
97 a
examine <class 'str'>

10

From Thomas Passin@21:1/5 to Gerard on Thu Dec 8 09:12:19 2022

The original post started out with r'\x0a' but then talked about '\xdd'.
I assumed that there was a pattern here, a raw string containing "\x"
and two more characters, and made a suggestion for converting any string
with that pattern. But the OP was very unclear what the task really
was, so here we all are, making a variety of guesses.

From Peter Otten@21:1/5 to Jach Feng on Thu Dec 8 17:20:27 2022

On 08/12/2022 02:17, Jach Feng wrote:

Peter Otten wrote on Thursday, December 8, 2022 at 5:17:59 AM [UTC+8]:

On 07/12/2022 03:23, Jach Feng wrote:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

import codecs
codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")

'hello\n'

Thank you. What I really want to handle is to any r'\xdd'. The r'\x0a' is for example. Sorry, didn't describe it clearly:-)

Hm, codecs.decode() does work for arbitrary escapes. It will produce the
same result for r"\xdd"-type raw strings where d is in the range 0...F.
It will also convert other escapes like

codecs.decode(r"\t", "unicode-escape")

'\t'

codecs.decode(r"\u5728", "unicode-escape")

'在'
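The escapes mentioned above, plus a \U escape added here for illustration, can be tried in one go. This is just a sketch; ascii() is used for printing so the output does not depend on the console encoding.

```python
import codecs

# Raw strings, so each sample holds a literal backslash followed by the escape text.
samples = [r"\x0a", r"\xdd", r"\t", r"\u5728", r"\U0001f60a"]

decoded = [codecs.decode(s, "unicode-escape") for s in samples]

for raw, out in zip(samples, decoded):
    print(f"{raw:12} -> {ascii(out)}")
```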

From moi@21:1/5 to All on Thu Dec 8 11:12:30 2022

PS C:\humour> py38 sysargwithliteral.py abc\x80œ cp1252
abc€œ
PS C:\humour> py38 sysargwithliteral.py abc\xe1œ€ cp1253
abcαœ€
PS C:\humour> py38 sysargwithliteral.py abc\xe1\xe2\xe3z cp1253
abcαβγz
PS C:\humour> py38 sysargwithliteral.py abc\xe1\xe2\xe3z cp437
abcßΓπz
PS C:\humour> py38 sysargwithliteral.py abc\xe1\xe2\xe3z cp850
abcßÔÒz
PS C:\humour> py38 sysargwithliteral.py abc\u03b1\u03b2\u03b3z unicode
abcαβγz
PS C:\humour> py38 sysargwithliteral.py abc\u03b1\u03b2\u03b3z unicode
abcαβγz
PS C:\humour> py38 sysargwithliteral.py abc\\ cp1252
abc\

Anyway, interpreting a command line may lead to nonsense.
Ditto for piping.

PS C:\humour> py38 sysargwithliteral.py x:\xffb.html cp1252
x:ÿb.html
PS C:\humour> py38 sysargwithliteral.py x:\\xffb.html cp1252
x:\xffb.html
PS C:\humour>

From moi@21:1/5 to All on Thu Dec 8 13:08:39 2022

On Wednesday, 7 December 2022 at 22:17:59 UTC+1, Peter Otten wrote:

import codecs
codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")

'hello\n'

Rejected.

It works correctly only by chance, because you are using ASCII.
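The objection can be reproduced with one non-ASCII character in the input. The sketch below also shows one possible workaround, encoding as Latin-1 with the backslashreplace error handler; that workaround is an assumption added here, not something proposed in this post, and edge cases remain (for instance a lone backslash directly before a character above U+00FF).

```python
import codecs

s = r'abc\x0a' + 'ü'      # an escape followed by a non-ASCII character

# Passing a str appears to encode it as UTF-8 first; the two UTF-8 bytes of
# 'ü' then come back as two separate Latin-1 characters.
naive = codecs.decode(s, 'unicode-escape')

# Possible workaround: encode as Latin-1 ourselves, turning anything above
# U+00FF into a \uXXXX escape that unicode-escape decodes again.
robust = s.encode('latin-1', 'backslashreplace').decode('unicode-escape')

print(ascii(naive))
print(ascii(robust))
```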

From Jach Feng@21:1/5 to All on Thu Dec 8 18:05:02 2022

Jach Feng wrote on Wednesday, December 7, 2022 at 10:23:20 AM [UTC+8]:

s0 = r'\x0a'
At this moment it was done by

def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)

But, is it that difficult on doing this simple thing?

--Jach

The whole story is,

I have a script which accepts an argparse positional argument. I would like this argument to be able to contain embedded control characters when required. So I made a post, "How to enter escape character in a positional string argument from the command line?", on Dec 05.
But there was no response. I assumed that there is no way of doing it and that I have to convert it later, after I get the whole string from the command line.

I made this conversion using the chr(int(...)) method but was not satisfied with it. That's why this post came out.

At this moment the conversion is done almost the same way as Peter's codecs.decode() method, but without the need to import the codecs module :-)

def to1byte(matchobj):
    return matchobj.group(0).encode().decode("unicode-escape")
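Put together with the re.sub() call, the callback above might be used as in the following sketch. The wrapper name unescape_hex and the precompiled pattern are illustrative assumptions. Because only the matched ASCII text goes through unicode-escape, non-ASCII characters elsewhere in the string are left alone.

```python
import re

# Matches a backslash, 'x', and exactly two hex digits.
_HEX_ESCAPE = re.compile(r'\\x[0-9a-fA-F]{2}')

def to1byte(matchobj):
    # The matched text is pure ASCII, so the encode/decode round trip
    # cannot mangle characters outside the match.
    return matchobj.group(0).encode().decode("unicode-escape")

def unescape_hex(text):
    # Hypothetical wrapper: replace every \xdd escape, leave the rest untouched.
    return _HEX_ESCAPE.sub(to1byte, text)

print(ascii(unescape_hex(r'a\x0ab')))
print(ascii(unescape_hex('ü' + r'\x41')))
```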

From Weatherby,Gerard@21:1/5 to All on Fri Dec 9 13:29:06 2022

That’s actually more of a shell question than a Python question. How you pass certain control characters is going to depend on the shell, operating system, and possibly the keyboard you’re using. (e.g. https://www.alt-codes.net).

Here’s a sample program. The dashes are to help show the boundaries of the string

#!/usr/bin/env python3
import argparse
import logging

parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('data')
args = parser.parse_args()
print(f'Input\n: -{args.data}- length {len(args.data)}')
for c in args.data:
    print(f'{ord(c)} ', end='')
print()

Using bash on Linux:

./cl.py '^M
'
Input
-
- length 3
13 32 10
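If typing the control character in the shell is not practical, another option is to let the script accept \xdd escapes and convert them while parsing. The sketch below uses argparse's type= hook for that; the converter name with_hex_escapes is hypothetical, and an explicit argument list stands in for what the shell would pass.

```python
import argparse
import re

_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def with_hex_escapes(value):
    # Hypothetical argparse type= callable: replace each backslash-x escape
    # in the argument with the character it names.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)

parser = argparse.ArgumentParser()
parser.add_argument('data', type=with_hex_escapes)

# Stand-in for a command line such as: ./cl.py 'line1\x0aline2'
args = parser.parse_args([r'line1\x0aline2'])
print([ord(c) for c in args.data])
```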

From moi@21:1/5 to All on Fri Dec 9 07:41:06 2022

PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252
a
b c
€uro
z

PS C:\humour> $a = py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252

PS C:\humour> licp($a)
a U+0061
b U+0062
U+0009
c U+0063
€ U+20AC
u U+0075
r U+0072
o U+006F
x U+0078
U+0008
z U+007A

PS C:\humour>

PS C:\humour> py38 sysargwithliteral.py a\u000ab\u0009c\u000a\u20acuro\u000ax\u0008z\u000aend\U0001f60a unicode
a
b c
€uro
z
end😊

PS C:\humour>

PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252 | py38 -c "import sys; s = sys.stdin.read(); print(s.rstrip())"
a
b c
€uro
z

PS C:\humour>
Note: In a terminal "\t" is correct.

From Jach Feng@21:1/5 to All on Fri Dec 9 18:06:54 2022

Weatherby,Gerard wrote on Friday, December 9, 2022 at 9:36:18 PM [UTC+8]:

That’s actually more of a shell question than a Python question. How you pass certain control characters is going to depend on the shell, operating system, and possibly the keyboard you’re using. (e.g. https://www.alt-codes.net).

You are right; that's why I found later that it's easier to enter it using a preferred pattern. But there is a case, as moi mentioned in his previous post, that will cause a failure: when a Windows path containing something of the form \xdd happens to be in the string :-(
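One common way around this ambiguity is to let a doubled backslash stand for a literal backslash, so a path can be protected by writing \\xff instead of \xff. The following is only a sketch of that convention; the function name unescape_arg and the choice of escapes it accepts are assumptions, not something settled in the thread.

```python
import re

# Either a doubled backslash, or a backslash-x escape with two hex digits.
_TOKEN = re.compile(r'\\(\\|x([0-9a-fA-F]{2}))')

def unescape_arg(text):
    def repl(m):
        if m.group(2) is None:
            return '\\'                      # doubled backslash -> one backslash
        return chr(int(m.group(2), 16))      # \xdd -> the character it names
    return _TOKEN.sub(repl, text)

print(ascii(unescape_arg(r'x:\xffb.html')))    # escape is interpreted
print(ascii(unescape_arg(r'x:\\xffb.html')))   # path is preserved
```

The user still has to know about the convention, so it reduces the problem rather than removing it.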

From Jach Feng@21:1/5 to All on Fri Dec 9 18:12:41 2022

moi wrote on Friday, December 9, 2022 at 11:41:20 PM [UTC+8]:

PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252
a
b c
€uro
z

PS C:\humour> $a = py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252

PS C:\humour> licp($a)
a U+0061
b U+0062
U+0009
c U+0063
€ U+20AC
u U+0075
r U+0072
o U+006F
x U+0078
U+0008
z U+007A

PS C:\humour>

PS C:\humour> py38 sysargwithliteral.py a\u000ab\u0009c\u000a\u20acuro\u000ax\u0008z\u000aend\U0001f60a unicode
a
b c
€uro
z
end😊

PS C:\humour>

PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252 | py38 -c "import sys; s = sys.stdin.read(); print(s.rstrip())"
a
b c
€uro
z

PS C:\humour>
Note: In a terminal "\t" is correct.

Where is the sysargwithliteral.py?

From moi@21:1/5 to All on Sun Dec 11 06:05:14 2022

Limited PowerShell experience. I did something wrong in licp().

PS C:\humour> $a =py38x sysargwithliteral.py '\xc5\x81uckasz\x20pays\x0ain\x20\xe2\x82\xacuro' utf8
PS C:\humour> $a
Łuckasz pays
in €uro
PS C:\humour> licp2 $a
Ł U+0141
u U+0075
c U+0063
k U+006B
a U+0061
s U+0073
z U+007A
U+0020
p U+0070
a U+0061
y U+0079
s U+0073
U+000D

U+000A
i U+0069
n U+006E
U+0020
€ U+20AC
u U+0075
r U+0072
o U+006F
PS C:\humour> $b =py38x sysargwithliteral.py Łuckasz\x20pays\x0ain\x20€uro iso-8859-2
PS C:\humour> $b
Łuckasz pays
in €uro
PS C:\humour> licp2 $b
Ł U+0141
u U+0075
c U+0063
k U+006B
a U+0061
s U+0073
z U+007A
U+0020
p U+0070
a U+0061
y U+0079
s U+0073
U+000D

U+000A
i U+0069
n U+006E
U+0020
€ U+20AC
u U+0075
r U+0072
o U+006F
PS C:\humour> $aa = $a | out-string
PS C:\humour> $bb = $b | out-string
PS C:\humour> $aa -eq $bb
True
PS C:\humour>

-----

In

PS C:\humour> $a = py38 -c "print('a\nb')"

$a is not a string !

PS C:\humour> $a.gettype()

IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array

From moi@21:1/5 to All on Mon Dec 12 01:38:38 2022

ast.literal_eval("r'\x7a'") == ast.literal_eval("r'z'")

True

ast.literal_eval("r'\xe0'") == ast.literal_eval("r'à'")

True

ast.literal_eval("r'\x9c'") == ast.literal_eval("r'œ'")

False
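A possible reading of the False result above: in a Python str, '\x9c' is the code point U+009C, a C1 control character, while 'œ' is U+0153. The two only coincide in the Windows code page 1252, where byte 0x9C is assigned to 'œ'. The sketch below checks this; it is an interpretation added here, not an explanation given in the post.

```python
c1_control = '\x9c'      # U+009C, a C1 control character
oe_ligature = '\u0153'   # the 'œ' ligature

print(hex(ord(c1_control)), hex(ord(oe_ligature)))
print(c1_control == oe_ligature)

# In Windows code page 1252 the byte 0x9C is assigned to U+0153.
print(b'\x9c'.decode('cp1252') == oe_ligature)

# In Latin-1 the same byte is the control character U+009C.
print(b'\x9c'.decode('latin-1') == c1_control)

# 'à' compares equal in the post because U+00E0 is the same in Latin-1 and cp1252.
print(b'\xe0'.decode('latin-1') == b'\xe0'.decode('cp1252'))
```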

---------

print(codecs.decode(r'z', 'unicode-escape'))

z

print(codecs.decode(r'g\hz', 'unicode-escape'))

g\hz

print(codecs.decode(r'g\az', 'unicode-escape'))

g\u0007z

print(codecs.decode(r'g\nz', 'unicode-escape'))

g
z

print(codecs.decode(r'abcü', 'unicode-escape'))
abcÃ¼

From Jach Feng@21:1/5 to All on Mon Dec 12 03:03:48 2022

moi wrote on Monday, December 12, 2022 at 5:38:50 PM [UTC+8]:

ast.literal_eval("r'\x7a'") == ast.literal_eval("r'z'")

True

ast.literal_eval("r'\xe0'") == ast.literal_eval("r'à'")

True

ast.literal_eval("r'\x9c'") == ast.literal_eval("r'œ'")

False

---------

print(codecs.decode(r'z', 'unicode-escape'))

z

print(codecs.decode(r'g\hz', 'unicode-escape'))

g\hz

print(codecs.decode(r'g\az', 'unicode-escape'))

g\u0007z

print(codecs.decode(r'g\nz', 'unicode-escape'))

g
z

print(codecs.decode(r'abcü', 'unicode-escape'))
abcÃ¼

I have a different result:-)

print(codecs.decode(r'g\hz', 'unicode-escape'))

<stdin>:1: DeprecationWarning: invalid escape sequence '\h'
g\hz

print(codecs.decode(r'g\az', 'unicode-escape'))

gz # with an accompanying bell

From moi@21:1/5 to All on Mon Dec 12 04:26:42 2022

On Monday, 12 December 2022 at 12:04:01 UTC+1, jf...@ms4.hinet.net wrote:

I have a different result:-)

print(codecs.decode(r'g\hz', 'unicode-escape'))

<stdin>:1: DeprecationWarning: invalid escape sequence '\h'
g\hz

print(codecs.decode(r'g\az', 'unicode-escape'))

gz # with an accompanying bell

Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)] on win32

coq runs coqzero.py...
...coqzero has been executed

import unicodedata
import codecs
print('a\u0000b\bcd\x1fend')

a\u0000b\u0008cd\u001fend

unicodedata.normalize('NFKD', 'aéböc')

'ae\u0301bo\u0308c'

print(codecs.decode(r'ö', 'unicode-escape'))

Ã¶

codecs.decode(r'ö', 'unicode-escape')

'Ã¶'

"official py38" :

print(codecs.decode(r'ö', 'unicode-escape'))

Ã¶

From moi@21:1/5 to All on Mon Dec 12 04:29:33 2022

Missing part in e-mail

Sorry. I used *my* interactive interpreter. I took the liberty of displaying "chars" a little bit differently.

From moi@21:1/5 to All on Fri Dec 23 00:27:31 2022

On Saturday, 10 December 2022 at 03:12:54 UTC+1, jf...@ms4.hinet.net wrote:

Where is the sysargwithliteral.py?

-------

Deleted.
It works. It is, however, nonsense.
