s0 = r'\x0a'
At the moment I do it with:
import re
def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)
But is it really that difficult to do this simple thing?
You could try this:
>>> import ast
>>> s0 = r'\x0a'
>>> ast.literal_eval('"%s"' % s0)
'\n'
It doesn't work on my system :-(
>>> s0 = r'\x0a'
>>> import ast
>>> ast.literal_eval("%s" % s0)
Traceback (most recent call last):
  ...
SyntaxError: unexpected character after line continuation character
--Jach
I'm not totally clear on what you are trying to do here. But:
s1 = r'\xdd' # s1[2:] = 'dd'
n1 = int(s1[2:], 16) # = 221 decimal or 0xdd in hex
# So
chr(n1) == 'Ý' # True
# and
'\xdd' == 'Ý' # True
So the conversion you want seems to be chr(int(s1[2:], 16)).
Of course, this will only work if the input string is exactly four characters long, the first two characters are r'\x', and the remaining two characters are a hex string representation of a number small enough to fit into a byte.
If you know for sure that will be the case, then the conversion above seems to be about as simple as it could be. If those conditions may not always be met, then you need to work out exactly what strings you may need to convert, and what they should be converted to.

Jach Feng replied:
Thank you for reminding me that the '0x' + in the to1byte() definition is redundant :-)
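The conditions listed above can be made explicit in code. What follows is a minimal sketch, not code from the thread: the helper name hex_escape_to_char and the choice to raise ValueError on anything else are assumptions made for illustration.

```python
import re

# Hypothetical helper (not from the thread): accepts exactly r'\x' followed by
# two hex digits, and rejects everything else instead of guessing.
_HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def hex_escape_to_char(s):
    match = _HEX_ESCAPE.fullmatch(s)
    if match is None:
        raise ValueError(f"not a single \\xNN escape: {s!r}")
    # int(..., 16) needs no '0x' prefix.
    return chr(int(match.group(1), 16))

print(repr(hex_escape_to_char(r'\x0a')))
print(repr(hex_escape_to_char(r'\xdd')))
```

Rejecting malformed input up front keeps the "exactly four characters" assumption visible rather than silently slicing whatever string arrives.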
>>> import codecs
>>> codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")
'hello\n'
On 7/12/2022 at 4:37, Jach Feng wrote:
Not work in my system:-(
Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:08:11) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s0 = r'\x0a'
>>> import ast
>>> ast.literal_eval("%s" % s0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 59, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "C:\Users\Jach\AppData\Local\Programs\Python\Python38-32\lib\ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    \x0a
    ^
SyntaxError: unexpected character after line continuation character

You missed a pair of quotes. They are easily overlooked but very important. The point is to wrap your string in another pair of quotes so it becomes a valid Python string literal in a Python string which can then be passed to ast.literal_eval(). Works for me:
In [7]: s0 = r'\x0a'
In [8]: import ast
In [9]: ast.literal_eval('"%s"' % s0)
Out[9]: '\n'
--
"Experience is that marvelous thing that enables you to recognize a
mistake when you make it again."
-- Franklin P. Jones

Jach Feng replied:
Thank you for notifying me. I did notice those ''' in MRAB's post, but didn't figure out what they were at the time :-(
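To make the role of the extra quotes concrete, here is a small sketch. The function name unescape_via_literal_eval is hypothetical, and the approach assumes the input contains no unescaped double quote and no literal newline; otherwise the wrapped text is no longer a valid string literal.

```python
import ast

def unescape_via_literal_eval(s):
    # Wrap the text in double quotes so that it parses as a string literal.
    # Assumption: s contains no unescaped '"' and no literal newline.
    return ast.literal_eval('"%s"' % s)

s0 = r'\x0a'
print(repr(unescape_via_literal_eval(s0)))

# Without the added quotes the parser sees a bare backslash, as in the traceback.
try:
    ast.literal_eval(s0)
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```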
Jach Feng replied:
Thank you. What I really want is to handle any r'\xdd'; r'\x0a' was just an example. Sorry, I didn't describe it clearly :-)
I found another answer on the web:
>>> s0 = r'\x0a'
>>> s0.encode('Latin-1').decode('unicode-escape')
'\n'
--Jach
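A short sketch of why the Latin-1 round trip behaves well for mixed input: the unicode-escape codec decodes non-escape bytes as Latin-1, so characters in the Latin-1 range survive unchanged, while characters outside that range cannot be encoded in the first place. The function name unescape_latin1 is an assumption made for illustration.

```python
def unescape_latin1(s):
    # Latin-1 maps code points 0-255 to single bytes, and the unicode-escape
    # codec maps non-escape bytes back to the same code points.
    return s.encode('latin-1').decode('unicode-escape')

print(repr(unescape_latin1(r'\x0a')))
print(repr(unescape_latin1(r'abc\x0aü')))

# Characters beyond U+00FF cannot be encoded as Latin-1.
try:
    unescape_latin1('在')
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

Writing such characters as escapes (for example r'\u5728') avoids the encoding error, since the escape text itself is plain ASCII.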
I'm not understanding the task. The sample code given is converting the input r'\x0a' to a newline, it appears.
import re
def exam(z):
    print(f"examine {type(z)} {z}")
    for c in z:
        print(f"{ord(c)} {c}")
s0 = r'\x0a'
def to1byte(matchobj):
    return chr(int('0x' + matchobj.group(1), 16))
s1 = re.sub(r'\\x([0-9a-fA-F]{2})', to1byte, s0)
exam(s0)
exam(s1)
---
examine <class 'str'> \x0a
92 \
120 x
48 0
97 a
examine <class 'str'>
10
From: Python-list <python-list-bounces+gweatherby=uchc.edu@python.org> on behalf of Jach Feng <jfong@ms4.hinet.net>
Date: Wednesday, December 7, 2022 at 9:27 PM
To: python-list@python.org <python-list@python.org>
Subject: Re: How to convert a raw string r'xdd' to 'xdd' more gracefully?
Peter Otten wrote on Thursday, 8 December 2022 at 5:17:59 AM [UTC+8]:
>>> import codecs
>>> codecs.decode(r"\x68\x65\x6c\x6c\x6f\x0a", "unicode-escape")
'hello\n'

>>> codecs.decode(r"\t", "unicode-escape")
'\t'
>>> codecs.decode(r"\u5728", "unicode-escape")
'在'
That's actually more of a shell question than a Python question. How you pass certain control characters is going to depend on the shell, operating system, and possibly the keyboard you're using. (e.g. https://www.alt-codes.net).
Here's a sample program. The dashes are to help show the boundaries of the string.
#!/usr/bin/env python3
import argparse
import logging
parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('data')
args = parser.parse_args()
print(f'Input\n: -{args.data}- length {len(args.data)}')
for c in args.data:
    print(f'{ord(c)} ',end='')
print()
Using bash on Linux:
./cl.py '^M
'
Input
-
- length 3
13 32 10
From: Python-list <python-list-bounces+gweatherby=uchc...@python.org> on behalf of Jach Feng <jf...@ms4.hinet.net>
Date: Thursday, December 8, 2022 at 9:31 PM
To: pytho...@python.org <pytho...@python.org>
Subject: Re: How to convert a raw string r'xdd' to 'xdd' more gracefully?
Jach Feng wrote on Wednesday, 7 December 2022 at 10:23:20 AM [UTC+8]:
The whole story is,
I had a script which accepts an argparse positional argument. I would like this argument to be able to carry embedded control characters when required. So I made a post, "How to enter escape character in a positional string argument from the command line?", on DEC05. But there was no response. I assume that there is no way of doing it and I have to convert it later, after I get the whole string from the command line.
I made this conversion using the chr(int(...)) method but was not satisfied with it. That is why this post came out.
At this moment the conversion is done almost the same way as Peter's codecs.decode() method, but without the need to import the codecs module :-)
def to1byte(matchobj):
    return matchobj.group(0).encode().decode("unicode-escape")
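Putting the pieces together, the to1byte() variant above might be wired into an argparse script roughly like this. This is a sketch under assumptions: the function name unescape_hex and the simulated argument list are hypothetical, and only \xNN sequences are converted. Because the pattern matches ASCII text only, the default UTF-8 encode() inside to1byte() is safe here.

```python
import argparse
import re

def to1byte(matchobj):
    # The matched text is pure ASCII (backslash, 'x', two hex digits).
    return matchobj.group(0).encode().decode("unicode-escape")

def unescape_hex(text):
    # Hypothetical wrapper: convert every \xNN sequence, leave the rest alone.
    return re.sub(r'\\x[0-9a-fA-F]{2}', to1byte, text)

parser = argparse.ArgumentParser()
# argparse calls the type= callable on the raw command-line string.
parser.add_argument('data', type=unescape_hex)

# Simulated command line; a real script would call parser.parse_args().
args = parser.parse_args([r'a\x0ab\x09c'])
print(repr(args.data))
```

Doing the conversion in the type= callable keeps the rest of the script unaware that the argument ever contained escape text.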
moi wrote on Friday, 9 December 2022 at 11:41:20 PM [UTC+8]:
PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252
a
b c
€uro
z
PS C:\humour> $a = py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252
PS C:\humour> licp($a)
a U+0061
b U+0062
U+0009
c U+0063
€ U+20AC
u U+0075
r U+0072
o U+006F
x U+0078
U+0008
z U+007A
PS C:\humour>
PS C:\humour> py38 sysargwithliteral.py a\u000ab\u0009c\u000a\u20acuro\u000ax\u0008z\u000aend\U0001f60a unicode
a
b c
€uro
z
end😊
PS C:\humour>
PS C:\humour> py38 sysargwithliteral.py a\x0ab\x09c\x0a\x80uro\x0ax\x08z cp1252 | py38 -c "import sys; s = sys.stdin.read(); print(s.rstrip())"
a
b c
€uro
z
PS C:\humour>
Note: In a terminal "\t" is correct.

Jach Feng replied:
Where is the sysargwithliteral.py?
>>> ast.literal_eval("r'\x7a'") == ast.literal_eval("r'z'")
True
>>> ast.literal_eval("r'\xe0'") == ast.literal_eval("r'à'")
True
>>> ast.literal_eval("r'\x9c'") == ast.literal_eval("r'œ'")
False
>>> print(codecs.decode(r'z', 'unicode-escape'))
z
>>> print(codecs.decode(r'g\hz', 'unicode-escape'))
g\hz
>>> print(codecs.decode(r'g\az', 'unicode-escape'))
g\u0007z
>>> print(codecs.decode(r'g\nz', 'unicode-escape'))
g
>>> print(codecs.decode(r'abcü', 'unicode-escape'))
moi wrote on Monday, 12 December 2022 at 5:38:50 PM [UTC+8]:
>>> ast.literal_eval("r'\x7a'") == ast.literal_eval("r'z'")
True
>>> ast.literal_eval("r'\xe0'") == ast.literal_eval("r'à'")
True
>>> ast.literal_eval("r'\x9c'") == ast.literal_eval("r'œ'")
False
---------
>>> print(codecs.decode(r'z', 'unicode-escape'))
z
>>> print(codecs.decode(r'g\hz', 'unicode-escape'))
g\hz
>>> print(codecs.decode(r'g\az', 'unicode-escape'))
g\u0007z
>>> print(codecs.decode(r'g\nz', 'unicode-escape'))
g
z
>>> print(codecs.decode(r'abcü', 'unicode-escape'))
abcü

I have a different result :-)
>>> print(codecs.decode(r'g\hz', 'unicode-escape'))
<stdin>:1: DeprecationWarning: invalid escape sequence '\h'
g\hz
>>> print(codecs.decode(r'g\az', 'unicode-escape'))
gz # accompanied by a bell
On Monday, 12 December 2022 at 12:04:01 UTC+1, jf...@ms4.hinet.net wrote:
I have a different result :-)
>>> print(codecs.decode(r'g\hz', 'unicode-escape'))
<stdin>:1: DeprecationWarning: invalid escape sequence '\h'
g\hz
>>> print(codecs.decode(r'g\az', 'unicode-escape'))
gz # accompanied by a bell

coq runs coqzero.py...
Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)] on win32
...coqzero has been executed
>>> import unicodedata
>>> import codecs
>>> print('a\u0000b\bcd\x1fend')
a\u0000b\u0008cd\u001fend
>>> unicodedata.normalize('NFKD', 'aéböc')
'ae\u0301bo\u0308c'
>>> print(codecs.decode(r'ö', 'unicode-escape'))
ö
>>> codecs.decode(r'ö', 'unicode-escape')
'ö'
"official py38" :
>>> print(codecs.decode(r'ö', 'unicode-escape'))
ö