The PowerShell variants (win11 terminal, pwsh.exe 7.2.7, or *x powershell) do not read files correctly.
When reading a file, PowerShell gives BOM detection priority over
an explicitly declared encoding. This cannot work.
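
A rough Python model of the two reading strategies, for comparison. The function names are made up for illustration, and decode_bom_first is only a guess at the observable behaviour, not PowerShell's actual implementation. The bytes are those of the zz.txt dump shown further down, without the trailing \r\n.

```python
import codecs


def decode_declared(data: bytes, declared: str) -> str:
    """Honour the declared encoding and nothing else."""
    return data.decode(declared)


def decode_bom_first(data: bytes, declared: str) -> str:
    """Hypothetical model: a UTF-8 BOM, if present, overrides `declared`."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(declared)


# Valid Windows-1252 text whose first three bytes happen to look like a UTF-8 BOM.
raw = b"\xef\xbb\xbfabc\xe9\x9c\x9f"

print(decode_declared(raw, "cp1252"))   # nine characters, what the author of the file meant
print(decode_bom_first(raw, "cp1252"))  # four characters, the declared codec was ignored
```

With the first strategy the reader gets back exactly what was written; with the second the three leading characters vanish and the tail is reinterpreted as UTF-8.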
Illustration and stupid cases.
ms52.txt is a valid Windows-1252 encoded file.
PS C:\humour> py38
Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:34:34) [MSC v.1928 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> with open('ms52.txt', 'r', encoding='cp1252') as f:
...     print(f.read())
...
ï»¿abcéœŸ
# cmd.exe : correct
c:\humour>type ms52.txt
ï»¿abcéœŸ
c:\humour>
PS C:\humour> get-content ms52.txt -encoding default
abc霟
PS C:\humour> # expected if utf-8 were the real default, utf-8-bom
PS C:\humour> get-content ms52.txt
abc霟
PS C:\humour> get-content ms52.txt -encoding default
abc霟
# win-1252 -> utf-8 conversion : impossible
PS C:\humour> py38 contenu.py conversion.txt
bytes 46 b'\xef\xbb\xbf are the three characters you may see...\r\n'
1252 46 ï»¿ are the three characters you may see...\r\n
UTF-8 BOM 43 are the three characters you may see...\r\n
PS C:\humour>
PS C:\humour> get-content conversion.txt -encoding default | set-content zz.txt -encoding utf8
PS C:\humour> py38 contenu.py zz.txt
bytes 46 b'\xef\xbb\xbf are the three characters you may see...\r\n'
1252 46 ï»¿ are the three characters you may see...\r\n
UTF-8 BOM 43 are the three characters you may see...\r\n
PS C:\humour>
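
The contenu.py script itself is not shown in this post. The following is only a guess at an equivalent, inferred from the three output lines above (raw bytes, cp1252 view, UTF-8 view with or without BOM); the function names and the use of errors="replace" are assumptions.

```python
import codecs
import sys


def show(text: str) -> str:
    """Make CR/LF visible, as in the dumps above."""
    return text.replace("\r", "\\r").replace("\n", "\\n")


def describe(data: bytes) -> list:
    """Return the three report lines for the raw content of a file."""
    lines = [f"bytes {len(data)} {data!r}"]

    as_1252 = data.decode("cp1252", errors="replace")
    lines.append(f"1252 {len(as_1252)} {show(as_1252)}")

    # Strip a leading UTF-8 BOM if there is one, and say so in the label.
    if data.startswith(codecs.BOM_UTF8):
        label, payload = "UTF-8 BOM", data[len(codecs.BOM_UTF8):]
    else:
        label, payload = "UTF-8 NO BOM", data
    as_utf8 = payload.decode("utf-8", errors="replace")
    lines.append(f"{label} {len(as_utf8)} {show(as_utf8)}")
    return lines


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        print("\n".join(describe(f.read())))
```

The point of such a dump is that the byte count and the two character counts make it obvious which interpretation a given tool has picked.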
# Probably the height of absurdity: one cannot save and read a file with the same codec!
PS C:\humour> $ll
ï»¿abcéœŸ
PS C:\humour> $ll | set-content zz.txt -encoding default
PS C:\humour> py38 contenu.py zz.txt
bytes 11 b'\xef\xbb\xbfabc\xe9\x9c\x9f\r\n'
1252 11 ï»¿abcéœŸ\r\n
UTF-8 BOM 6 abc霟\r\n
PS C:\humour> $in = get-content zz.txt -encoding default
PS C:\humour> $in
abc霟
PS C:\humour> $in -eq $ll
False
PS C:\humour>
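
For comparison, the same save-and-reload round trip sketched in Python. The helper name round_trips is hypothetical, and "utf-8-sig" stands in for a reader that strips a BOM and then assumes UTF-8; whether PowerShell does exactly this internally is an assumption.

```python
def round_trips(text: str, write_codec: str, read_codec: str) -> bool:
    """True if text survives being encoded with one codec and decoded with another."""
    return text.encode(write_codec).decode(read_codec) == text


# The string from the session above, written with escapes to stay ASCII-safe.
original = "\u00ef\u00bb\u00bfabc\u00e9\u0153\u0178"

# Saved as cp1252, the bytes are identical to the zz.txt dump (minus \r\n).
saved = original.encode("cp1252")
print(saved)

print(round_trips(original, "cp1252", "cp1252"))     # same codec both ways
print(round_trips(original, "cp1252", "utf-8-sig"))  # BOM-sniffing reader
```

The first check succeeds and the second fails, which is the same outcome as the False returned by $in -eq $ll.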
Ditto with pwsh.exe and an explicit 1252 or windows-1252 encoding name.
PS C:\humour> $a = get-content zz.txt -encoding 1252
PS C:\humour> $a
abc霟
PS C:\humour>
Amusing in win11 where the default codec is windows-1252!
PS C:\humour> "abcéà€" | set-content a.txt -encoding default
PS C:\humour> py38 contenu.jpy a.txt
c:\Python38\python.exe: can't open file 'contenu.jpy': [Errno 2] No such file or directory
PS C:\humour> py38 contenu.py a.txt
bytes 8 b'abc\xe9\xe0\x80\r\n'
1252 8 abcéà€\r\n
UTF-8 NO BOM 8 abc���\r\n
PS C:\humour>
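
A small Python check of the dump above, showing why a file written this way cannot be taken for UTF-8. The variable names are illustrative only.

```python
# 'abcéà€' written with escapes; encoded as Windows-1252 it gives the bytes of a.txt
# (without the trailing \r\n).
data = "abc\u00e9\u00e0\u20ac".encode("cp1252")
print(data)

# A strict UTF-8 decoder refuses these bytes outright.
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict utf-8 fails at byte offset", exc.start)

# A lenient decoder substitutes U+FFFD for each offending byte.
lenient = data.decode("utf-8", errors="replace")
print(len(lenient))
```

Each of the three non-ASCII bytes becomes one replacement character, which matches the "UTF-8 NO BOM" line in the dump.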
Miscellaneous
PS C:\humour> $psversiontable
Name                           Value
----                           -----
PSVersion                      7.2.7
PSEdition                      Core
GitCommitId                    7.2.7
OS                             Microsoft Windows 10.0.22621
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
and
Name                           Value
----                           -----
PSVersion                      5.1.22621.608
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.22621.608
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
PS C:\humour> get-content iso5.txt -encoding iso-8859-5
abc
PS C:\humour> # wrong
PS C:\humour> get-content passwordiso2.txt -encoding iso-8859-2
éz
PS C:\humour> # wrong
Is there an é in real iso-8859-2?
PS C:\humour> py38 -c "print('é'.encode('iso-8859-2'))"
b'\xe9'
PS C:\humour>
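
To see how much depends on the codec actually applied, here is the same single byte decoded under the three 8-bit codecs mentioned in this post. This only illustrates the codecs themselves; the content of iso5.txt and passwordiso2.txt is not shown here, so nothing is assumed about those files.

```python
byte = b"\xe9"

# One byte, three codecs: the result depends entirely on which one is honoured.
decoded = {codec: byte.decode(codec) for codec in ("cp1252", "iso-8859-2", "iso-8859-5")}
for codec, char in decoded.items():
    print(codec, hex(ord(char)))

# The reverse direction: é has no code point in iso-8859-5 at all.
try:
    "\u00e9".encode("iso-8859-5")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

Under cp1252 and iso-8859-2 the byte is é (U+00E9); under iso-8859-5 it is the Cyrillic letter щ (U+0449).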
23.11.2022. Updated win11 22H2 version.
Dear devs, you have 24/48 hours to fix this buggy behaviour.
Regards.