Just wondering if anyone's ever seen this before.
=?utf-8?Q?IMG=5F0506.PNG?=
That was a filename in email attachments. I saved
them and then couldn't rename or delete them!
"Apd" wrote
| > =?utf-8?Q?IMG=5F0506.PNG?=
|
| 5F = hex for underscore.
|
| Without encoding: IMG_0506.png
Ah. Thanks. I didn't think of that. But it makes no sense,
since _ is within ASCII, so it's also proper UTF-8.
And the whole thing still doesn't make sense. UTF-8 is not valid
in the Windows file system as far as I know. The sender said he
used something called "Spark". I figured it was probably
some kind of Apple shenanigans, but it seems to be a
Windows program.
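For anyone who wants to check the decoding rather than read the hex by eye, here is a minimal Python sketch using the standard library's email.header.decode_header. The helper name decode_mime_words is made up for illustration; it is not anything Spark or TBird is known to use.

```python
from email.header import decode_header


def decode_mime_words(value: str) -> str:
    """Decode =?charset?Q?...?= style encoded-words in a header value.

    Hypothetical helper for illustration only.
    """
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded-words come back as bytes plus the declared charset.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # Values with no encoded-words come back as plain str.
            parts.append(data)
    return "".join(parts)


print(decode_mime_words("=?utf-8?Q?IMG=5F0506.PNG?="))  # IMG_0506.PNG
```

The decoded name keeps the original upper-case extension, so it is IMG_0506.PNG.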
"Apd" wrote
| Something I didn't know is that an underscore represents a space and
| so needs to be encoded. Normally, a space would be "=20" but they say
| it's for readability. There was mention in the RFC about underscores
| not passing through some mail gateways (I don't know how true that is
| nowadays), so perhaps the email program was encoding just to be safe.
I think it's just a bug. An underscore is often
used instead of a space, where a space can't be
used, like a URL. But it's not a space character.
And there's no problem with space characters in
ASCII.
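Apd's point about the underscore can be checked with Python's standard quopri module, whose header=True option selects the header ("Q") variant of quoted-printable. This is only a sketch of the decoding rule; it says nothing about what Spark actually does internally.

```python
import quopri

# In header mode, "=5F" decodes to a literal underscore...
as_sent = quopri.decodestring(b"IMG=5F0506.PNG", header=True)

# ...while a bare "_" inside an encoded-word decodes to a SPACE.
bare = quopri.decodestring(b"IMG_0506.PNG", header=True)

# So an encoder that has already chosen Q-encoding must escape "_".
encoded = quopri.encodestring(b"IMG_0506.PNG", header=True)

print(as_sent)  # b'IMG_0506.PNG'
print(bare)     # b'IMG 0506.PNG'
print(encoded)
```

So once a mailer has decided to Q-encode the name at all, writing =5F is required, not a bug. Whether it needed to Q-encode a pure-ASCII name in the first place is a separate question.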
"Apd" <not@all.invalid> wrote
| > | > =?utf-8?Q?IMG=5F0506.PNG?=
| > |
| It's not unusual to see internet messages (which include email) using
| UTF-8 when non-ASCII really is present. I think Windows tries to
| convert it to 1252 (or whatever codepage charset it's using these
| days). If it fails and inserts a '?' for a char it doesn't understand,
| that's going to cause problems with file names. It shouldn't have been
| an issue here.
|
Found it: https://en.wikipedia.org/wiki/MIME
It's called Q-encoding. I'd never heard of this. Bizarre.
It includes the text encoding designation within the filename
field, and all those = and ? are part of the required format!
So it seems there were two problems. Spark email mistakenly
encoded _ and had to use Q-encoding, while my TBird seems
to only partially recognize Q-encoding. It dropped everything
except 0506.png, but it must have written a corrupt filename
to disk, perhaps including = and ?, resulting in saving file names
with "illegal" characters. (I've noticed that's often feasible. For
example I can create a .htaccess file with VBScript but Explorer
won't let me start a file name with a period.)
So that might explain why I couldn't rename or delete the files.
They were recorded with corrupt file names. I was able to delete
them with File Assassin, but that program had failed to let me
rename them. Which also makes sense, I guess, because the
files were never locked -- only corrupted as file system entries.
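As a rough check on the "illegal characters" guess, here is a Python sketch that tests a name against the characters Windows reserves in file names (< > : " / \ | ? * and control characters). The function names are hypothetical, and nothing in this thread confirms what TBird actually wrote to disk.

```python
# Characters Windows does not allow in a file name component.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def illegal_chars(name: str) -> list[str]:
    """Return the distinct reserved or control characters found in name."""
    return sorted({ch for ch in name if ch in WINDOWS_RESERVED or ord(ch) < 32})


def sanitize(name: str, replacement: str = "_") -> str:
    """Replace every reserved or control character with replacement."""
    return "".join(
        replacement if ch in WINDOWS_RESERVED or ord(ch) < 32 else ch
        for ch in name
    )


raw = "=?utf-8?Q?IMG=5F0506.PNG?="
print(illegal_chars(raw))  # ['?']
print(sanitize(raw))       # =_utf-8_Q_IMG=5F0506.PNG_=
```

Of the characters in the raw encoded-word, only ? is on the reserved list; = is a legal file name character. So if the undecoded string did reach the disk, the question marks would be the likely culprit.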
"Frank Slootweg" <this@ddress.is.invalid> wrote[...]
| As to why the file name got Q-encoded in the first place:
|
| I suspect that at some time, the file name was used in some e-mail
| header - probably in Subject: - and for some reason (see Apd's
| responses), some mailer somewhere thought the file name should be
| Q-encoded (perhaps indeed because of the underscore). Once encoded,
| nobody should see the encoded form, only the decoded form. BUT if someone
| would copy and paste the file name from the header, the clipboard would
| probably contain the encoded name, resulting in the havoc you
| experienced.
There was no excuse for the encoding, except that Spark
was composing in UTF-8 and it's "legal" to encode it. The
mystery is why TBird, or something, dropped out the underscore.
An underscore is a perfectly legit ASCII character. But somehow
Explorer ended up not showing it. That made me curious whether
there's a way to directly read the file system, but I'm not
aware of such a tool. I'm curious how the file was recorded.
Since File Assassin could delete it but Explorer wouldn't let me
rename or delete it, I'm guessing the name was corrupted between the
file system and Explorer. Maybe it was recorded as including an = sign,
for example.
I just sent myself an image with underscore and a space. It
came through normally:
Content-Type: image/jpeg;
 name="_e-device spying.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="_e-device spying.jpg"
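To see how a receiving program might pull the name out of headers like these, here is a small Python sketch using the standard email package. The headers are the ones quoted above, with the folded continuation lines indented as the mail format requires; the variable names are just for illustration.

```python
from email import message_from_string

# The attachment headers from the test message, followed by the
# blank line that ends the header block.
raw_headers = (
    "Content-Type: image/jpeg;\n"
    ' name="_e-device spying.jpg"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-Disposition: attachment;\n"
    ' filename="_e-device spying.jpg"\n'
    "\n"
)

msg = message_from_string(raw_headers)

print(msg.get_content_type())  # image/jpeg
print(msg.get_filename())      # _e-device spying.jpg
```

With a plain quoted ASCII name there is nothing to decode, so the underscore and the space both come through untouched.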
I'm curious how common this Q-encoding is. I've never
seen it before. I'd never heard of it. It's clearly "legal",
but I've even written email software and never saw such
a thing. There's no possible reason for it except to transmit
characters that don't exist in ASCII. Even then, it would
be converted on most systems. That is, if you send me
something like a file with a Chinese character then I'd
probably receive something like ~1/4.jpg if it worked at
all.
Which raises the question of Unicode on Windows. Windows
has used UTF-16 for many years, but that's different
from UTF-8, using 2 bytes for most characters. I'm not sure
Explorer, or Windows itself, is capable of handling
a UTF-8 file name if there are characters not allowed in
Explorer.
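The size difference between the two encodings is easy to measure. A quick Python sketch, with sample strings chosen arbitrarily for illustration:

```python
def byte_lengths(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ASCII file name": "IMG_0506.PNG",
    "one Chinese character": "\u4e2d",
    "one emoji": "\U0001F600",
}

for label, text in samples.items():
    print(label, byte_lengths(text))
# ASCII file name (12, 24)
# one Chinese character (3, 2)
# one emoji (4, 4)
```

ASCII text is one byte per character in UTF-8 and two in UTF-16. Characters outside the Basic Multilingual Plane, such as the emoji, take four bytes in UTF-16 as well, because they are stored as a surrogate pair.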
It's funny how quickly character encoding gets confusing.