Forum: >>> Magnum BBS <<<

Curious email anomaly

From Newyana2@21:1/5 to All on Fri Sep 8 10:57:41 2023

Just wondering if anyone's ever seen this before.

=?utf-8?Q?IMG=5F0506.PNG?=

That was a filename in email attachments. I saved
them and then couldn't rename or delete them! I've
never run into anything like this. Some kind of unicode
corruption? They saved like so: 5F0506.PNG. I
edited the email source code like so: 5F0506.jpg.

The sender was using gmail, no program listed in
the header, and he mistakenly named the files PNG
when they were actually JPG. I'm guessing he was
probably doing gmail through Safari on a Mac, but
I'm not sure.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Apd@21:1/5 to All on Fri Sep 8 16:44:30 2023

"Newyana2" wrote:

> Just wondering if anyone's ever seen this before.

Yes, particularly fields in Usenet message headers where UTF-8 gets hex-encoded.

> =?utf-8?Q?IMG=5F0506.PNG?=

5F = hex for underscore.

Without encoding: IMG_0506.png

> That was a filename in email attachments. I saved
> them and then couldn't rename or delete them!

Strange.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
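
The decoding Apd describes can be reproduced with Python's standard library. This is only a minimal sketch: the helper name decode_encoded_word is made up for illustration, and it says nothing about how any particular mail client handles the same string.

```python
from email.header import decode_header


def decode_encoded_word(value):
    """Decode RFC 2047 encoded-words such as =?utf-8?Q?...?= into text."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded chunks come back as bytes plus the declared charset.
            parts.append(data.decode(charset or "ascii"))
        else:
            # Unencoded chunks come back as plain str.
            parts.append(data)
    return "".join(parts)


print(decode_encoded_word("=?utf-8?Q?IMG=5F0506.PNG?="))
```

The =5F in the middle is the hex escape for the underscore, so the decoded name is the ordinary IMG_0506.PNG.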

From Newyana2@21:1/5 to Apd on Fri Sep 8 19:32:43 2023

"Apd" <not@all.invalid> wrote

| > =?utf-8?Q?IMG=5F0506.PNG?=
|
| 5F = hex for underscore.
|
| Without encoding: IMG_0506.png
|

Ah. Thanks. I didn't think of that. But it makes no sense,
since _ is within ASCII, so it's also proper UTF-8. And the
whole thing still doesn't make sense. UTF-8 is not valid
in the Windows file system as far as I know. The sender said he
used something called "Spark". I figured it was probably
some kind of Apple shenanigans, but it seems to be a
Windows program.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Apd@21:1/5 to All on Sat Sep 9 09:56:56 2023

"Newyana2" wrote:

> "Apd" wrote
> | > =?utf-8?Q?IMG=5F0506.PNG?=
> |
> | 5F = hex for underscore.
> |
> | Without encoding: IMG_0506.png
>
> Ah. Thanks. I didn't think of that. But it makes no sense,
> since _ is within ASCII, so it's also proper UTF-8.

Indeed. Just recently I saw an X-Face in a Usenet message header that
partially encoded some ASCII like this. Only a few non-alphabetic
chars and not consistently. Of course, that completely broke it.

> and the whole thing still doesn't make sense. UTF-8 is not valid
> in the Windows file system as far as I know. The sender said he
> used something called "Spark". I figured it was probably
> some kind of Apple shenanigans, but it seems to be a
> Windows program.

It's not unusual to see internet messages (which include email) using
UTF-8 when non-ASCII really is present. I think Windows tries to
convert it to 1252 (or whatever codepage charset it's using these
days). If it fails and inserts a '?' for a char it doesn't understand,
that's going to cause problems with file names. It shouldn't have been
an issue here.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Newyana2@21:1/5 to Apd on Sat Sep 9 08:08:17 2023

"Apd" <not@all.invalid> wrote

| > | > =?utf-8?Q?IMG=5F0506.PNG?=
| > |

| It's not unusual to see internet messages (which include email) using
| UTF-8 when non-ASCII really is present. I think Windows tries to
| convert it to 1252 (or whatever codepage charset it's using these
| days). If it fails and inserts a '?' for a char it doesn't understand,
| that's going to cause problems with file names. It shouldn't have been
| an issue here.
|

Found it: https://en.wikipedia.org/wiki/MIME

It's called Q-encoding. I'd never heard of this. Bizarre.
It includes the text encoding designation within the filename
field, and all those = and ? are part of the required format!

So it seems there were two problems. Spark email mistakenly
encoded _ and had to use Q-encoding, while my TBird seems
to only partially recognize Q-encoding. It dropped everything
except 0506.png, but it must have written a corrupt filename
to disk, perhaps including = and ?, resulting in saving file names
with "illegal" characters. (I've noticed that's often feasible. For
example I can create a .htaccess file with VBScript but Explorer
won't let me start a file name with a period.)

So that might explain why I couldn't rename or delete the files.
They were recorded with corrupt file names. I was able to delete
them with File Assassin, but that program had failed to let me
rename them. Which also makes sense, I guess, because the
files were never locked -- only corrupted as file system entries.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
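
If the raw, undecoded string really had been written to disk, the "?" characters alone would make it a name that the normal Windows API refuses. A rough sketch of that check follows; the function name windows_name_problems is hypothetical, the rule set is limited to the well-known reserved characters and the trailing dot/space rule, and it does not prove what Thunderbird actually wrote in this case.

```python
# Characters the Win32 API does not accept in a file name component.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def windows_name_problems(name):
    """Return a list of reasons why `name` is not a valid Win32 file name."""
    problems = []
    bad = sorted({ch for ch in name if ch in WINDOWS_RESERVED or ord(ch) < 32})
    if bad:
        problems.append("reserved characters: " + " ".join(bad))
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems


print(windows_name_problems("=?utf-8?Q?IMG=5F0506.PNG?="))
print(windows_name_problems("IMG_0506.PNG"))
```

Note that "=" by itself is a legal character in Windows file names; it is the "?" that is not.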

From Apd@21:1/5 to All on Sat Sep 9 14:53:48 2023

"Newyana2" wrote:

> Found it: https://en.wikipedia.org/wiki/MIME
>
> It's called Q-encoding. I'd never heard of this. Bizarre.
> It includes the text encoding designation within the filename
> field, and all those = and ? are part of the required format!

Yes, RFC 2047 refers.

> So it seems there were two problems. Spark email mistakenly
> encoded _ and had to use Q-encoding,

Something I didn't know is that an underscore represents a space and
so needs to be encoded. Normally, a space would be "=20" but they say
it's for readability. There was mention in the RFC about underscores
not passing through some mail gateways (I don't know how true that is
nowadays), so perhaps the email program was encoding just to be safe.

> while my TBird seems
> to only partially recognize Q-encoding. It dropped everything
> except 0506.png, but it must have written a corrupt filename
> to disk, perhaps including = and ?, resulting in saving file names
> with "illegal" characters.

Bad Tbird!

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
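
The underscore rule can be seen in a small hand-rolled Q-encoder. This is a sketch under stated assumptions, not what Spark or any other mailer actually runs: the function name q_encode is invented, and it follows the general RFC 2047 rule that printable ASCII other than "=", "?" and "_" may stay literal while a space is written as "_".

```python
from email.header import decode_header


def q_encode(text, charset="utf-8"):
    """Wrap `text` in an RFC 2047 Q-encoded word (illustrative only)."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            out.append("_")              # space is written as underscore
        elif 33 <= byte <= 126 and ch not in "=?_":
            out.append(ch)               # ordinary printable ASCII stays literal
        else:
            out.append(f"={byte:02X}")   # everything else becomes =XX
    return f"=?{charset}?Q?{''.join(out)}?="


print(q_encode("IMG_0506.PNG"))
print(q_encode("kids at beach.jpg"))

# Round trip through the standard library decoder.
data, charset = decode_header(q_encode("kids at beach.jpg"))[0]
print(data.decode(charset))
```

Because "_" already means "space" inside an encoded word, a real underscore has no choice but to become =5F once a mailer decides to encode the name at all.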

From Paul@21:1/5 to All on Sat Sep 9 11:58:01 2023

On 9/8/2023 7:32 PM, Newyana2 wrote:

> "Apd" <not@all.invalid> wrote
>
> | > =?utf-8?Q?IMG=5F0506.PNG?=
> |
> | 5F = hex for underscore.
> |
> | Without encoding: IMG_0506.png
> |
>
> Ah. Thanks. I didn't think of that. But it makes no sense,
> since _ is within ASCII, so it's also proper UTF-8. And the
> whole thing still doesn't make sense. UTF-8 is not valid
> in the Windows file system as far as I know. The sender said he
> used something called "Spark". I figured it was probably
> some kind of Apple shenanigans, but it seems to be a
> Windows program.

There is likely to be more than one program named Spark.

https://spark.apache.org/

*******

I knew NTFS accepted wide characters, but not the details of what
you stuff in them. The reason I have to know about this stuff
is for analyzing Registry entries, as some file system paths are stored
in 16-bit mode in the Registry. The practice makes it damn hard to search for stuff.

https://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as

"NTFS stores filenames in UTF-16, however fopen is using ANSI (not UTF-8).

In order to use an UTF16-encoded file name you will need to use the Unicode versions
of the file open calls. Do this by defining UNICODE and _UNICODE in your project.
Then use the CreateFile call or the wfopen call."
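
To illustrate why 16-bit storage makes searching awkward, here is a small sketch comparing the "ANSI" and UTF-16 byte patterns for the same name. The helper name search_patterns and the fake data blob are made up for the example; real Registry or file system data would have to be read with appropriate tools.

```python
def search_patterns(name):
    """Return the narrow (cp1252) and wide (UTF-16LE) byte forms of a name."""
    return {
        "ansi": name.encode("cp1252"),
        "utf16": name.encode("utf-16-le"),
    }


patterns = search_patterns("IMG_0506.PNG")

# Stand-in for a chunk of binary data holding the name as 16-bit characters.
blob = b"\x00\x00" + patterns["utf16"] + b"\x00\x00"

print(len(patterns["ansi"]), len(patterns["utf16"]))
print(patterns["ansi"] in blob)    # the narrow byte search misses
print(patterns["utf16"] in blob)   # the wide byte search hits
```

Every ASCII character is followed by a zero byte in the wide form, so a plain text search for the name finds nothing.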

I saw in a ProcMon trace once, the usage of a file opening option,
which seemed to be "open the file but *delete* the file when you close it".
Basically a "read and delete" kind of semantic. A properly crafted wfopen
command just might be enough to delete it :-) Deleting in NTFS is not
difficult: a single flag in the 1024-byte $MFT record carries the info that
the file is deleted and that the $MFT entry can be "reused, any time it is
convenient for you". There is no procedure in NTFS to shrink or consolidate
the $MFT, so filenames remain visible until you "create" enough files to
reuse all the unused $MFT entries.

And I don't think changing languages would help. For one filesystem issue,
I was able to use Perl to make a correction. But the handling of anything
other than ANSI, is likely to be just as convoluted as the StackOverflow description.

It seems the NTFS file system calls just don't have enough sanitization in them.
At a guess. Someone else recently had a problem where a filename definitely violated
a cardinal NTFS rule, and of course the user could not rename or delete it either,
because as soon as the illegal filename was presented to File Explorer, File Explorer
said "here, let me fix this for you, by removing the illegal portion", and then of
course the result is "file not found". And that's why you're not able to rename or
delete: it *does* do the sanitizing when it is inconvenient to do it,
but *does not* do the sanitizing for the "browser wedges file system" cases :-/
Some kind of subroutine call browsers are using seems to be bad for your situation.
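
One commonly cited way around that kind of name "fixing" is the \\?\ path prefix, which asks the Win32 API to pass the path through without its usual normalization (such as stripping a trailing dot or space). The sketch below only builds such a path; the helper name extended_length_path and the example path are hypothetical, and whether this would have helped with the FAT32 files in this thread is untested.

```python
import os

PREFIX = "\\\\?\\"  # the four characters \\?\


def extended_length_path(absolute_path):
    """Prefix an absolute Windows path with \\\\?\\ unless it already has it."""
    if absolute_path.startswith(PREFIX):
        return absolute_path
    return PREFIX + absolute_path


target = extended_length_path("C:\\temp\\odd-name.")
print(target)

# On Windows only, and only after double-checking the path, one could try:
#     os.remove(target)
```

The path must be absolute and use backslashes, because the prefix also switches off the handling of relative paths and forward slashes.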

*******

You can try deleting the file in question, using the short file name
in a Command Prompt window, as that name may have fewer representation issues.

del somename.ext

You would need to look up how to get the short names to show (if they exist;
"dir /x" in Command Prompt lists them). The short name is effectively an alias.

*******

And I'm still chuckling here, as I DID find a way to make illegal filenames
on NTFS :-) (Removing an illegal file, may still have its challenges,
but I cannot reproduce your issue exactly, unless I can find a way to
duplicate it.)

It turns out, that in Linux,

[fuse filesystem ntfs.3g]

sudo mount -o windows_names,rw /dev/sda1 /mnt # This passes a mount option to sanitize
# filenames. This prevents "mistakes".

sudo mount -o rw /dev/sda1 /mnt # This is UNPROTECTED naming.
# Used to make the following picture

You can see I had fun, by putting a "dot" on the end of a filename.

I ran CHKDSK in Windows, and it does not do a damn thing about that file.

[Picture]

https://i.postimg.cc/DZS8LbY4/illegal-filename-via-knoppix531.gif

https://linux.die.net/man/8/ntfs-3g

"windows_names
This option prevents files, directories and extended attributes
to be created with a name not allowed by windows
"

*******

Why did I use Knoppix-531 DVD ?

There was no kernel level NTFS driver back then. Only NTFS-3G
existed, and it was ready to use from the DVD.

You click the disk icon on the desktop. From Terminal (icon on taskbar)

cat /etc/mtab

and that will show the options list for an ordinary mount. The context
menu has an option for "mount read/write", as normally Knoppix 531
safe-mounts disks in read-only mode. In any case, you will notice that Knoppix
does not have "windows_names" in the options list.

At some point, the developers made "windows_names" the default, and
they did not provide a "windows_names=No" option or similar. This means
the in-kernel NTFS mount on a modern distro (Ubuntu 23.04) would already be
enforcing valid NTFS filenames.

But, the ntfs-3g still exists, and on Ubuntu, it is already installed

gnome-disks (or just "disks" maybe) # This utility allows discovering names for things

sudo /sbin/mount.ntfs-3g -o rw /dev/sda1 /mnt

cd /mnt
ls
...
rm "funny-named-thing.ext" # This is the challenging part.

cd ~
sudo umount /mnt # Put away partition, before shutdown.

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Newyana2@21:1/5 to Paul on Sat Sep 9 13:37:10 2023

"Paul" <nospam@needed.invalid> wrote

| There is likely to be more than one program named Spark.
|
| https://spark.apache.org/
|
That's something else. I only find one Spark for
email, made by Readdle. There's no X-Mailer or
UserAgent field in the header. Looking around I see
that identifying the sending program has become rare.

| I knew NTFS accepted wide characters, but not the details of what
| you stuff in them. The reason I have to know about this stuff,
| is when analyzing Registry entries, as some file system paths are stored
| in 16-bit mode in the Registry. The practice makes it damn hard to search for stuff.
|
| https://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as
|
| "NTFS stores filenames in UTF-16, however fopen is using ANSI (not UTF-8).
|

Are you sitting down?... I'm on FAT32. Cuts down on the
nonsense and complications. Permissions are impossible to
enforce.

I didn't know that about NTFS file names. I've never run
into problems. But Windows has been mainly unicode for
a long time. I mostly work with VB6, which converts it
automatically. And Windows Script Host? I can't think of
any software that doesn't transfer seamlessly between
FAT32 and NTFS. I would have thought that Windows
would just manage that.

|
| You can try deleting the file in question, using the short file name
| in a Command Prompt window. As that name may have fewer representation issues.
|

File Assassin did it. I think the problem was just
that the stored file name in the file system probably
didn't match what Explorer saw.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Newyana2@21:1/5 to Apd on Sat Sep 9 13:41:05 2023

"Apd" <not@all.invalid> wrote

| Something I didn't know is that an underscore represents a space and
| so needs to be encoded. Normally, a space would be "=20" but they say
| it's for readability. There was mention in the RFC about underscores
| not passing through some mail gateways (I don't know how true that is
| nowadays), so perhaps the email program was encoding just to be safe.
|

I think it's just a bug. An underscore is often
used instead of a space, where a space can't be
used, like a URL. But it's not a space character.
And there's no problem with space characters in
ASCII. It's not necessary in email for a file name.
The name is in quotes. So filename: "kids at beach.jpg"
would be no problem. There's no reason at all to
be encoding the file name. On the other hand, there's
also no reason that TBird couldn't handle it. So it's
a screw-up on both ends.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Apd@21:1/5 to All on Sat Sep 9 22:56:09 2023

"Newyana2" wrote:

> "Apd" wrote
> | Something I didn't know is that an underscore represents a space and
> | so needs to be encoded. Normally, a space would be "=20" but they say
> | it's for readability. There was mention in the RFC about underscores
> | not passing through some mail gateways (I don't know how true that is
> | nowadays), so perhaps the email program was encoding just to be safe.
>
> I think it's just a bug. An underscore is often
> used instead of a space, where a space can't be
> used, like a URL. But it's not a space character.
> And there's no problem with space characters in
> ASCII.

Sure, but I was thinking it started out as an underscore, not a space,
so the mail agent decided to encode it as per RFC comments about
gateways.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Frank Slootweg@21:1/5 to Newyana2@invalid.nospam on Sun Sep 10 15:18:21 2023

Newyana2 <Newyana2@invalid.nospam> wrote:

> "Apd" <not@all.invalid> wrote
>
> | > | > =?utf-8?Q?IMG=5F0506.PNG?=
> | > |
>
> | It's not unusual to see internet messages (which include email) using
> | UTF-8 when non-ASCII really is present. I think Windows tries to
> | convert it to 1252 (or whatever codepage charset it's using these
> | days). If it fails and inserts a '?' for a char it doesn't understand,
> | that's going to cause problems with file names. It shouldn't have been
> | an issue here.
> |
>
> Found it: https://en.wikipedia.org/wiki/MIME
>
> It's called Q-encoding. I'd never heard of this. Bizarre.
> It includes the text encoding designation within the filename
> field, and all those = and ? are part of the required format!
>
> So it seems there were two problems. Spark email mistakenly
> encoded _ and had to use Q-encoding, while my TBird seems
> to only partially recognize Q-encoding. It dropped everything
> except 0506.png, but it must have written a corrupt filename
> to disk, perhaps including = and ?, resulting in saving file names
> with "illegal" characters. (I've noticed that's often feasible. For
> example I can create a .htaccess file with VBScript but Explorer
> won't let me start a file name with a period.)
>
> So that might explain why I couldn't rename or delete the files.
> They were recorded with corrupt file names. I was able to delete
> them with File Assassin, but that program had failed to let me
> rename them. Which also makes sense, I guess, because the
> files were never locked -- only corrupted as file system entries.

You later mentioned that you use FAT32. The 'corrupt file names' issue
is probably related to that, as on my (Windows 11) system, with NTFS,
File Explorer *can* delete and rename the example (Q-encoded) file name.

As to why the file name got Q-encoded in the first place:

I suspect that at some time, the file name was used in some e-mail
header - probably in Subject: - and for some reason (see Apd's
responses), some mailer somewhere thought the file name should be
Q-encoded (perhaps indeed because of the underscore). Once encoded,
nobody should see the encoded form, only the decoded form. BUT if someone
were to copy and paste the file name from the header, the clipboard would
probably contain the encoded name, resulting in the havoc you
experienced.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Newyana2@21:1/5 to Frank Slootweg on Sun Sep 10 13:21:41 2023

"Frank Slootweg" <this@ddress.is.invalid> wrote

| You later mentioned that you use FAT32. The 'corrupt file names' issue
| is probably related to that, as on my (Windows 11) system, with NTFS,
| File Explorer *can* delete and rename the example (Q-encoded) file name.
|
Yes. Paul mentioned that NTFS is more sophisticated.
I didn't test to see what the file might have been named
on another system. There were 3 different issues: Spark
writing an unnecessary, arguably corrupted file name.
TBird apparently not properly parsing that name. Then
Windows allowing the name to be recorded differently
from what Explorer saw.

| As to why the file name got Q-encoded in the first place:
|
| I suspect that at some time, the file name was used in some e-mail
| header - probably in Subject: - and for some reason (see Apd's
| responses), some mailer somewhere thought the file name should be
| Q-encoded (perhaps indeed because of the underscore). Once encoded,
| nobody should see the encoded form, only the decoded form. BUT if someone
| would copy and paste the file name from the header, the clipboard would
| probably contain the encoded name, resulting in the havoc you
| experienced.

There was no excuse for the encoding, except that Spark
was composing in UTF-8 and it's "legal" to encode it. The
mystery is why TBird, or something, dropped out the underscore.
An underscore is a perfectly legit ASCII character. But somehow
Explorer ended up not showing it. That made me curious whether
there's a way to directly read the file system, but I'm not
aware of such a tool. I'm curious how the file was recorded.
Since File Assassin could delete it but Explorer wouldn't let me rename or
delete it, I'm guessing the name was corrupted between the file
system and Explorer. Maybe it was recorded as including an = sign,
for example.

I just sent myself an image with underscore and a space. It
came through normally:

Content-Type: image/jpeg;
name="_e-device spying.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="_e-device spying.jpg"

I'm curious how common this Q-encoding is. I've never
seen it before. I'd never heard of it. It's clearly "legal",
but I've even written email software and never saw such
a thing. There's no possible reason for it except to transmit
characters that don't exist in ASCII. Even then, it would
be converted on most systems. That is, if you send me
something like a file with a Chinese character then I'd
probably receive something like ~1/4.jpg if it worked at
all.

Which raises the question of unicode on Windows. Windows
has been unicode-16 for many years, but that's different
from UTF-8, using 2 bytes for all characters. I'm not sure
Explorer is capable, or Windows itself capable, of handling
a UTF-8 file name if there are characters not allowed in
Explorer.

It's funny how quickly character encoding gets confusing.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
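
For comparison, the plain quoted form shown in the test message above needs no decoding step at all. A minimal sketch using Python's standard email parser follows; the header text is copied from the post, with the leading space on continuation lines restored as an assumption about the original layout.

```python
from email import message_from_string

# Attachment headers as quoted in the post; continuation lines are indented.
RAW_PART = (
    "Content-Type: image/jpeg;\n"
    ' name="_e-device spying.jpg"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-Disposition: attachment;\n"
    ' filename="_e-device spying.jpg"\n'
    "\n"
)

part = message_from_string(RAW_PART)
print(part.get_content_type())
print(part.get_filename())
```

Both the underscore and the space survive as-is inside the quoted parameter, which matches what was observed with this test message.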

From Frank Slootweg@21:1/5 to Newyana2@invalid.nospam on Sun Sep 10 20:05:25 2023

Newyana2 <Newyana2@invalid.nospam> wrote:

> "Frank Slootweg" <this@ddress.is.invalid> wrote

[...]

> | As to why the file name got Q-encoded in the first place:
> |
> | I suspect that at some time, the file name was used in some e-mail
> | header - probably in Subject: - and for some reason (see Apd's
> | responses), some mailer somewhere thought the file name should be
> | Q-encoded (perhaps indeed because of the underscore). Once encoded,
> | nobody should see the encoded form, only the decoded form. BUT if someone
> | would copy and paste the file name from the header, the clipboard would
> | probably contain the encoded name, resulting in the havoc you
> | experienced.
>
> There was no excuse for the encoding, except that Spark
> was composing in UTF-8 and it's "legal" to encode it. The
> mystery is why TBird, or something, dropped out the underscore.
> An underscore is a perfectly legit ASCII character. But somehow
> Explorer ended up not showing it. That made me curious whether
> there's a way to directly read the file system, but I'm not
> aware of such a tool. I'm curious how the file was recorded.
> Since File Assassin could delete it but Explorer wouldn't let me rename or
> delete it, I'm guessing the name was corrupted between the file
> system and Explorer. Maybe it was recorded as including an = sign,
> for example.
>
> I just sent myself an image with underscore and a space. It
> came through normally:
>
> Content-Type: image/jpeg;
> name="_e-device spying.jpg"
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment;
> filename="_e-device spying.jpg"
>
> I'm curious how common this Q-encoding is. I've never
> seen it before. I'd never heard of it. It's clearly "legal",
> but I've even written email software and never saw such
> a thing. There's no possible reason for it except to transmit
> characters that don't exist in ASCII. Even then, it would
> be converted on most systems. That is, if you send me
> something like a file with a Chinese character then I'd
> probably receive something like ~1/4.jpg if it worked at
> all.

As I said, the Q-encoding is relevant to and possibly justified in
e-mail *headers*, for example in 'Subject:'. A header must be ASCII,
because any MIME headers define the encoding and charset of the *body*,
not of the headers.

This is nicely explained in the MIME page you referenced:

<https://en.wikipedia.org/wiki/MIME#Encoded-Word>

and specifically the example in

<https://en.wikipedia.org/wiki/MIME#Difference_between_Q-encoding_and_quoted-printable>

So as I mentioned, my suspicion is that the Q-encoded file name was
probably in some header, probably the 'Subject:' header.

The question remains *why* it was Q-encoded, as all the characters in
the file name are normal printing characters. But as Apd mentioned,
perhaps the underscore ('_') is a printable character, but still an
exception on some systems, so it was encoded, just to be on the safe
side.

Just for kicks, I used Thunderbird to send myself a message with
"Subject: IMG_0506.PNG", but when viewing the Message Source, I saw that
the name in the message was *not* encoded. So I could not confirm my
suspicion (but also not disprove it).

> Which raises the question of unicode on Windows. Windows
> has been unicode-16 for many years, but that's different
> from UTF-8, using 2 bytes for all characters. I'm not sure
> Explorer is capable, or Windows itself capable, of handling
> a UTF-8 file name if there are characters not allowed in
> Explorer.
>
> It's funny how quickly character encoding gets confusing.

As your reference says, Q-encoding is similar to 'quoted-printable'.
That latter term was often qualified as 'quoted-unreadable'! :-)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
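
The point that headers must stay ASCII can be tried out with Python's standard library, which wraps non-ASCII header text in an encoded-word and leaves plain ASCII alone on decode. This is a sketch only: the helper names are invented, the sample subject is made up, and the library is free to pick either the Q or the B (base64) form, so no exact wire format is claimed here.

```python
from email.header import Header, decode_header, make_header


def encode_for_header(text, charset="utf-8"):
    """Return an ASCII-only header value carrying `text`."""
    return Header(text, charset).encode()


def decode_from_header(value):
    """Turn a header value, encoded or not, back into readable text."""
    return str(make_header(decode_header(value)))


subject = "caf\u00e9.png"            # contains one non-ASCII character
wire = encode_for_header(subject)

print(wire)                          # ASCII-only form, as sent on the wire
print(decode_from_header(wire))      # readable form, as shown to the user
print(decode_from_header("IMG_0506.PNG"))
```

A name like IMG_0506.PNG passes through unchanged, which fits the Thunderbird test described above where the Subject was not encoded.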
