Simply bumping the .so number and forcing a rebuild would certainly work. It would probably be the safest option but also put the highest cost on users, although I think for this kind of bug then a manual step of verifying the versions are correct isneeded so forcing a build failure isn’t a bad option. It would massively increase the visibility if this if nothing else.
Perhaps we should bring in the distro package maintainers here for their guidance/input? Like I say I’ve not had experience of this type of issue before but I’m sure the distros will have.members after this.
Ashley.
On 7 Mar 2024, at 16:05, Bernd Schubert <bernd.schubert@fastmail.fm> wrote: >>
Hi Ashley,
thanks for spotting and embarrassing as I had been involved in
development that feature.
Hard to decide if revert it right - issue is, if we revert, everything
linked with the new version would broken as well. And it is now already
a year...
I think we can only increase the .so version and send out a warning
(mailing list, distributions).
In order to avoid it in the future, we can probably add some compile
time asserts about struct member positions.
Bernd
On 3/7/24 16:43, Ashley Pittman wrote:
I’ve spotted an issue with the linked commit, the fuse_file_info struct should have been modified by adding new entries just before the padding, with this commit then members after the new entry will be moved creating a change in the ABI for
linked with.
https://github.com/libfuse/libfuse/commit/a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
This affects the flush, nonseekable, flock_release, cache_readdir and noflush flags, each one of which could be set or cleared accidentally with one of the flags before or after it depending on what version of libfuse the application is compiled and
technical (we can simply change the code) however there’s no obvious way of enforcing or detecting this incompatibility at run-time so a fuse-users announcement will need to be made, and probably check with maintainers for various distros etc to see if
This isn’t a failure mode that I’ve experience before when using linux so I don’t have a playbook to work from in how to correct this but essentially fuse3 releases up to and including 3.13 have one ABI and 3.13 to 3.16 have a different ABI.
I think the best course of action would be to revert to the previous ABI, move the entry to the correct place in the structure, make a new 3.17 release and then publicise that releases 3.13-3.16 should not be used. Of course this fix is part
headers had been used, this is far from perfect though as the last change to this structure was five years ago so the closest we could work out is if a binary was compiled using versions 3.8 to 3.16 inclusive, a start but not ideal. Of course if the
One thing we could and perhaps should do is add a extra placeholder to the fuse_lowlevel_ops struct, this would allow the library at runtime to compare the size passed to fuse_session_new() to at least detect if a potentially incompatible set of
this seriously.
Thoughts? Looking at the flags affected then there are several cases for surprising or unexpected behaviour and intended caching or data loss so I think this is going to be a big issue, at least theoretically if not in practice and we do need to take
Ashley.
Hi all,
this is certainly not kind of the mail I was hoping for as a new libfuse maintainer.
As you can see from the title and from discussion below (sorry this is
not typical ML discussion style), we have a bit of of problem with
libfuse ABI compatibility.
While scanning through git history, Ashley found a commit that was adding members in the middle of struct - doesn't break API, but binary compatibility. That commit already landed a year ago in these releases:
git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
fuse-3.14.1
fuse-3.15.0
fuse-3.15.1
fuse-3.16.1
fuse-3.16.2
Perhaps we should bring in the distro package maintainers here for their guidance/input? Like I say I’ve not had experience of this type of issue before but I’m sure the distros will have.
Hi all,
this is certainly not kind of the mail I was hoping for as a new libfuse maintainer.
As you can see from the title and from discussion below (sorry this is
not typical ML discussion style), we have a bit of of problem with
libfuse ABI compatibility.
members after this.On 3/7/24 16:43, Ashley Pittman wrote:
I’ve spotted an issue with the linked commit, the fuse_file_info struct should have been modified by adding new entries just before the padding, with this commit then members after the new entry will be moved creating a change in the ABI for
and linked with.
https://github.com/libfuse/libfuse/commit/a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
This affects the flush, nonseekable, flock_release, cache_readdir and noflush flags, each one of which could be set or cleared accidentally with one of the flags before or after it depending on what version of libfuse the application is compiled
This isn’t a failure mode that I’ve experience before when using linux so I don’t have a playbook to work from in how to correct this but essentially fuse3 releases up to and including 3.13 have one ABI and 3.13 to 3.16 have a different ABI.
On Thu, Mar 07, 2024 at 07:47:23PM +0100, Bernd Schubert wrote:members after this.
Hi all,
this is certainly not kind of the mail I was hoping for as a new libfuse
maintainer.
As you can see from the title and from discussion below (sorry this is
not typical ML discussion style), we have a bit of of problem with
libfuse ABI compatibility.
[...]
On 3/7/24 16:43, Ashley Pittman wrote:
I’ve spotted an issue with the linked commit, the fuse_file_info struct should have been modified by adding new entries just before the padding, with this commit then members after the new entry will be moved creating a change in the ABI for
and linked with.
https://github.com/libfuse/libfuse/commit/a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
This affects the flush, nonseekable, flock_release, cache_readdir and noflush flags, each one of which could be set or cleared accidentally with one of the flags before or after it depending on what version of libfuse the application is compiled
This isn’t a failure mode that I’ve experience before when using linux so I don’t have a playbook to work from in how to correct this but essentially fuse3 releases up to and including 3.13 have one ABI and 3.13 to 3.16 have a different ABI.
Not strictly related to the change at hand, but since we're
discussing recent-ish changes that negatively affected backwards compatibility in libfuse I will mention this too:
https://bugs.debian.org/1031802
It has been reported downstream a year ago, but I'm not sure if it
ever got upstream's attention. Now it should have :)
On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
Hi all,
this is certainly not kind of the mail I was hoping for as a new libfuse
maintainer.
As you can see from the title and from discussion below (sorry this is
not typical ML discussion style), we have a bit of of problem with
libfuse ABI compatibility.
While scanning through git history, Ashley found a commit that was adding
members in the middle of struct - doesn't break API, but binary
compatibility. That commit already landed a year ago in these releases:
git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
fuse-3.14.1
fuse-3.15.0
fuse-3.15.1
fuse-3.16.1
fuse-3.16.2
Obviously this needs improved testing for, but right now we wonder what
would be the preferred action to avoid issues.
a) We could fix it and move bits to the right place. Fixes everything
compiled with old versions, but breaks everything compiled with the
versions above
b) Increase the .so version - enforces recompilation of all binaries.
Intrusive, especially for distributions, but probably safest.
c) Other ideas?
Heuristically, you can detect most of the shifted flags at runtime
because...
I don't think there is anything in libfuse that would allow us to
detect which version of libfuse a library was linked to.
I think we do know for sure if fs was linked with libfuse < 3.12
without fuse_loop_mt_312?
so only left to detect 3.12..3.14 vs. 3.14..3.16
The commit shifted these members in struct fuse_file_info {
struct fuse_file_info {
...
/** Can be filled by open/create, to allow parallel direct writes on this
* file */
unsigned int parallel_direct_writes : 1; --> introduced the shift
Not expected in flush/release, so can be heuristically interpreted as flush
/** Indicates a flush operation. Set in flush operation, also
maybe set in highlevel lock operation and lowlevel release
operation. */
unsigned int flush : 1;
Not expected in open/create, so can be heuristically interpreted as nonseekable
/** Can be filled in by open, to indicate that the file is not
seekable. */
unsigned int nonseekable : 1;
Not expected in release, so can be heuristically interpreted as flock_release
/* Indicates that flock locks for this file should be
released. If set, lock_owner shall contain a valid value.
May only be set in ->release(). */
unsigned int flock_release : 1;
Not expected in opendir, so can be heuristically interpreted as cache_readdir
/** Can be filled in by opendir. It signals the kernel to
enable caching of entries returned by readdir(). Has no
effect when set in other contexts (in particular it does
nothing when set by open()). */
unsigned int cache_readdir : 1;
Ignored in open, but based on the comment above, it may be
implied that some fs may set it in open() reply
/** Can be filled in by open, to indicate that flush is not needed >> on close. */
unsigned int noflush : 1;
noflush is just an optimization, which the kernel ignores anyway
when writeback cache is enabled, so no harm done if it is wrongly
interpreted as readdir_cache in open() and ignored.
It is also quite recent (3.11) so not very likely used.
};
I.e. setting flush would actually set parallel_direct_writes
(note: with binaries compiled against older libfuse versions)
It would be suspicious if fs sets parallel_direct_writes flush at
flush() or release(), so that can be used as a hint of old ABI
interpreted as flush and issue a warning.
It would be pretty ugly to use these heuristics in the library forever,
but perhaps as safety measure for binaries built with a range of
libfuse version that is not likely to be found in the wild (3.14..3.16)
it is an acceptable compromise?
and perhaps the next libfuse version can pass the header version
fs was compiled with as argument to fuse_loop_mt_317()?
On 9 Mar 2024, at 02:46, Amir Goldstein <amir73il@gmail.com> wrote:
On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
Hi all,
this is certainly not kind of the mail I was hoping for as a new libfuse
maintainer.
As you can see from the title and from discussion below (sorry this is
not typical ML discussion style), we have a bit of of problem with
libfuse ABI compatibility.
While scanning through git history, Ashley found a commit that was adding
members in the middle of struct - doesn't break API, but binary
compatibility. That commit already landed a year ago in these releases:
git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
fuse-3.14.1
fuse-3.15.0
fuse-3.15.1
fuse-3.16.1
fuse-3.16.2
Obviously this needs improved testing for, but right now we wonder what
would be the preferred action to avoid issues.
a) We could fix it and move bits to the right place. Fixes everything
compiled with old versions, but breaks everything compiled with the
versions above
b) Increase the .so version - enforces recompilation of all binaries.
Intrusive, especially for distributions, but probably safest.
c) Other ideas?
Heuristically, you can detect most of the shifted flags at runtime
because...
I don't think there is anything in libfuse that would allow us to
detect which version of libfuse a library was linked to.
I think we do know for sure if fs was linked with libfuse < 3.12
without fuse_loop_mt_312?
so only left to detect 3.12..3.14 vs. 3.14..3.16
The commit shifted these members in struct fuse_file_info {
struct fuse_file_info {
...
/** Can be filled by open/create, to allow parallel direct writes on this
* file */
unsigned int parallel_direct_writes : 1; --> introduced the shift
Not expected in flush/release, so can be heuristically interpreted as flush
/** Indicates a flush operation. Set in flush operation, also
maybe set in highlevel lock operation and lowlevel release
operation. */
unsigned int flush : 1;
Not expected in open/create, so can be heuristically interpreted as nonseekable
/** Can be filled in by open, to indicate that the file is not
seekable. */
unsigned int nonseekable : 1;
Not expected in release, so can be heuristically interpreted as flock_release
/* Indicates that flock locks for this file should be
released. If set, lock_owner shall contain a valid value.
May only be set in ->release(). */
unsigned int flock_release : 1;
Not expected in opendir, so can be heuristically interpreted as cache_readdir
/** Can be filled in by opendir. It signals the kernel to
enable caching of entries returned by readdir(). Has no
effect when set in other contexts (in particular it does
nothing when set by open()). */
unsigned int cache_readdir : 1;
Ignored in open, but based on the comment above, it may be
implied that some fs may set it in open() reply
/** Can be filled in by open, to indicate that flush is not needed
on close. */
unsigned int noflush : 1;
noflush is just an optimization, which the kernel ignores anyway
when writeback cache is enabled, so no harm done if it is wrongly
interpreted as readdir_cache in open() and ignored.
It is also quite recent (3.11) so not very likely used.
};
I.e. setting flush would actually set parallel_direct_writes
(note: with binaries compiled against older libfuse versions)
It would be suspicious if fs sets parallel_direct_writes flush at
flush() or release(), so that can be used as a hint of old ABI
interpreted as flush and issue a warning.
It would be pretty ugly to use these heuristics in the library forever,
but perhaps as safety measure for binaries built with a range of
libfuse version that is not likely to be found in the wild (3.14..3.16)
it is an acceptable compromise?
and perhaps the next libfuse version can pass the header version
fs was compiled with as argument to fuse_loop_mt_317()?
Thanks,
Amir.
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 360 |
Nodes: | 16 (2 / 14) |
Uptime: | 128:26:22 |
Calls: | 7,686 |
Files: | 12,828 |
Messages: | 5,711,088 |