Forum: >>> Magnum BBS <<<

[RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken

From Mimi Zohar@21:1/5 to Dave Chinner on Mon Oct 2 14:20:02 2017

On Mon, 2017-10-02 at 15:35 +1100, Dave Chinner wrote:

On Sun, Oct 01, 2017 at 07:42:42PM -0400, Mimi Zohar wrote:

On Mon, 2017-10-02 at 09:34 +1100, Dave Chinner wrote:

On Sun, Oct 01, 2017 at 11:41:48AM -0700, Linus Torvalds wrote:

On Sun, Oct 1, 2017 at 5:08 AM, Mimi Zohar <zohar@linux.vnet.ibm.com> wrote:

Right, re-introducing the iint->mutex and a new i_generation field in the iint struct with a separate set of locks should work. It will be reset if the file metadata changes (eg. setxattr, chown, chmod).

Note that the "inner lock" could possibly be omitted if the invalidation can be just a single atomic instruction.

So particularly if invalidation could be just an atomic_inc() on the generation count, there might not need to be any inner lock at all.

You'd have to serialize the actual measurement with the "read generation count", but that should be as simple as just doing a smp_rmb() between the "read generation count" and "do measurement on file contents".
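The scheme described above can be sketched in userspace C11 as follows. This is only an illustration under stated assumptions: `struct meas_state`, `invalidate()`, `measure()` and `measurement_valid()` are hypothetical names rather than IMA code, the acquire fence only roughly plays the role of smp_rmb(), and the toy FNV-1a digest stands in for the real file hash. Measurers are assumed to be serialized by an outer lock (the iint->mutex in the discussion), so only the invalidation side is lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-inode IMA state; not the kernel's iint. */
struct meas_state {
    atomic_uint generation;    /* bumped on every invalidation */
    unsigned int measured_gen; /* generation sampled before the last hash */
    uint32_t digest;           /* result of the last measurement */
};

/* Invalidation is one atomic increment, so it needs no inner lock. */
static void invalidate(struct meas_state *s)
{
    atomic_fetch_add_explicit(&s->generation, 1, memory_order_release);
}

/* Toy FNV-1a digest standing in for the real file hash. */
static uint32_t toy_hash(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Caller is assumed to hold the outer lock that serializes measurers. */
static void measure(struct meas_state *s, const unsigned char *data, size_t len)
{
    /* Read the generation count first ... */
    unsigned int gen = atomic_load_explicit(&s->generation,
                                            memory_order_relaxed);
    /* ... and order that read before the reads of the file contents. */
    atomic_thread_fence(memory_order_acquire);
    s->digest = toy_hash(data, len);
    s->measured_gen = gen;
}

/* The cached digest is only trusted if nothing invalidated it since. */
static bool measurement_valid(struct meas_state *s)
{
    return s->measured_gen ==
           atomic_load_explicit(&s->generation, memory_order_acquire);
}
```

If an invalidation lands at any point after the generation was sampled, the stored generation no longer matches and the measurement is treated as stale.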

We already have a change counter on the inode, which is modified on
any data or metadata write (i_version) under filesystem locks. The
i_version counter has well defined semantics - it's required by
NFSv4 to increment on any metadata or data change - so we should be
able to rely on its behaviour to implement IMA as well. Filesystems
that support i_version are marked with [SB|MS]_I_VERSION in the
superblock (IS_I_VERSION(inode)) so it should be easy to tell if IMA
can be supported on a specific filesystem (btrfs, ext4, fuse and xfs
ATM).

Recently I received a patch to replace i_version with mtime/atime.

mtime is not guaranteed to change on data writes - the resolution of
the filesystem timestamps may mean mtime only changes once a second regardless of the number of writes performed to that file. That's
why NFS can't use it as a change attribute, and hence we have
i_version....
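To make the timestamp-resolution point concrete, here is a small toy model in C. It assumes a filesystem whose mtime has one-second granularity; `struct toy_file` and its helpers are hypothetical and only illustrate why a per-write counter catches changes that a coarse timestamp misses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a file on a filesystem with 1-second timestamps. */
struct toy_file {
    int64_t mtime_sec; /* modification time, truncated to whole seconds */
    uint64_t version;  /* incremented on every write, like i_version */
};

/* Record a write that happens at wall-clock time `now` (in seconds). */
static void toy_write(struct toy_file *f, double now)
{
    assert(now >= 0.0);
    f->mtime_sec = (int64_t)now; /* sub-second part is lost */
    f->version++;
}

/* Change detection based on the coarse timestamp. */
static bool changed_by_mtime(const struct toy_file *f, int64_t sampled)
{
    return f->mtime_sec != sampled;
}

/* Change detection based on the per-write counter. */
static bool changed_by_version(const struct toy_file *f, uint64_t sampled)
{
    return f->version != sampled;
}
```

With a write at t=100.1, a sample of both values, and a second write at t=100.7, the mtime comparison reports no change while the version comparison does.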

Now, even more recently, I received a patch that claims that
i_version is just a performance improvement.

Did you ask them to explain/quantify the performance improvement?

Using i_version is a performance improvement as opposed to always
calculating the file hash and writing the xattr. The patch is
intended for filesystems that don't support i_version (eg. ubifs).

e.g. Using i_version on XFS slows down performance on small
writes by 2-3% because all data writes log a version change rather
than only logging a change when mtime updates. We take that penalty
because NFS requires specific change attribute behaviour, otherwise
we wouldn't have implemented it at all in XFS...

For file systems that
don't support i_version, assume that the file has changed.

For file systems that don't support i_version, instead of assuming
that the file has changed, we can at least use i_generation.

I'm not sure what you mean here - the struct inode already has a
i_generation variable. It's a lifecycle indicator used to
discriminate between alloc/free cycles on the same inode number.
i.e. It only changes at inode allocation time, not whenever the data
in the inode changes...

Sigh, my error.

With Linus' suggested changes, I think this will work nicely.

The IMA code should be able to sample that at measurement time and
either fail or be retried if i_version changes during measurement.
We can then simply make the IMA xattr write conditional on the
i_version value being unchanged from the sample the IMA code passes
into the filesystem once the filesystem holds all the locks it needs
to write the xattr...

I note that IMA already grabs the i_version in
ima_collect_measurement(), so this shouldn't be too hard to do.
Perhaps we don't need any new locks or counters at all, maybe just
the ability to feed a version cookie to the set_xattr method?
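The "version cookie" idea can be sketched as a compare-then-set on a toy inode. Everything here is an assumption made for illustration: `struct toy_inode`, `toy_measure()` and `set_xattr_if_unchanged()` are hypothetical names, no such interface exists in the kernel xattr API, and in a real filesystem the comparison would have to run with the filesystem's own locks held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy inode; the fields only loosely mirror the real ones. */
struct toy_inode {
    uint64_t version;   /* change counter, like i_version */
    uint32_t data;      /* stand-in for the file contents */
    uint32_t ima_xattr; /* stand-in for the security.ima value */
    bool has_xattr;
};

/* Every data write bumps the change counter. */
static void toy_data_write(struct toy_inode *ino, uint32_t value)
{
    ino->data = value;
    ino->version++;
}

/* Sample the version first, then compute a toy "hash" of the contents. */
static uint32_t toy_measure(const struct toy_inode *ino, uint64_t *cookie)
{
    assert(cookie != NULL);
    *cookie = ino->version;
    return ino->data * 2654435761u;
}

/*
 * Hypothetical conditional xattr write: store the digest only if the
 * inode's version still equals the cookie sampled at measurement time.
 * Returns 0 on success or -EAGAIN if the file changed in between, in
 * which case the caller can re-measure and retry, or give up.
 */
static int set_xattr_if_unchanged(struct toy_inode *ino, uint64_t cookie,
                                  uint32_t digest)
{
    if (ino->version != cookie)
        return -EAGAIN;
    ino->ima_xattr = digest;
    ino->has_xattr = true;
    return 0;
}
```

A write that sneaks in between toy_measure() and set_xattr_if_unchanged() makes the store fail instead of recording a stale digest.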

The security.ima xattr is normally written out in
ima_check_last_writer(), not in ima_collect_measurement().

Which, if IIUC, does this to measure and update the xattr:

ima_check_last_writer
  -> ima_update_xattr
    -> ima_collect_measurement
    -> ima_fix_xattr

ima_collect_measurement() calculates the file hash for storing in the measurement list (IMA-measurement), verifying the hash/signature (IMA-appraisal) already stored in the xattr, and auditing (IMA-audit).

Yup, and it samples the i_version before it calculates the hash and
stores it in the iint, which then gets passed to ima_fix_xattr().
Looks like all that is needed is to pass the i_version back to the
filesystem through the xattr call....

IOWs, sample the i_version early while we hold the inode lock and
check the writer count, then if it is the last writer drop the inode
lock and call ima_update_xattr(). The sampled i_version then tells
us if the file has changed before we write the updated xattr...

The only time that ima_collect_measurement() writes the file xattr is
in "fix" mode. Writing the xattr will need to be deferred until after
the iint->mutex is released.

ima_collect_measurement() doesn't write an xattr at all - it just
reads the file data and calculates the hash.

There's another call to ima_fix_xattr() from ima_appraise_measurement().

There should be no open writers in ima_check_last_writer(), so the
file shouldn't be changing.

If that code is not holding the inode i_rwsem across
ima_update_xattr(), then the writer check is racy as hell. We're
trying to get rid of the need for this code to hold the inode lock
to stabilise the writer count for the entire operation, and it looks
to me like everything is there to use the i_version to ensure the
IMA code doesn't need to hold the inode lock across
ima_collect_measurement() and ima_fix_xattr()...

Ok

Mimi

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Mimi Zohar@21:1/5 to Eric W. Biederman on Mon Oct 2 14:30:01 2017

On Sun, 2017-10-01 at 22:25 -0500, Eric W. Biederman wrote:

Mimi Zohar <zohar@linux.vnet.ibm.com> writes:

There should be no open writers in ima_check_last_writer(), so the
file shouldn't be changing.

This is slightly tangential but I think important to consider.
What do you do about distributed filesystems (fuse, nfs, etc.) that
can change the data behind the kernel's back?

Exactly!

Do you not support such systems or do you have a sufficient way to
detect changes?

Currently, only the initial file access in policy is measured,
verified, audited. Even if there was a way of detecting the change,
since we can't trust these file systems, the performance would be
awful, but we should probably not be caching the
measurement/verification results.

Mimi


From Jeff Layton@21:1/5 to Mimi Zohar on Mon Oct 2 14:50:03 2017

On Mon, 2017-10-02 at 08:09 -0400, Mimi Zohar wrote:

On Mon, 2017-10-02 at 15:35 +1100, Dave Chinner wrote:

On Sun, Oct 01, 2017 at 07:42:42PM -0400, Mimi Zohar wrote:

On Mon, 2017-10-02 at 09:34 +1100, Dave Chinner wrote:

On Sun, Oct 01, 2017 at 11:41:48AM -0700, Linus Torvalds wrote:

On Sun, Oct 1, 2017 at 5:08 AM, Mimi Zohar <zohar@linux.vnet.ibm.com> wrote:

Right, re-introducing the iint->mutex and a new i_generation field in
the iint struct with a separate set of locks should work. It will be
reset if the file metadata changes (eg. setxattr, chown, chmod).

Note that the "inner lock" could possibly be omitted if the
invalidation can be just a single atomic instruction.

So particularly if invalidation could be just an atomic_inc() on the
generation count, there might not need to be any inner lock at all.

You'd have to serialize the actual measurement with the "read
generation count", but that should be as simple as just doing a
smp_rmb() between the "read generation count" and "do measurement on
file contents".

We already have a change counter on the inode, which is modified on
any data or metadata write (i_version) under filesystem locks. The
i_version counter has well defined semantics - it's required by
NFSv4 to increment on any metadata or data change - so we should be
able to rely on its behaviour to implement IMA as well. Filesystems
that support i_version are marked with [SB|MS]_I_VERSION in the
superblock (IS_I_VERSION(inode)) so it should be easy to tell if IMA
can be supported on a specific filesystem (btrfs, ext4, fuse and xfs
ATM).

Recently I received a patch to replace i_version with mtime/atime.

I assume you're talking here about the patch I sent a few months ago.

I specifically do _not_ want to replace i_version with the mtime/atime.
The point there was to stop trying to use i_version on filesystems that
don't properly implement it (which is most of them).

The next best approximation on those filesystems is the mtime. It's not perfect, but it's better than nothing (which is what you have now on filesystems that never increment i_version on writes). IOW, it just
added a fallback for when you can't count on the i_version changing.

(BTW: atime is worthless here -- who cares if the thing was accessed?
IIUC, we only care if something changed.)

Ideally, all filesystems would implement i_version properly. In
practice, that's a tall order as that may require on-disk changes for
some of them. That's not always possible where cross-OS compatibility
is necessary (e.g. FAT or NTFS).

mtime is not guaranteed to change on data writes - the resolution of
the filesystem timestamps may mean mtime only changes once a second
regardless of the number of writes performed to that file. That's
why NFS can't use it as a change attribute, and hence we have
i_version....

Now, even more recently, I received a patch that claims that
i_version is just a performance improvement.

Did you ask them to explain/quantify the performance improvement?

Using i_version is a performance improvement as opposed to always
calculating the file hash and writing the xattr. The patch is
intended for filesystems that don't support i_version (eg. ubifs).

e.g. Using i_version on XFS slows down performance on small
writes by 2-3% because all data writes log a version change rather
than only logging a change when mtime updates. We take that penalty
because NFS requires specific change attribute behaviour, otherwise
we wouldn't have implemented it at all in XFS...

For file systems that
don't support i_version, assume that the file has changed.

For file systems that don't support i_version, instead of assuming
that the file has changed, we can at least use i_generation.

I'm not sure what you mean here - the struct inode already has a
i_generation variable. It's a lifecycle indicator used to
discriminate between alloc/free cycles on the same inode number.
i.e. It only changes at inode allocation time, not whenever the data
in the inode changes...

Sigh, my error.

With Linus' suggested changes, I think this will work nicely.

The IMA code should be able to sample that at measurement time and
either fail or be retried if i_version changes during measurement.
We can then simply make the IMA xattr write conditional on the
i_version value being unchanged from the sample the IMA code passes
into the filesystem once the filesystem holds all the locks it needs
to write the xattr...

I note that IMA already grabs the i_version in
ima_collect_measurement(), so this shouldn't be too hard to do.
Perhaps we don't need any new locks or counters at all, maybe just
the ability to feed a version cookie to the set_xattr method?

The security.ima xattr is normally written out in ima_check_last_writer(), not in ima_collect_measurement().

Which, if IIUC, does this to measure and update the xattr:

ima_check_last_writer
  -> ima_update_xattr
    -> ima_collect_measurement
    -> ima_fix_xattr

ima_collect_measurement() calculates the file hash for storing in the
measurement list (IMA-measurement), verifying the hash/signature
(IMA-appraisal) already stored in the xattr, and auditing (IMA-audit).

Yup, and it samples the i_version before it calculates the hash and
stores it in the iint, which then gets passed to ima_fix_xattr().
Looks like all that is needed is to pass the i_version back to the
filesystem through the xattr call....

IOWs, sample the i_version early while we hold the inode lock and
check the writer count, then if it is the last writer drop the inode
lock and call ima_update_xattr(). The sampled i_version then tells
us if the file has changed before we write the updated xattr...

The only time that ima_collect_measurement() writes the file xattr is
in "fix" mode. Writing the xattr will need to be deferred until after
the iint->mutex is released.

ima_collect_measurement() doesn't write an xattr at all - it just
reads the file data and calculates the hash.

There's another call to ima_fix_xattr() from
ima_appraise_measurement().

There should be no open writers in ima_check_last_writer(), so the
file shouldn't be changing.

If that code is not holding the inode i_rwsem across
ima_update_xattr(), then the writer check is racy as hell. We're
trying to get rid of the need for this code to hold the inode lock
to stabilise the writer count for the entire operation, and it looks
to me like everything is there to use the i_version to ensure the
IMA code doesn't need to hold the inode lock across
ima_collect_measurement() and ima_fix_xattr()...

Ok

Mimi

--
Jeff Layton <jlayton@redhat.com>
