I gather that capabilities are generally fine-grained, and capability pointers would be generated and handed out by the OS. What happens when
a pointer is incremented?
There is a project called CHERI, whose concepts have been implemented
in Arm's _Morello_ chip.
I have been in CPU design for a very long time. I did a HS-level
design (a calculator) in 1968, 3 years before the Bowmar Brain, did
a #60 design in college as a junior, and started doing professional
designs (the MC88100) in 1983.
With all this background and long term in this career, I can say
without a trace of doubt, that I am not <yet> smart enough to do
a capabilities ISA/system and get it out the door without errors.
On the other hand, My 66000 Architecture is immune to most attack
strategies now in vogue: Return-Oriented Programming, RowHammer,
Spectre, GOT overwrites, buffer overflows, ... All without having
any semblance of capabilities, and all without any performance
degradation other than typical cache and TLB effects.
... the Burroughs Large systems capability based design ...
Capabilities sound like something previously implemented in mainframe
class computers.
IBM's System/38 and follow-on AS/400 (both long obsolete) may have
had something like them. Not sure if they count as
_mainframe-class_.
On Sat, 09 Mar 2024 15:09:46 GMT, Scott Lurndal wrote:
... the Burroughs Large systems capability based design ...
As I recall, that depended on not giving users access to a compiler that could generate instructions that bypassed the protection system.
On 3/9/2024 9:09 AM, Scott Lurndal wrote:
There [are] CHERI designs on silicon in existence
https://www.morello-project.org/
It is doable, at least.
Main open question is if they can deliver enough on their claims in a
way that justifies the cost of the memory tagging (eg, where one needs
to tag whether or not each memory location holds a valid capability).
As I see it, "locking things down" would likely require turning things
like "malloc()/free()", "dlopen()/dlsym()/...", etc, into system calls
(and generally giving the kernel a much more active role in this process).
On 3/9/2024 1:58 PM, Robert Finch wrote:
<snip>
On 2024-03-09 1:56 p.m., BGB wrote:
On 3/9/2024 9:09 AM, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
For Femtiki OS, I have a single object describing an array of values.
For instance, messages, which are small objects, are described with a
single object for an array of messages. It is too costly to use an
object descriptor for each message.
For a CHERI like approach, one would need a tag of 1 bit for every 16
bytes of RAM (to flag whether or not that RAM represents a valid
capability).
For the combination of RAM sizes and FPGAs I have access to, this is non-viable, as I would need more BRAM for the tag memory than exists in
the FPGAs.
In effect, this will mean needing another smaller cache which is bolted
onto the L2 cache or similar, whose sole purpose is to provide tag-bits
(and probably bounce requests to some other area of RAM which contains
the tag-bits memory).
I think this may not be necessary, but I have to read some more. The
capabilities have transfer rules which might make it possible to use
existing code. They have ported things over to RISC-V. It cannot be too
mountainous a task.
You can make it work, yes, but the question is less "can you make it
work, technically", but more:
Can you make it work in a way that provides both a fairly normal C experience, and *also* an unbreakable sandbox, at the same time.
My skepticism here is that, short of drastic measures like moving malloc
and libdl and similar into kernel space, it may not be possible to keep
the sandbox secure using solely capabilities.
ASLR could help, but using ASLR to maintain an image of integrity for
the capability system would be "kinda weak".
One could ask though:
How is my security model (with Keyrings) any different?
Well, the partial answer mostly is that a call that switches keyrings is effectively accomplished via context switches (with the two keyrings effectively running in separate threads).
So, like, even if the untrusted thread has a pointer to the protected thread's memory, it can't access it...
Though, a similar model could potentially be accomplished with
conventional page-tables, by making pseudo-processes which only share
parts of their address space with another process (and the protected
memory is located in the non-shared spaces, with any calls between them
via an RPC mechanism).
Had considered mechanisms which could pull this off without a context
switch, but most would fall short of "acceptably secure" (if a path
exists where a task could modify its own KRR or similar, this mechanism
is blown).
My bounds-checking scheme also worked, but with a caveat:
It only works if code does not get "overly clever" with the use of pointers.
So, it worked well enough to where I was able to enable full
bounds-checking in Doom and similar, but was not entirely transparent to
some of the runtime code. If you cast between pointers and integers, and manipulate the pointer bits, there are "gotchas".
A full capability system is going to have a similar restriction.
Either pointer<->integer casting would need to be disallowed, or (more likely), turned into a runtime call which can "bless" the address before returning it as a capability, which would exist as another potential
attack surface (unless, of course, this mechanism is itself turned into
a system call).
OTOH:
If one can't implement something like a conventional JavaScript VM, or
if it takes a significant performance hit, this would not be ideal.
Or, change the description, as being mostly a tool to eliminate things
like buffer overflow exploits and memory corruption, and as a fairly
powerful debugging feature.
But, say, note that it would not be sufficient, say, for things like
sandboxing hostile code within a shared address space with another
program that needs to be kept protected.
Granted, the strength could likely be improved (in the face of trying
to prevent hostile code from being able to steal capabilities) through
creative use of ASLR. Along with ABI features, such as "scratch
register scrubbing" (say, loading zeroes into scratch registers on
function return, such as to prevent capabilities from being leaked
via registers), marking function pointers as "Execute Only", etc.
As noted, a capability system would likely still be pretty strong
against things like buffer overflows (but if only being used to
mitigate buffer overflows, is a bit overkill; so the main
"interesting" case is if it can be used to make an "unbreakable
sandbox" for potentially hostile machine code).
*: If it is possible to perform a Load or (worse, Capability Load)
through a function pointer, this is likely to be a significant attack
vector. Need to make it so that function pointers can only be used to
call things. Protecting against normal data loads would be needed
mostly to try to prevent code from being able to gain access to a
known pointer and possibly side-step the ASLR (say, if it can figure
out that the address it wants to access is reachable from a capability
that the code has access to).
Though, on my side of things, it is possible I could revive a modified
form of the 128-bit ABI, while dropping the VAS back down to 48 bits,
and turn it into a more CHERI-like form (with explicit upper and lower
bounds and access-enable flags, rather than a shared-exponent size and
bias scheme).
Yeah, IMO explicit upper and lower bounds would be better even though it
uses more memory. The whole manipulation of the bounds is complex. I
sketched out using a 256b capability descriptor. Some of the bits can be
trimmed from the bounds if things are page aligned.
IIRC, they were using 128-bit descriptors with a bit-slicing scheme.
So, say, if I were to do similar (within my existing pointer layout):
( 27: 0): Base Address
( 47: 28): Shared Address (47:28)
( 63: 48): Type Tag Bits
( 87: 64): Lower Bound (27:4)
(111: 88): Upper Bound (27:4)
( 112): Base Adjust
( 113): Lower Bound Adjust
( 114): Upper Bound Adjust
(127:115): Access Flags / Etc
Though, this particular encoding would limit bounds-checking to a 256MB region, which is lame (or eat more tag bits, and have slightly bigger regions).
It is likely that the capability memory tagging would need to be managed
by the L2 cache. Would need some mechanism for the tag-bits memory (say,
2MB for 256MB at 1b per 16B line). Would also need to somehow work this
flag bit into the ringbus messaging.
The C experience is fairly normal, as long as you are actually
playing by the C rules. You can't arbitrarily cast integers to
pointers - if you plan to do that you need to use intptr_t so
the compiler knows to keep the data in a capability so it can
use it as a pointer later.
Tricks which store data in the upper or lower bits of pointers are
awkward.
Changes in a 6M LoC KDE desktop codebase were 0.026% of lines: https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f23245dace466297f20a0dbd22d371.pdf
Sandboxing involves dividing code into compartments; that involves
some decision making as to where you draw the security boundaries.
There aren't good tools to do that (they are being worked on).
CHERI offers you the tools to implement whatever compartmentalisation
stategy you wish, but it's not quite as simple as just recompiling.
... we're running on FreeBSD
BGB wrote:
On 3/9/2024 1:58 PM, Robert Finch wrote:
<snip>
On 2024-03-09 1:56 p.m., BGB wrote:
On 3/9/2024 9:09 AM, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
For Femtiki OS, I have a single object describing an array of values.
For instance messages which are small objects, are described with a
single object for an array of messages. It is too costly to use an
object descriptor for each message.
For a CHERI like approach, one would need a tag of 1 bit for every 16
bytes of RAM (to flag whether or not that RAM represents a valid capability).
For the combination of RAM sizes and FPGAs I have access to, this is non-viable, as I would need more BRAM for the tag memory than exists in
the FPGAs.
Yes, indeed, not viable. Now imagine a page of those: now you have
to write out 4096 bytes plus 256 tag bits onto a disk with standard
sectors...
In effect, this will mean needing another smaller cache which is bolted onto the L2 cache or similar, whose sole purpose is to provide tag-bits (and probably bounce requests to some other area of RAM which contains
the tag-bits memory).
Denelcor HEP had tag-like-bits and all the crud they bring (but they were used as locks instead of tags).
I think this may not be necessary, but I have to read some more. The
capabilities have transfer rules which might make it possible to use
existing code. They have ported things over to RISC-V. It cannot be too
mountainous a task.
You can make it work, yes, but the question is less "can you make it
work, technically", but more:
Can you make it work in a way that provides both a fairly normal C experience, and *also* an unbreakable sandbox, at the same time.
And here the answer is essentially <wait for it> no.
My skepticism here is that, short of drastic measures like moving malloc and libdl and similar into kernel space, it may not be possible to keep
the sandbox secure using solely capabilities.
ASLR could help, but using ASLR to maintain an image of integrity for
the capability system would be "kinda weak".
How do you ASLR code when a latent capability on disk still points at
its defined memory area? Yes, you can ASLR at boot, but you can use
the file system to hold capabilities {which is something most capability
systems desire and promote.}
One could ask though:
How is my security model (with Keyrings) any different?
Well, the partial answer mostly is that a call that switches keyrings is effectively accomplished via context switches (with the two keyrings effectively running in separate threads).
So, like, even if the untrusted thread has a pointer to the protected thread's memory, it can't access it...
Though, a similar model could potentially be accomplished with
conventional page-tables, by making pseudo-processes which only share
parts of their address space with another process (and the protected
memory is located in the non-shared spaces, with any calls between them
via an RPC mechanism).
Capability manipulation via messages.
Had considered mechanisms which could pull this off without a context switch, but most would fall short of "acceptably secure" (if a path
exists where a task could modify its own KRR or similar, this mechanism
is blown).
My bounds-checking scheme also worked, but with a caveat:
It only works if code does not get "overly clever" with the use of pointers.
Which no-one can trust of C programs.
So, it worked well enough to where I was able to enable full bounds-checking in Doom and similar, but was not entirely transparent to some of the runtime code. If you cast between pointers and integers, and manipulate the pointer bits, there are "gotchas".
Gee, if only we had trained programmers to avoid some of the things we
are now requiring new languages to prevent.....
Either pointer<->integer casting would need to be disallowed, or (more likely), turned into a runtime call which can "bless" the address before returning it as a capability, which would exist as another potential
attack surface (unless, of course, this mechanism is itself turned into
a system call).
OTOH:
If one can't implement something like a conventional JavaScript VM, or
if it takes a significant performance hit, this would not be ideal.
Going for 2 in one post !!
Though, on my side of things, it is possible I could revive a modified
form of the 128-bit ABI, while dropping the VAS back down to 48 bits,
and turn it into a more CHERI-like form (with explicit upper and lower
bounds and access-enable flags, rather than a shared-exponent size and
bias scheme).

Yeah, IMO explicit upper and lower bounds would be better even though it
uses more memory. The whole manipulation of the bounds is complex. I
sketched out using a 256b capability descriptor. Some of the bits can be
trimmed from the bounds if things are page aligned.
MitchAlsup1 <mitchalsup@aol.com> wrote:
BGB wrote:
<snip>
You can make it work, yes, but the question is less "can you make it
work, technically", but more:
Can you make it work in a way that provides both a fairly normal C
experience, and *also* an unbreakable sandbox, at the same time.
The C experience is fairly normal, as long as you are actually playing by
the C rules. You can't arbitrarily cast integers to pointers - if you plan
to do that you need to use intptr_t so the compiler knows to keep the data
in a capability so it can use it as a pointer later.
Tricks which store data in the upper or lower bits of pointers are awkward.
Other tricks like XOR linked lists of pointers don't work. This is all
stuff that's pushing into the 'undefined behaviour' parts of C (even if
C doesn't explicitly call it out).
Why would you want to ASLR? ASLR is to prevent you guessing valid addresses for things so you can't craft pointers to them. CHERI prevents you crafting pointers to arbitrary things in the first place.
In article <Qry*XE3Ez@news.chiark.greenend.org.uk>, theom+news@chiark.greenend.org.uk (Theo Markettos) wrote:
The C experience is fairly normal, as long as you are actually
playing by the C rules. You can't arbitrarily cast integers to
pointers - if you plan to do that you need to use intptr_t so
the compiler knows to keep the data in a capability so it can
use it as a pointer later.
Makes sense, though it will require updating of older code for the rules being more thoroughly enforced. Not a bad thing.
Tricks which store data in the upper or lower bits of pointers are
awkward.
Not compatible with Aarch64 Pointer Authentication, but CHERI should be a functional replacement anyway.
Changes in a 6M LoC KDE desktop codebase were 0.026% of lines: https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f23245dace466297f20a0dbd22d371.pdf
1,500 or so changes. Quite a lot. Is the code backwards-compatible with a conventional C platform?
Sandboxing involves dividing code into compartments; that involves
some decision making as to where you draw the security boundaries.
There aren't good tools to do that (they are being worked on).
CHERI offers you the tools to implement whatever compartmentalisation strategy you wish, but it's not quite as simple as just recompiling.
I have a slightly odd case: the software I work on ships as a great big shared library that's used in-process by its caller. It isn't any kind of server, and doesn't use any IPC; in concept it's a huge math library that asks the caller to allocate memory for it. So it needs to share a heap
with the caller. Presumably that model is workable?
... we're running on FreeBSD
That was a point against my experimenting with Morello when we were
offered it last year; the requirement to port to FreeBSD first. Morello
Linux seems insufficiently mature at present; do you have any idea of the timescale for it to be robustly usable for porting application code by someone who isn't experienced in Linux internals?
Theo Markettos wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
BGB wrote:
<snip>
You can make it work, yes, but the question is less "can you make it
work, technically", but more:
Can you make it work in a way that provides both a fairly normal C
experience, and *also* an unbreakable sandbox, at the same time.
The C experience is fairly normal, as long as you are actually playing by the C rules. You can't arbitrarily cast integers to pointers - if you plan to do that you need to use intptr_t so the compiler knows to keep the data in a capability so it can use it as a pointer later.
As a 'for instance' how does one take a capability and align it to a cache line boundary ?? Say in/after malloc() ?!?
MitchAlsup1 <mitchalsup@aol.com> wrote:
Theo Markettos wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
BGB wrote:
<snip>
You can make it work, yes, but the question is less "can you make it
work, technically", but more:
Can you make it work in a way that provides both a fairly normal C
experience, and *also* an unbreakable sandbox, at the same time.
The C experience is fairly normal, as long as you are actually playing by the C rules. You can't arbitrarily cast integers to pointers - if you plan to do that you need to use intptr_t so the compiler knows to keep the data in a capability so it can use it as a pointer later.
As a 'for instance' how does one take a capability and align it to a cache line boundary ?? Say in/after malloc() ?!?
I'm not sure what you mean:
Capabilities are 128-bit fields stored aligned in memory. It's not allowed to store a capability that isn't 128-bit aligned. Those naturally align
with cache lines. Every 128 bits has a tag associated with it, stored together or apart (various schemes discussed in my previous posts).
The memory it points to can be arbitrarily aligned.
It is just a 64-bit pointer. You dereference it using 8/16/32/64/128 bit load and store instructions in the usual datapath (either explicitly using 'C load'/'C store' instructions or switching to a mode where every regular load/store implicitly dereferences a capability rather than integer pointer)
The bounds have certain representation limits, because they're packing
192+ bits of information into a 128 bit space. This boils down to an alignment granularity: eg if you allocate a (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I can't remember what the rounding is at this size). malloc() should ensure it doesn't hand out that memory to somebody else; allocators typically do this anyway since they use slab allocators which round up the allocation to a certain number of slabs.
There is a trickiness if somebody wants to generate a capability to a subobject in the middle of a large object that isn't aligned: load in a 4.7GiB DVD wholesale into memory and try to generate a capability to a block of frames in the middle of it, which is potentially large and yet the base
is unaligned, which would cause a loss of bounds precision (somebody could access the frame before or after). It's possible to imagine things like that, but we've not seen software actually do it.
I'm not sure how any of these relate to cache lines?
Aside for ensuring the caches store capabilities atomically and preserve tags, any time you dereference them they work just like regular memory accesses.
If you mean you ask malloc for something you later want to align to a
cache line, you ask for something larger and increment the pointer to be
cache aligned, in the normal way:
#include <cheriintrin.h>
....
// 64-byte cache lines
char *ptr = malloc(size + 63);                           // leave extra space for rounding up
ptr = (char *)(((uintptr_t)ptr + 63) & ~(uintptr_t)63);  // round up to cache line
and then increment the base bound to match the new position of 'ptr' and set the top to be ptr+size:
ptr = cheri_bounds_set(ptr, size);
Theo
Theo Markettos wrote:
The bounds have certain representation limits, because they're packing 192+ bits of information into a 128 bit space. This boils down to an alignment granularity: eg if you allocate a (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I can't remember what the rounding is at this
size). malloc() should ensure it doesn't hand out that memory to somebody else; allocators typically do this anyway since they use slab allocators which round up the allocation to a certain number of slabs.
So how do you "encode" a petaByte array ?? of megaByte structs in a capability ??
MitchAlsup1 <mitchalsup@aol.com> wrote:
Theo Markettos wrote:
The bounds have certain representation limits, because they're packing
192+ bits of information into a 128 bit space. This boils down to an
alignment granularity: eg if you allocate a (1MiB+1) byte buffer the bounds
might be 1MiB+64 (or whatever, I can't remember what the rounding is at this
size). malloc() should ensure it doesn't hand out that memory to somebody
else; allocators typically do this anyway since they use slab allocators
which round up the allocation to a certain number of slabs.
So how do you "encode" a petaByte array ?? of megaByte structs in a capability ??
You create a capability with petabyte-scale bounds. The precision of the
bounds may be limited, which means that you can't ram something else right
up against the end or beginning of the array if they aren't sufficiently
aligned. This is in practice not a problem: slab allocators will round up
your address before they allocate the next thing, and most OSes won't
populate the rounded up space with pages anyway.

When you take a pointer to an array element, then it has megabyte scale
bounds and they can be represented with more precision. If your struct
elements are of an arbitrary size and packed together at the byte level then
you either have to live with the bounds giving rights to slightly more than
a single struct element, or you decide that is unacceptable and pad the
struct size up to the next representable size (just like regular non-packed
structs enforce certain alignment), and pay a small memory overhead for
that (<0.25%). That's a security decision you can make one way or another.

Theo
On 11 Mar 2024 11:10:15 +0000 (GMT)
Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:
<snip>
Your time stamp (most likely the +0000 part) confuses my Claws
Mail newsreader. I wonder if others see a similar problem.
Michael S <already5chosen@yahoo.com> writes:
<snip>
Your time stamp (most likely +0000 part) confuses my Claws
Mail newsreader. I wonder if others see similar problem.
xrn on linux is not confused (which is not surprising since
linux stores time internally as GMT anyway).
Date: 11 Mar 2024 11:10:15 +0000 (GMT)
<snip>
Though partly reverting the logic for the changes to the bus messaging
also did not fix the issue; the behavior is otherwise rather weird.
So the bug hunt is proving annoying.
Can capabilities be applied to address ranges?
Segmentation similar to the PowerPC 32-bit segmentation is being used in
the current project, where the upper address bits select a segment
register which provides more address bits. I would like to use the same
descriptors for capabilities and the address-range segmentation.
On 2024-03-13 11:53 a.m., MitchAlsup1 wrote:
Robert Finch wrote:
Can capabilities be applied to address ranges?
That is a major thing that they provide.
Segmentation similar to the PowerPC 32-bit segmentation is being used
in the current project, where the upper address bits select a segment
register which provides more address bits. I would like to use the
same descriptors for capabilities and the address-range segmentation.
How would you handle 2 billion Capabilities in a single application ??
Each of which has a range of 2 GB ??? and each containing at
least 1 M Capabilities ????
I should have been a bit more clear maybe; it has taken time to gel in
my head.
PowerPC-32 has only 16 segment registers. I think these could be
extended to capability registers in the same manner as proposed for
the FS, GS registers in x64. I wonder if there is any value in doing so
though, since the address is a constant; it should already be known
whether it would exceed a bound. The segment registers simply tack on
24 bits to the left side of the remaining address bits to generate a
52-bit virtual address. I think all a capability would do is provide a
slightly different means of calculating the address.
I have a couple of cores I can experiment with adding capabilities to.
For my current project there are 32 segment registers.
******
I am wondering if the ‘R’ term in the CHERI Concentrate expansion
calculation can be less than zero, or if it is a modulo value. It is
shown as B[13:11] – 1. I am assuming it can go negative and is not modulo.
How “open” is CHERI? Can CHERI-based code be posted?
I got the impression that with capabilities, processor modes may not be
necessary. I think the distinction between hypervisor / supervisor may
be lost. Not sure that is a good idea.
Hypervisors are absolutely necessary if you want high RAS where a
GuestOS may crash without taking the system down.
In the past, capability machines wanted to use capabilities for all
relocation and all protection. As long as this is the case, an applica-
tion has an unbounded need for capabilities.
It seems like it would have a lot of overhead, but it might be worth it
for security.
You can grant this with limited capabilities (top 4-odd bits) only when
you have a means to load a new capability into a known <capability> base
register[i]. Since this is privileged data, unless the functionality of
this instruction is precisely specified and operates with access to
GuestOS address space.....it is difficult to imagine how to add
HyperVision on top of GuestOS supervision.
{{Or do you intend to void Hypervisors?}}
I got the impression that with capabilities, processor modes may not be
necessary. I think the distinction between hypervisor / supervisor may
be lost. Not sure that is a good idea.
On Wed, 13 Mar 2024 23:18:37 +0000, MitchAlsup1 wrote:
Hypervisors are absolutely necessary if you want high RAS where a
GuestOS may crash without taking the system down.
Not really. Remember, the whole point about introducing memory protection
into multitasking, multiuser OSes in the first place was precisely so that
one program could crash without taking the system down.
On Wed, 13 Mar 2024 23:18:37 +0000, MitchAlsup1 wrote:
Hypervisors are absolutely necessary if you want high RAS where a
GuestOS may crash without taking the system down.
Not really. Remember, the whole point about introducing memory protection into multitasking, multiuser OSes in the first place was precisely so that one program could crash without taking the system down.
What happens to the non-HyperVised system when GuestOS goes down ??
The architectural features supporting virtualization are designed to
isolate guests from both the hypervisor and other guests.
I am wondering if the ‘R’ term in the CHERI Concentrate expansion
calculation can be less than zero, or if it is a modulo value. It is
shown as B[13:11] – 1. I am assuming it can go negative and is not modulo.
How “open” is CHERI? Can CHERI-based code be posted?
Presumably, in addition to the code, one needs some way for the code to
be able to access its own ".data" and ".bss" sections when called.
Some options:
PC-relative:
Unclear if valid in this case.
GOT:
Table of pointers to things, loaded somehow.
One example here being the ELF FDPIC ABI.
Reloading a Global Pointer via a lookup table accessed via itself.
This is what my ABI uses...
I couldn't seem to find any technical descriptions of the CHERI/Morello
ABI. I had made a guess that it might work similar to FDPIC, as this
could be implemented without needing to use raw addresses (and seemed
like a "best fit").
BGB <cr88192@gmail.com> wrote:
Presumably, in addition to the code, one needs some way for the code to
be able to access its own ".data" and ".bss" sections when called.
AIUI you derive a capability from PCC (the PC capability) that gives you access to your local 'captable', which then holds pointers to your other objects. The captable can be read-only but the capabilities inside it can
be writable (ie pointers allow you to write to your globals etc).
Some options:
PC-relative:
Unclear if valid in this case.
GOT:
Table of pointers to things, loaded somehow.
One example here being the ELF FDPIC ABI.
Reloading a Global Pointer via a lookup table accessed via itself.
This is what my ABI uses...
I couldn't seem to find any technical descriptions of the CHERI/Morello
ABI. I had made a guess that it might work similar to FDPIC, as this
could be implemented without needing to use raw addresses (and seemed
like a "best fit").
This is a description of the linkage model for CHERI MIPS; I'm not aware of anything having changed significantly for RISC-V or Morello, although exact usage of registers etc will be different.
https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/20190113-cheri-linkage.pdf
This also describes the OS-facing ABI on CheriBSD: https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201904-asplos-cheriabi.pdf
Theo
A capability effectively encodes 3 addresses:
An upper bound, lower bound, and a target address.
A segment descriptor generally only needs two:
A base address, and a size.
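The comparison can be sketched as plain structs. This is only the logical, uncompressed view: real 128-bit CHERI capabilities compress the bounds relative to the address, and the field widths here are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

/* Logical (uncompressed) view of a capability versus a segment
   descriptor. Real 128-bit CHERI capabilities compress the bounds
   relative to the address; field widths here are illustrative. */
typedef struct {
    uint64_t base;    /* lower bound */
    uint64_t limit;   /* upper bound */
    uint64_t cursor;  /* the target address actually dereferenced */
    uint32_t perms;   /* read/write/execute/... permission bits */
    bool     tag;     /* validity tag, kept out-of-band in memory */
} capability;

typedef struct {
    uint64_t base;    /* base address */
    uint64_t size;    /* extent */
} segment_desc;

/* An access of nbytes through a capability checks the cursor against
   both bounds, and the tag. */
static bool cap_check(const capability *c, uint64_t nbytes) {
    return c->tag
        && c->cursor >= c->base
        && c->cursor + nbytes <= c->limit;
}
```

The extra address (the cursor) is what lets a capability behave like an ordinary C pointer while still carrying its own bounds, which a two-field segment descriptor cannot do.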
I am guessing bounds-check enforcement is more likely to have around a
30% overhead, maybe more or less, but this would likely apply only to
code that is potentially hostile. Then, one wants the security to be
strong enough that there is no practical way for code to break out of
the sandbox; though, if allowing for arbitrary machine code, there is
still the great potential Achilles heel that is the Global Pointer or
GOT.
The only sure way to avoid this is to not have any "potentially
compromising" capabilities anywhere reachable, and the main obvious way
to do this is via system calls.
If operating solely at the C level, it is a little easier: one needs to
make sure that there is no way for the code to get direct access to the
Global Pointer or GOT or similar. An ABI based on FDPIC would be bad
here, since it is within the reach of C code (under typical C behavior,
UB notwithstanding) to gain access to the GOT for an arbitrary function
pointer.
A big chunk of this would be overhead shared with the 128-bit ABI (which would have gone over entirely to 128-bit bounds-checked pointers), with
a few new/additional overheads.
On Thu, 14 Mar 2024 00:27:38 +0000, MitchAlsup1 wrote:
What happens to the non-HyperVised system when GuestOS goes down ??
Nothing. That’s what makes it a “guest”.
On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:
The architectural features supporting virtualization are designed to
isolate guests from both the hypervisor and other guests.
Providing an entire separate kernel for each VM is often unnecessary. If
you need separation at the level of entire subsystems, as opposed to individual processes, then that’s what containers are for.
On Wed, 13 Mar 2024 23:18:37 +0000, MitchAlsup1 wrote:
Hypervisors are absolutely necessary if you want high RAS where a
GuestOS may crash without taking the system down.
Not really. Remember, the whole point about introducing memory protection into multitasking, multiuser OSes in the first place was precisely so that one program could crash without taking the system down.
There is no-one to take over.......and deal with the GuestOS
crash.......
Lawrence D'Oliveiro wrote:
On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:
The architectural features supporting virtualization are designed to
isolate guests from both the hypervisor and other guests.
Providing an entire separate kernel for each VM is often unnecessary.
If you need separation at the level of entire subsystems, as opposed to
individual processes, then that’s what containers are for.
If you are running k Linuxes under a single HyperVisor, you should be
able to share all the Linux code after giving each of them their own VaS
for data.
On 3/14/2024 3:47 PM, Lawrence D'Oliveiro wrote:
On Thu, 14 Mar 2024 22:11:41 +0000, MitchAlsup1 wrote:
There is no-one to take over.......and deal with the GuestOS
crash.......
You can have a management process in the host that watches for these
sorts of events, easily enough.
Watchdog, tick tick... ;^)
On Thu, 14 Mar 2024 17:21:29 -0700, Chris M. Thomasson wrote:
On 3/14/2024 3:47 PM, Lawrence D'Oliveiro wrote:
On Thu, 14 Mar 2024 22:11:41 +0000, MitchAlsup1 wrote:
There is no-one to take over.......and deal with the GuestOS
crash.......
You can have a management process in the host that watches for these
sorts of events, easily enough.
Watchdog, tick tick... ;^)
Event-driven would be more efficient and more responsive than periodic
polling.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
Event-driven would be more efficient and more responsive than periodic
polling.
That assumes that an event can be generated, which may not be possible
with a guest os crash (if, for example, it was in an infinite loop with interrupts disabled).
Fontconfig's serialization code heavily relied on being able
to create pointers from arbitrary pointer arithmetic, and this
is not compatible with CHERI.
Should be - it's mostly making things play by the rules. Once they
play by the rules then it means they will work the same (or less
buggily) on a regular C platform.
The above link describes the changes - a number of them being replacing
'long' with intptr_t, fixing some undefined behaviour, and fixing bad
use of realloc().
Some of it was modernisation of old codebases (eg adding C11 atomics).
Do you want to compartmentalise that shared library, ie put in trust boundaries between the library and its caller?
If you just want to run the shared library as-is, you can recompile
it and get bounds checking etc.
I believe Morello Linux is able to support console-mode apps - ie
it supports context switching and the use of capabilities in userspace,
with some support in glibc. I believe there is now a dynamic linker,
but I am not sure of its status.
mitchalsup@aol.com (MitchAlsup1) writes:
Lawrence D'Oliveiro wrote:
On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:
The architectural features supporting virtualization are designed to
isolate guests from both the hypervisor and other guests.
Providing an entire separate kernel for each VM is often unnecessary. If
you need separation at the level of entire subsystems, as opposed to
individual processes, then that’s what containers are for.
If you are running k Linuxes under a single HyperVisor, you should be able
to share all the Linux code after giving each of them their own VaS for data.
Bad idea. Single point of failure. Impossible to update one without updating all. Linux does update code dynamically when loading and
unloading kernel modules.
Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
If you are running k Linuxes under a single HyperVisor, you should be able
to share all the Linux code after giving each of them their own VaS for data.
Bad idea. Single point of failure. Impossible to update one without
updating all. Linux does update code dynamically when loading and
unloading kernel modules.
I actually have a 4-level system::
HyperVisor is the only layer that is not
allowed to crash (RISC-V calls this machine).
Progressing towards less privilege
is GuestHV, GuestOS, and Application. The Hypervisor provides only
memory, timing, and device identification services.