• Capabilities, Anybody?

    From Lawrence D'Oliveiro@21:1/5 to All on Fri Mar 8 22:38:11 2024
    “Capabilities” are an old idea for doing memory protection by storing the access rights in unforgeable descriptors that are given to authorized processes. This way, there is no need for the traditional unprivileged-versus-privileged processor-mode concept; process A can have privileged
    access to memory region X but not Y, while process B can have privileged
    access to memory region Y but not X, so neither is “more” privileged than the other: each one is trusted with just a limited set of privileged
    functions.

    The idea fell out of use because of performance issues. But in these more security-conscious times, the overhead seems more and more like a
    reasonable price to pay for the greater control it offers. There is a
    project called CHERI, whose concepts have been implemented in Arm’s “Morello” chip.

    <https://www.theregister.com/2022/07/26/cheri_computer_runs_kde/>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to All on Sat Mar 9 01:51:10 2024
    I have been in CPU design for a very long time. I did a HS level
    design (calculator) in 1968, 3 years before the Bowmar Brain, did
    a #60 design in college as a Jr, and started doing professional
    designs (Mc 88100) in 1983.

    With all this background and long term in this career, I can say
    without a trace of doubt, that I am not <yet> smart enough to do
    a capabilities ISA/system and get it out the door without errors.

    On the other hand, My 66000 Architecture is immune to most attack
    strategies now in vogue:: Return Oriented Programming, RowHammer,
    Spectre, GOT overwrites, Buffer Overflows,... All without having
    any semblance of capabilities; and all without any performance
    degradations other than typical cache and TLB effects.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Robert Finch on Sat Mar 9 02:25:48 2024
    On Fri, 8 Mar 2024 21:15:29 -0500, Robert Finch wrote:

    I gather that capabilities are generally fine-grained, and capability pointers would be generated and handed out by the OS. What happens when
    a pointer is incremented?

    Each capability is a descriptor, describing a range of memory, not (necessarily) just one address. So it is valid to use that to address any
    area within the range.
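
    In outline (a minimal sketch of the idea only, not any real
    hardware's format; the names are illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* A capability names a [base, top) range plus permission bits.
       Unforgeability comes from hardware tagging, not from anything
       visible in the struct itself. */
    typedef struct {
        uint64_t base;   /* lowest addressable byte */
        uint64_t top;    /* one past the highest addressable byte */
        uint32_t perms;  /* e.g. READ | WRITE | EXECUTE bits */
    } capability;

    /* The hardware performs a check like this on every access, so
       incrementing the pointer stays legal as long as the access
       remains inside [base, top). */
    static bool cap_allows(const capability *c, uint64_t addr,
                           uint64_t len, uint32_t perm)
    {
        return addr >= c->base && addr <= c->top &&
               len <= c->top - addr &&
               (c->perms & perm) == perm;
    }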

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to D'Oliveiro on Sat Mar 9 10:29:00 2024
    In article <usg40i$1udfo$3@dont-email.me>, ldo@nz.invalid (Lawrence
    D'Oliveiro) wrote:

    There is a project called CHERI, whose concepts have been implemented
    in Arm's _Morello_ chip.

    I've followed this, a bit, and was offered a Morello development board
    last year. I concluded that my employers had too much commercially
    important work underway for me to spend six months tinkering with a
    prototype port to something that may not pan out. We'd also have to
    return the hardware and thus couldn't maintain the port if it succeeded.

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Sat Mar 9 15:09:46 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    I have been in CPU design for a very long time. I did a HS level
    design (calculator) in 1968, 3 years before the Bowmar Brain, did
    a #60 design in college as a Jr, and started doing professional
    designs (Mc 88100) in 1983.

    With all this background and long term in this career, I can say
    without a trace of doubt, that I am not <yet> smart enough to do
    a capabilities ISA/system and get it out the door without errors.

    On the other hand, My 66000 Architecture is immune to most attack
    strategies now in vogue:: Return Oriented Programming, RowHammer,
    Spectre, GOT overwrites, Buffer Overflows,... All without having
    any semblance of capabilities; and all without any performance
    degradations other than typical cache and TLB effects.

    On the gripping hand, the Burroughs Large systems capability based
    design is still processing data almost sixty years after the original
    B5500 was introduced.

    There are CHERI designs on silicon in existence

    https://www.morello-project.org/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Sat Mar 9 20:33:49 2024
    On Sat, 09 Mar 2024 15:09:46 GMT, Scott Lurndal wrote:

    ... the Burroughs Large systems capability based design ...

    As I recall, that depended on not giving users access to a compiler that
    could generate instructions that bypassed the protection system.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Robert Finch on Sat Mar 9 20:34:52 2024
    On Sat, 9 Mar 2024 14:58:24 -0500, Robert Finch wrote:

    Capabilities sound like something previously implemented in
    mainframe-class computers.

    IBM’s System/38 and follow-on AS/400 (both long obsolete) may have had something like them. Not sure if they count as “mainframe-class”.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to D'Oliveiro on Sat Mar 9 21:16:00 2024
    In article <usih5b$2gl39$2@dont-email.me>, ldo@nz.invalid (Lawrence
    D'Oliveiro) wrote:

    IBM’s System/38 and follow-on AS/400 (both long obsolete) may have
    had something like them. Not sure if they count as
    “mainframe-class”.

    This line is still going; IBM i is the latest version. It is a capability system, but the capabilities are implemented in software rather than
    hardware. This works because the available interfaces are fairly abstract
    and low-level access is simply not allowed.

    Nowadays, it runs on IBM POWER hardware, the same as is used for AIX. It
    isn't very fast, but it's intended for markets where the IBM branding is important, and basic data processing, rather than anything challenging
    for modern hardware, is all that's needed.

    It is a fairly large business segment for IBM, but tends to be a bit
    segregated from other fields of computing, because its terminology is
    weird, and it's so firmly business-orientated.

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Sun Mar 10 00:02:32 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sat, 09 Mar 2024 15:09:46 GMT, Scott Lurndal wrote:

    ... the Burroughs Large systems capability based design ...

    As I recall, that depended on not giving users access to a compiler that could generate instructions that bypassed the protection system.

    Yes. Which isn't a problem (once they fixed the bug in the early
    1970s that allowed loading a compiler from a library tape). Fixed
    by removing the compiler privilege when loading from tape - the
    operator/system administrator would need to issue a privileged
    command to mark the executable as a compiler.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to BGB on Sat Mar 9 23:59:43 2024
    BGB <cr88192@gmail.com> writes:
    On 3/9/2024 9:09 AM, Scott Lurndal wrote:


    There [are] CHERI designs on silicon in existence

    https://www.morello-project.org/

    It is doable, at least.

    The main open question is whether they can deliver enough on their claims in a
    way that justifies the cost of the memory tagging (eg, where one needs
    to tag whether or not each memory location holds a valid capability).


    As I see it, "locking things down" would likely require turning things
    like "malloc()/free()", "dlopen()/dlsym()/...", etc, into system calls
    (and generally giving the kernel a much more active role in this process).

    All of these have been addressed in CHERI.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to BGB on Sun Mar 10 00:04:12 2024
    BGB wrote:

    On 3/9/2024 1:58 PM, Robert Finch wrote:
    On 2024-03-09 1:56 p.m., BGB wrote:
    On 3/9/2024 9:09 AM, Scott Lurndal wrote:
    mitchalsup@aol.com (MitchAlsup1) writes:
    <snip>

    For Femtiki OS, I have a single object describing an array of values.
    For instance, messages, which are small objects, are described with a
    single object for an array of messages. It is too costly to use an
    object descriptor for each message.

    For a CHERI like approach, one would need a tag of 1 bit for every 16
    bytes of RAM (to flag whether or not that RAM represents a valid
    capability).

    For the combination of RAM sizes and FPGAs I have access to, this is non-viable, as I would need more BRAM for the tag memory than exists in
    the FPGAs.

    Yes, indeed, not viable. Now imagine a page of those, and now you have
    to write out 4096 bytes and 256 tag bits onto a disk with standard sectors...

    In effect, this will mean needing another smaller cache which is bolted
    onto the L2 cache or similar, whose sole purpose is to provide tag-bits
    (and probably bounce requests to some other area of RAM which contains
    the tag-bits memory).

    Denelcor HEP had tag-like-bits and all the crud they bring (but they were
    used as locks instead of tags).


    As I see it, "locking things down" would likely require turning things
    like "malloc()/free()", "dlopen()/dlsym()/...", etc, into system calls
    (and generally giving the kernel a much more active role in this
    process).

    I think this may not be necessary, but I have to read some more. The
    capabilities have transfer rules which might make it possible to use
    existing code. They have ported things over to RISC-V. It cannot be too
    mountainous a task.


    You can make it work, yes, but the question is less "can you make it
    work, technically", but more:
    Can you make it work in a way that provides both a fairly normal C experience, and *also* an unbreakable sandbox, at the same time.

    And here the answer is essentially <wait for it> no.

    My skepticism here is that, short of drastic measures like moving malloc
    and libdl and similar into kernel space, it may not be possible to keep
    the sandbox secure using solely capabilities.

    ASLR could help, but using ASLR to maintain an image of integrity for
    the capability system would be "kinda weak".

    How do you ASLR code when a latent capability on disk still points at
    its defined memory area ? Yes, you can ASLR at boot, but you can use
    the file system to hold capabilities {which is something most capability systems desire and promote.}

    One could ask though:
    How is my security model (with Keyrings) any different?

    Well, the partial answer mostly is that a call that switches keyrings is effectively accomplished via context switches (with the two keyrings effectively running in separate threads).

    So, like, even if the untrusted thread has a pointer to the protected thread's memory, it can't access it...

    Though, a similar model could potentially be accomplished with
    conventional page-tables, by making pseudo-processes which only share
    parts of their address space with another process (and the protected
    memory is located in the non-shared spaces, with any calls between them
    via an RPC mechanism).

    Capability manipulation via messages.

    Had considered mechanisms which could pull this off without a context
    switch, but most would fall short of "acceptably secure" (if a path
    exists where a task could modify its own KRR or similar, this mechanism
    is blown).


    My bounds-checking scheme also worked, but with a caveat:
    It only works if code does not get "overly clever" with the use of pointers.

    Which no-one can trust of C programs.

    So, it worked well enough to where I was able to enable full
    bounds-checking in Doom and similar, but was not entirely transparent to
    some of the runtime code. If you cast between pointers and integers, and manipulate the pointer bits, there are "gotchas".

    Gee, if only we had trained programmers to avoid some of the things we
    are now requiring new languages to prevent.....

    A full capability system is going to have a similar restriction.

    Understatement of the year candidate !

    Either pointer<->integer casting would need to be disallowed, or (more likely), turned into a runtime call which can "bless" the address before returning it as a capability, which would exist as another potential
    attack surface (unless, of course, this mechanism is itself turned into
    a system call).


    OTOH:
    If one can't implement something like a conventional JavaScript VM, or
    if it takes a significant performance hit, this would not be ideal.

    Going for 2 in one post !!

    Or, change the description, as being mostly a tool to eliminate things
    like buffer overflow exploits and memory corruption, and as a fairly
    powerful debugging feature.

    But, say, note that it would not be sufficient, say, for things like
    sandboxing hostile code within a shared address space with another
    program that needs to be kept protected.


    Granted, the strength could likely be improved (in the face of trying
    to prevent hostile code from being able to steal capabilities) through
    creative use of ASLR. Along with ABI features, such as "scratch
    register scrubbing" (say, loading zeroes into scratch registers on
    function return, such as to prevent capabilities from being leaked
    through via registers), marking function pointers as "Execute Only" etc.
    As noted, a capability system would likely still be pretty strong
    against things like buffer overflows (but if only being used to
    mitigate buffer overflows, is a bit overkill; so the main
    "interesting" case is if it can be used to make an "unbreakable
    sandbox" for potentially hostile machine code).


    *: If it is possible to perform a Load or (worse, Capability Load)
    through a function pointer, this is likely to be a significant attack
    vector. Need to make it so that function pointers can only be used to
    call things. Protecting against normal data loads would be needed
    mostly to try to prevent code from being able to gain access to a
    known pointer and possibly side-step the ASLR (say, if it can figure
    out that the address it wants to access is reachable from a capability
    that the code has access to).



    Though, on my side of things, it is possible I could revive a modified
    form of the 128-bit ABI, while dropping the VAS back down to 48 bits,
    and turn it into a more CHERI-like form (with explicit upper and lower
    bounds and access-enable flags, rather than a shared-exponent size and
    bias scheme).

    Yeah, IMO explicit upper and lower bounds would be better even though it
    uses more memory. The whole manipulation of the bounds is complex. I
    sketched out using a 256b capability descriptor. Some of the bits can be
    trimmed from the bounds if things are page aligned.


    IIRC, they were using 128-bit descriptors with a bit-slicing scheme.


    So, say, if I were to do similar (within my existing pointer layout):
    ( 27: 0): Base Address
    ( 47: 28): Shared Address (47:28)
    ( 63: 48): Type Tag Bits
    ( 87: 64): Lower Bound (27:4)
    (111: 88): Upper Bound (27:4)
    ( 112): Base Adjust
    ( 113): Lower Bound Adjust
    ( 114): Upper Bound Adjust
    (127:115): Access Flags / Etc

    Why is lower bound NOT 0 ?
    How can you assume/work-with a base address smaller than 48-bits ??
    How can the bounds not be at least 47-bits in size ??

    Though, this particular encoding would limit bounds-checking to a 256MB region, which is lame (or eat more tag bits, and have slightly bigger regions).

    It is worse than lame.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo Markettos@21:1/5 to BGB on Sun Mar 10 15:42:11 2024
    BGB <cr88192@gmail.com> wrote:
    There are CHERI designs on silicon in existence

    https://www.morello-project.org/

    Member of the CHERI team here...

    The main open question is whether they can deliver enough on their claims in a
    way that justifies the cost of the memory tagging (eg, where one needs
    to tag whether or not each memory location holds a valid capability).

    Crudely speaking, it's 1/128 of memory so 0.78%. Through use of
    a tag cache you don't need to have a specific 129th bit on your DRAM
    (although that's an option, like ECC) but can just wall off a piece of
    regular DRAM and use it for a tag table that's accessed via the tag cache.
    You only pay the cost for DRAM you actually have (eg one tag table per DIMM, sized for that DIMM).
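
    Back of the envelope, in C (assuming the 1 tag bit per 16-byte
    granule described above):

    #include <stdint.h>

    /* 8 granules of 16 bytes (128 bytes of DRAM) per byte of tag
       table, i.e. the 1/128 = 0.78% overhead mentioned above. */
    static uint64_t tag_table_bytes(uint64_t dimm_bytes)
    {
        return dimm_bytes / 128;
    }
    /* e.g. a 16 GiB DIMM needs a 128 MiB tag table */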

    The tag cache can also be compressed[1] so that if a memory page has no capabilities you don't need to store tags for it. This reduces tag cache bandwidth further.

    As I see it, "locking things down" would likely require turning things
    like "malloc()/free()", "dlopen()/dlsym()/...", etc, into system calls
    (and generally giving the kernel a much more active role in this process).

    You don't need memory allocations to be system calls, because capabilities
    can be manipulated in userspace (which is a key design goal). ie if
    malloc() possesses a capability to 1MB of memory and a client requests 100 bytes, it can take its 1MB capability, offset the base to point to the start
    of the allocation, set the offset to zero, and shrink the top to be
    base+100, and then return that 100-byte capability.
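
    A minimal sketch of that flow, assuming a CHERI C ("purecap")
    compile; 'arena' and 'arena_off' are hypothetical allocator state,
    and cheri_bounds_set() is the same intrinsic used later in this
    thread:

    #include <stddef.h>
    #include <cheriintrin.h>

    static char  *arena;      /* capability to the whole 1MB region */
    static size_t arena_off;  /* simple bump-allocator cursor */

    void *tiny_alloc(size_t n)
    {
        char *p = arena + arena_off;    /* derive from the 1MB capability */
        arena_off += n;
        return cheri_bounds_set(p, n);  /* shrink bounds to [p, p+n) */
    }

    No system call is involved: the derived capability is strictly
    narrower than the one the allocator already held.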

    In a Unix OS, userspace malloc() does occasionally need to make a syscall - when it runs out of memory pages on hand it has to ask the kernel to
    allocate it some more via mmap(). But this is orthogonal to use of capabilities (which are virtually addressed in such a system).

    For dynamic linking, you need something that holds capabilities that allow
    you to turn read-write memory into read-execute memory. That can be part of the dynamic linker or part of the OS. Since you are typically making
    syscalls to read the shared library from disc anyway, as well as marking RW pages as RX pages in the page table, the OS is already involved.

    But, say, note that it would not be sufficient, say, for things like sandboxing hostile code within a shared address space with another
    program that needs to be kept protected.


    Granted, the strength could likely be improved (in the face of trying to prevent hostile code from being able to steal capabilities) through
    creative use of ASLR. Along with ABI features, such as "scratch register scrubbing" (say, loading zeroes into scratch registers on function
    return, such as to prevent capabilities from being leaked through via registers), marking function pointers as "Execute Only" etc.

    As noted, a capability system would likely still be pretty strong
    against things like buffer overflows (but if only being used to mitigate buffer overflows, is a bit overkill; so the main "interesting" case is
    if it can be used to make an "unbreakable sandbox" for potentially
    hostile machine code).

    Code capabilities prevent a lot of control flow attacks, because you can
    only execute code you have capabilities to. For example you're in a
    function - you possess a return address capability (which has bounds based
    on your local environment or compartment you're in) so you can manipulate
    the address of that return address, but you can't jump to arbitrary code and you can't forge return addresses. So no stack smashing, no ROP/JOP attacks, etc.

    Through setting the bounds on a code capability you can sandbox small pieces
    of code to a function granularity, which is more efficient than typical MMU sandboxing.

    *: If it is possible to perform a Load or (worse, Capability Load)
    through a function pointer, this is likely to be a significant attack
    vector. Need to make it so that function pointers can only be used to
    call things. Protecting against normal data loads would be needed mostly
    to try to prevent code from being able to gain access to a known pointer
    and possibly side-step the ASLR (say, if it can figure out that the
    address it wants to access is reachable from a capability that the code
    has access to).

    You can't arbitrarily change data to code capabilities - they are different types.

    It is likely that the capability memory tagging would need to be managed
    by the L2 cache. Would need some mechanism for the tag-bits memory (say,
    2MB for 256MB at 1b per 16B line). Would also need to somehow work this
    flag bit into the ringbus messaging.

    AXI has user fields which can be used for sending capabilities across the interconnect, through third party AXI IPs like muxes, arbiters, etc.

    Theo

    [1] https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201711-iccd2017-efficient-tags.pdf

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to Theo Markettos on Sun Mar 10 17:14:00 2024
    In article <Qry*XE3Ez@news.chiark.greenend.org.uk>, theom+news@chiark.greenend.org.uk (Theo Markettos) wrote:

    The C experience is fairly normal, as long as you are actually
    playing by the C rules. You can't arbitrarily cast integers to
    pointers - if you plan to do that you need to use intptr_t so
    the compiler knows to keep the data in a capability so it can
    use it as a pointer later.

    Makes sense, though it will require updating of older code for the rules
    being more thoroughly enforced. Not a bad thing.

    Tricks which store data in the upper or lower bits of pointers are
    awkward.

    Not compatible with AArch64 Pointer Authentication, but CHERI should be a functional replacement anyway.

    Changes in a 6M LoC KDE desktop codebase were 0.026% of lines: https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f23245dace466297f20a0dbd22d371.pdf

    1,500 or so changes. Quite a lot. Is the code backwards-compatible to a conventional C platform?

    Sandboxing involves dividing code into compartments; that involves
    some decision making as to where you draw the security boundaries.
    There aren't good tools to do that (they are being worked on).
    CHERI offers you the tools to implement whatever compartmentalisation
    strategy you wish, but it's not quite as simple as just recompiling.

    I have a slightly odd case: the software I work on ships as a great big
    shared library that's used in-process by its caller. It isn't any kind of server, and doesn't use any IPC; in concept it's a huge math library that
    asks the caller to allocate memory for it. So it needs to share a heap
    with the caller. Presumably that model is workable?

    ... we're running on FreeBSD

    That was a point against my experimenting with Morello when we were
    offered it last year; the requirement to port to FreeBSD first. Morello
    Linux seems insufficiently mature at present; do you have any idea of the timescale for it to be robustly usable for porting application code by
    someone who isn't experienced in Linux internals?

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo Markettos@21:1/5 to mitchalsup@aol.com on Sun Mar 10 16:24:38 2024
    MitchAlsup1 <mitchalsup@aol.com> wrote:
    BGB wrote:

    On 3/9/2024 1:58 PM, Robert Finch wrote:
    On 2024-03-09 1:56 p.m., BGB wrote:
    On 3/9/2024 9:09 AM, Scott Lurndal wrote:
    mitchalsup@aol.com (MitchAlsup1) writes:
    <snip>

    For Femtiki OS, I have a single object describing an array of values.
    For instance, messages, which are small objects, are described with a
    single object for an array of messages. It is too costly to use an
    object descriptor for each message.

    For a CHERI like approach, one would need a tag of 1 bit for every 16
    bytes of RAM (to flag whether or not that RAM represents a valid capability).

    For the combination of RAM sizes and FPGAs I have access to, this is non-viable, as I would need more BRAM for the tag memory than exists in
    the FPGAs.

    If you have ECC RAM on your FPGA board you could use the ECC bits for tags. Otherwise a tag cache is another way. The L1s and L2s carry tags (ie 129
    bit datapath), but you just put the tag cache on the front of your DRAM.

    Yes, indeed, not viable. Now imagine a page of those, and now you have
    to write out 4096 bytes and 256 tag bits onto a disk with standard sectors...

    Our swapping implementation keeps the tag bits in RAM, while the page is swapped out. Eventually you need to swap out a page of tag bits, but that's much less common.
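
    In outline, a sketch only: load_tag() stands in for whatever
    privileged tag-read primitive the hardware provides (a
    CLoadTags-style operation); the name here is hypothetical.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SIZE     4096
    #define GRANULE       16
    #define TAGS_PER_PAGE (PAGE_SIZE / GRANULE)   /* 256 tags per page */

    extern bool load_tag(const void *granule);    /* hypothetical */

    /* Before swap-out: harvest the page's 256 tag bits into a small
       in-RAM bitmap, then write the 4096 data bytes to an ordinary
       disc sector. Swap-in restores tags from the bitmap. */
    void harvest_tags(const char *page, uint8_t bits[TAGS_PER_PAGE / 8])
    {
        for (int i = 0; i < TAGS_PER_PAGE; i++)
            if (load_tag(page + i * GRANULE))
                bits[i / 8] |= 1u << (i % 8);
    }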

    In effect, this will mean needing another smaller cache which is bolted onto the L2 cache or similar, whose sole purpose is to provide tag-bits (and probably bounce requests to some other area of RAM which contains
    the tag-bits memory).

    Denelcor HEP had tag-like-bits and all the crud they bring (but they were used as locks instead of tags).


    As I see it, "locking things down" would likely require turning things like "malloc()/free()", "dlopen()/dlsym()/...", etc, into system calls (and generally giving the kernel a much more active role in this
    process).

    I think this may not be necessary, but I have to read some more. The
    capabilities have transfer rules which might make it possible to use
    existing code. They have ported things over to RISC-V. It cannot be too
    mountainous a task.


    You can make it work, yes, but the question is less "can you make it
    work, technically", but more:
    Can you make it work in a way that provides both a fairly normal C experience, and *also* an unbreakable sandbox, at the same time.

    The C experience is fairly normal, as long as you are actually playing by
    the C rules. You can't arbitrarily cast integers to pointers - if you plan
    to do that you need to use intptr_t so the compiler knows to keep the data
    in a capability so it can use it as a pointer later.
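
    For example (a sketch; under a purecap compile intptr_t is
    capability-sized, so the round trip keeps the tag and bounds):

    #include <stdint.h>

    void example(void)
    {
        int buf[4];
        intptr_t ip = (intptr_t)&buf[0];     /* still a tagged capability */
        int *p = (int *)(ip + sizeof(int));  /* provenance preserved */
        *p = 42;                             /* fine: within buf's bounds */
        /* Round-tripping through a plain uint64_t instead would strip
           the tag, and dereferencing the result would trap. */
    }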

    Tricks which store data in the upper or lower bits of pointers are awkward. Other tricks like XOR linked lists of pointers don't work. This is all
    stuff that's pushing into the 'undefined behaviour' parts of C (even if C doesn't explicitly call it out).

    Changes in a 6M LoC KDE desktop codebase were 0.026% of lines: https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f23245dace466297f20a0dbd22d371.pdf

    Depends what you mean by 'unbreakable sandbox': this is compiling code with every pointer being a capability, so every memory access is bounds checked.

    Sandboxing involves dividing code into compartments; that involves some decision making as to where you draw the security boundaries. There aren't good tools to do that (they are being worked on). CHERI offers you the
    tools to implement whatever compartmentalisation strategy you wish, but it's
    not quite as simple as just recompiling.

    And here the answer is essentially <wait for it> no.

    My skepticism here is that, short of drastic measures like moving malloc and libdl and similar into kernel space, it may not be possible to keep
    the sandbox secure using solely capabilities.

    ASLR could help, but using ASLR to maintain an image of integrity for
    the capability system would be "kinda weak".

    How do you ASLR code when a latent capability on disk still points at
    its defined memory area ? Yes, you can ASLR at boot, but you can use
    the file system to hold capabilities {which is something most capability systems desire and promote.}

    Why would you want to ASLR? ASLR is to prevent you guessing valid addresses for things so you can't craft pointers to them. CHERI prevents you crafting pointers to arbitrary things in the first place.

    One could ask though:
    How is my security model (with Keyrings) any different?

    Well, the partial answer mostly is that a call that switches keyrings is effectively accomplished via context switches (with the two keyrings effectively running in separate threads).

    So, like, even if the untrusted thread has a pointer to the protected thread's memory, it can't access it...

    Though, a similar model could potentially be accomplished with
    conventional page-tables, by making pseudo-processes which only share
    parts of their address space with another process (and the protected
    memory is located in the non-shared spaces, with any calls between them
    via an RPC mechanism).

    Capability manipulation via messages.

    That's the microkernel setup: the software running in system mode holds
    the privilege to alter access control (via page tables), so any time you
    want to change that you have to ask the system (microkernel or whatever) to do so. That's slow, in particular TLB manipulation (invalidation and
    shootdowns). CHERI allows you to manipulate them in userspace without
    having to call out to the kernel. Additionally it is finer grained than page granularity.

    Some experimental OSes have done things with manipulating page tables from userspace processes which avoids syscall overhead but not TLB costs - and it probably depends on the architecture whether you can do TLB invalidations
    from userspace.

    Had considered mechanisms which could pull this off without a context switch, but most would fall short of "acceptably secure" (if a path
    exists where a task could modify its own KRR or similar, this mechanism
    is blown).


    My bounds-checking scheme also worked, but with a caveat:
    It only works if code does not get "overly clever" with the use of pointers.

    Which no-one can trust of C programs.

    A lot of modern software is well behaved (see figure above). Particular software like JIT compilers can be more awkward - ideally you would really want the JIT compiler to emit capability-aware code. You can still run generated aarch64/rv64 non-capability code, but without the benefit of capability
    checks.

    So, it worked well enough to where I was able to enable full bounds-checking in Doom and similar, but was not entirely transparent to some of the runtime code. If you cast between pointers and integers, and manipulate the pointer bits, there are "gotchas".

    That's the kind of thing that falls down: software being 'clever', where it
    being 'clever' in order to be fast on a 386.

    Gee, if only we had trained programmers to avoid some of the things we
    are now requiring new languages to prevent.....

    If only we could rewrite all the software out there in memory-safe
    languages... then we'd have twice as much software (and more bugs).

    Either pointer<->integer casting would need to be disallowed, or (more likely), turned into a runtime call which can "bless" the address before returning it as a capability, which would exist as another potential
    attack surface (unless, of course, this mechanism is itself turned into
    a system call).


    OTOH:
    If one can't implement something like a conventional JavaScript VM, or
    if it takes a significant performance hit, this would not be ideal.

    Going for 2 in one post !!

    We've had the Duktape JavaScript interpreter working for CHERI for a while.
    just because Chromium is a huge piece of software (and we're running on FreeBSD, which is not a platform that Chrome supports building for). The
    work in V8 is to get it to implement the JS object model using CHERI instructions as part of its generated code.

    Though, on my side of things, it is possible I could revive a modified
    form of the 128-bit ABI, while dropping the VAS back down to 48 bits,
    and turn it into a more CHERI-like form (with explicit upper and lower
    bounds and access-enable flags, rather than a shared-exponent size and
    bias scheme).

    Yeah, IMO explicit upper and lower bounds would be better even though it
    uses more memory. The whole manipulation of the bounds is complex. I
    sketched out using a 256b capability descriptor. Some of the bits can be
    trimmed from the bounds if things are page aligned.

    We originally started out with a 256-bit capability with explicit base and
    top - this was to try things out simply so as not to prematurely optimise.
    One early finding was that we needed to support capabilities being out of bounds, as long as they aren't dereferenced out of bounds - software
    sometimes saves a pointer that's before or after the object, before then bringing it back in bounds when dereferencing it.

    This is something the 128-bit compressed capability format supports, which compresses the bounds a bit like floating point. This imposes certain
    limits on bounds granularity, but they haven't been a problem in practice - memory allocators tend to allocate objects in aligned chunks anyway (eg ask
    for a 128MiB block and it'll probably be page aligned). The pointer is always byte aligned.

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Theo Markettos on Sun Mar 10 21:23:09 2024
    Theo Markettos wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    BGB wrote:
    <snip>
    You can make it work, yes, but the question is less "can you make it
    work, technically", but more:
    Can you make it work in a way that provides both a fairly normal C
    experience, and *also* an unbreakable sandbox, at the same time.

    The C experience is fairly normal, as long as you are actually playing by
    the C rules. You can't arbitrarily cast integers to pointers - if you plan
    to do that you need to use intptr_t so the compiler knows to keep the data
    in a capability so it can use it as a pointer later.

    As a 'for instance' how does one take a capability and align it to a cache
    line boundary ?? Say in/after malloc() ?!?

    Tricks which store data in the upper or lower bits of pointers are awkward.

    Especially so when you have a 64-bit VAS to play in.

    Other tricks like XOR linked lists of pointers don't work.

    This should have died out with the PDP-11s. With modern machines it does not save enough space to warrant the loss in performance.

    This is all
    stuff that's pushing into the 'undefined behaviour' parts of C (even if C doesn't explicitly call it out).

    <snip>

    Why would you want to ASLR? ASLR is to prevent you guessing valid addresses for things so you can't craft pointers to them. CHERI prevents you crafting pointers to arbitrary things in the first place.

    ASLR has become a catch-phrase used to give the listener a good feeling
    about the security of the present system--all the while knowing that it
    is little more than window dressing on a building already in flames.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo@21:1/5 to John Dallman on Sun Mar 10 22:06:36 2024
    John Dallman <jgd@cix.co.uk> wrote:
    In article <Qry*XE3Ez@news.chiark.greenend.org.uk>, theom+news@chiark.greenend.org.uk (Theo Markettos) wrote:

    The C experience is fairly normal, as long as you are actually
    playing by the C rules. You can't arbitrarily cast integers to
    pointers - if you plan to do that you need to use intptr_t so
    the compiler knows to keep the data in a capability so it can
    use it as a pointer later.

    Makes sense, though it will require updating of older code for the rules being more thoroughly enforced. Not a bad thing.

    Indeed.

    Tricks which store data in the upper or lower bits of pointers are
    awkward.

    Not compatible with AArch64 Pointer Authentication, but CHERI should be a functional replacement anyway.

    I think they can theoretically coexist - ie you can have an authenticated 64 bit pointer which looks like an integer as far as the capability checks are concerned, but if you don't try to dereference it as a capability then
    that's ok. (Morello doesn't implement PA as it's based on a prior microarchitecture)

    Changes in a 6M LoC KDE desktop codebase were 0.026% of lines: https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f23245dace466297f20a0dbd22d371.pdf

    1,500 or so changes. Quite a lot. Is the code backwards-compatible to a conventional C platform?

    Should be - it's mostly making things play by the rules. Once they play by
    the rules then it means they will work the same (or less buggily) on a
    regular C platform.

    The above link describes the changes - a number being replacing 'long' with intptr_t, some undefined behaviour, bad use of realloc(). Some of it was modernisation of old codebases (eg adding C11 atomics); other changes just made code optional where a facility isn't currently available (eg no OpenGL available in VNC).

    Sandboxing involves dividing code into compartments; that involves
    some decision making as to where you draw the security boundaries.
    There aren't good tools to do that (they are being worked on).
    CHERI offers you the tools to implement whatever compartmentalisation strategy you wish, but it's not quite as simple as just recompiling.

    I have a slightly odd case: the software I work on ships as a great big shared library that's used in-process by its caller. It isn't any kind of server, and doesn't use any IPC; in concept it's a huge math library that asks the caller to allocate memory for it. So it needs to share a heap
    with the caller. Presumably that model is workable?

    Do you want to compartmentalise that shared library, ie put in trust
    boundaries between the library and its caller?

    If you just want to run the shared library as-is, you can recompile it and
    get bounds checking etc. If you want to have some kind of trust boundary
    (eg the library doesn't trust the app, or the app doesn't trust the library) then you would need to put in a compartment boundary between the two. In
    that case it might make sense for the memory allocator to be its own compartment; the capabilities it hands out should be usable by both app and library.

    ... we're running on FreeBSD

    That was a point against my experimenting with Morello when we were
    offered it last year; the requirement to port to FreeBSD first. Morello
    Linux seems insufficiently mature at present; do you have any idea of the timescale for it to be robustly usable for porting application code by someone who isn't experienced in Linux internals?

    FreeBSD is more advanced, in part because it's had more development effort
    on it over the years and partly since it's less of a moving target (Linux
    has huge amounts of churn). That means more things have capability support
    in the kernel and userspace (including a lot of packages available).

    I believe Morello Linux is able to support console-mode apps - ie it has support for context switching and use of capabilities in userspace, with some support in glibc. I believe there is now a dynamic linker, but not sure of
    the status. There is only limited use of capabilities in the kernel, so anything to do with kernel compartmentalisation would be more work. I think someone was working on building Debian packages pure-capability - I haven't heard the current status of that work.

    https://www.morello-project.org/cheri-feature-matrix/
    has a comparison table.

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo Markettos@21:1/5 to mitchalsup@aol.com on Sun Mar 10 22:32:46 2024
    MitchAlsup1 <mitchalsup@aol.com> wrote:
    Theo Markettos wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    BGB wrote:
    <snip>
    You can make it work, yes, but the question is less "can you make it
    work, technically", but more:
    Can you make it work in a way that provides both a fairly normal C
    experience, and *also* an unbreakable sandbox, at the same time.

    The C experience is fairly normal, as long as you are actually playing by the C rules. You can't arbitrarily cast integers to pointers - if you plan to do that you need to use intptr_t so the compiler knows to keep the data in a capability so it can use it as a pointer later.

    As a 'for instance' how does one take a capability and align it to a cache line boundary ?? Say in/after malloc() ?!?

    I'm not sure what you mean:

    Capabilities are 128-bit fields stored aligned in memory. It's not allowed
    to store a capability that isn't 128-bit aligned. Those naturally align
    with cache lines. Every 128 bits has a tag associated with it, stored
    together or apart (various schemes discussed in my previous posts).

    The memory it points to can be arbitrarily aligned. It is just a 64-bit
    store' instructions or switching to a mode where every regular load/store implicitly dereferences a capability rather than integer pointer)

    The bounds have certain representation limits, because they're packing
    192+ bits of information into a 128 bit space. This boils down to an
    alignment granularity: eg if you allocate a (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I can't remember what the rounding is at this size). malloc() should ensure it doesn't hand out that memory to somebody else; allocators typically do this anyway since they use slab allocators
    which round up the allocation to a certain number of slabs.

    There is a trickiness if somebody wants to generate a capability to a
    subobject in the middle of a large object that isn't aligned: load in a
    4.7GiB DVD wholesale into memory and try to generate a capability to a block
    of frames in the middle of it, which is potentially large and yet the base
    is unaligned, which would cause a loss of bounds precision (somebody could access the frame before or after). It's possible to imagine things like
    that, but we've not seen software actually do it.

    I'm not sure how any of these relate to cache lines? Aside from ensuring the

    If you mean you ask malloc for something you later want to align to a cache
    line, you ask for something larger and increment the pointer to be cache
    aligned, in the normal way:

    #include <cheriintrin.h>
    ...
    // 64 byte cache lines
    ptr = malloc(size + 63);         // leave extra space for rounding up
    offset = (uintptr_t)ptr & 0x3F;  // distance past the last 64B boundary
    ptr += (0x40 - offset) & 0x3F;   // round up to cache line (no-op if aligned)

    and then increment the base bound to match the new position of 'ptr' and set the top to be ptr+size:

    ptr = cheri_bounds_set(ptr, size);

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Theo Markettos on Sun Mar 10 22:59:52 2024
    Theo Markettos wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    Theo Markettos wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    BGB wrote:
    <snip>
    You can make it work, yes, but the question is less "can you make it
    work, technically", but more:
    Can you make it work in a way that provides both a fairly normal C
    experience, and *also* an unbreakable sandbox, at the same time.

    The C experience is fairly normal, as long as you are actually playing by the C rules. You can't arbitrarily cast integers to pointers - if you plan to do that you need to use intptr_t so the compiler knows to keep the data in a capability so it can use it as a pointer later.

    As a 'for instance' how does one take a capability and align it to a cache line boundary ?? Say in/after malloc() ?!?

    I'm not sure what you mean:

    Capabilities are 128-bit fields stored aligned in memory. It's not allowed to store a capability that isn't 128-bit aligned. Those naturally align
    with cache lines. Every 128 bits has a tag associated with it, stored together or apart (various schemes discussed in my previous posts).

    The memory it points to can be arbitrarily aligned.

    For performance reasons one would want it* cache line aligned.
    (*) or some part of the whole thing aligned to a cache line boundary.

    p = (p_type *)((intptr_t)p & ~63);

    It is just a 64-bit pointer. You dereference it using 8/16/32/64/128 bit load and store instructions in the usual datapath (either explicitly using 'C load'/'C store' instructions or switching to a mode where every regular load/store implicitly dereferences a capability rather than integer pointer)

    The bounds have certain representation limits, because they're packing
    192+ bits of information into a 128 bit space. This boils down to an alignment granularity: eg if you allocate a (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I can't remember what the rounding is at this size). malloc() should ensure it doesn't hand out that memory to somebody else; allocators typically do this anyway since they use slab allocators which round up the allocation to a certain number of slabs.

    So how do you "encode" a petaByte array ?? of megaByte structs in a capability ??

    There is a trickiness if somebody wants to generate a capability to a subobject in the middle of a large object that isn't aligned: load in a 4.7GiB DVD wholesale into memory and try to generate a capability to a block of frames in the middle of it, which is potentially large and yet the base
    is unaligned, which would cause a loss of bounds precision (somebody could access the frame before or after). It's possible to imagine things like that, but we've not seen software actually do it.

    I'm not sure how any of these relate to cache lines?

    Smaller agglomerations of memory want to be cache-line aligned for performance reasons. If a struct fits in 1 cache line you don't want it positioned so it needs 2 cache lines in use.

    Aside for ensuring the caches store capabilities atomically and preserve tags, any time you dereference them they work just like regular memory accesses.

    If you mean you ask malloc for something you later want to align to a cache
    line, you ask for something larger and increment the pointer to be cache
    aligned, in the normal way:

    #include <cheriintrin.h>
    ...
    // 64 byte cache lines
    ptr = malloc(size + 63);         // leave extra space for rounding up
    offset = (uintptr_t)ptr & 0x3F;  // distance past the last 64B boundary
    ptr += (0x40 - offset) & 0x3F;   // round up to cache line (no-op if aligned)

    and then increment the base bound to match the new position of 'ptr' and set the top to be ptr+size:

    ptr = cheri_bounds_set(ptr, size);

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo Markettos@21:1/5 to mitchalsup@aol.com on Mon Mar 11 11:10:15 2024
    MitchAlsup1 <mitchalsup@aol.com> wrote:
    Theo Markettos wrote:
    The bounds have certain representation limits, because they're packing 192+ bits of information into a 128 bit space. This boils down to an alignment granularity: eg if you allocate a (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I can't remember what the rounding is at this
    size). malloc() should ensure it doesn't hand out that memory to somebody else; allocators typically do this anyway since they use slab allocators which round up the allocation to a certain number of slabs.

    So how do you "encode" a petaByte array ?? of megaByte structs in a capability ??

    You create a capability with petabyte-scale bounds. The precision of the bounds may be limited, which means that you can't ram something else right
    up against the end or beginning of the array if they aren't sufficiently aligned. This is in practice not a problem: slab allocators will round up
    your address before they allocate the next thing, and most OSes won't
    populate the rounded up space with pages anyway.

    When you take a pointer to an array element, then it has megabyte scale
    bounds and they can be represented with more precision. If your struct elements are of an arbitrary size and packed together at the byte level then you either have to live with the bounds giving rights to slightly more than
    a single struct element, or you decide that is unacceptable and pad the
    struct size up to the next representable size (just like regular non-packed structs enforce certain alignment), and pay a small memory overhead for
    that (<0.25%). That's a security decision you can make one way or another.
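
    Concretely (a sketch; 'g' is whatever bounds granularity the
    encoding dictates at the element's size):

    #include <stddef.h>

    /* Pad the element size up to the next multiple of g so that a
       capability to one element cannot reach its neighbour. */
    static size_t pad_to_granule(size_t elem, size_t g)
    {
        return (elem + g - 1) & ~(g - 1);  /* g is a power of two */
    }
    /* e.g. elem = 1000000, g = 128: padded to 1000064, ~0.006% waste */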

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to theom+news@chiark.greenend.org.uk on Mon Mar 11 09:48:30 2024
    On 11 Mar 2024 11:10:15 +0000 (GMT), Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    Theo Markettos wrote:
    The bounds have certain representation limits, because they're packing
    192+ bits of information into a 128 bit space. This boils down to an
    alignment granularity: eg if you allocate a (1MiB+1) byte buffer the bounds
    might be 1MiB+64 (or whatever, I can't remember what the rounding is at this
    size). malloc() should ensure it doesn't hand out that memory to somebody
    else; allocators typically do this anyway since they use slab allocators
    which round up the allocation to a certain number of slabs.

    So how do you "encode" a petaByte array ?? of megaByte structs in a capability ??

    You create a capability with petabyte-scale bounds. The precision of the
    bounds may be limited, which means that you can't ram something else right
    up against the end or beginning of the array if they aren't sufficiently
    aligned. This is in practice not a problem: slab allocators will round up
    your address before they allocate the next thing, and most OSes won't
    populate the rounded up space with pages anyway.

    By default Windows will populate allocated space. You have to
    explicitly use the virtual memory api to avoid it. 8-(

    When you take a pointer to an array element, then it has megabyte scale
    bounds and they can be represented with more precision. If your struct
    elements are of an arbitrary size and packed together at the byte level then
    you either have to live with the bounds giving rights to slightly more than
    a single struct element, or you decide that is unacceptable and pad the
    struct size up to the next representable size (just like regular non-packed
    structs enforce certain alignment), and pay a small memory overhead for
    that (<0.25%). That's a security decision you can make one way or another.

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Theo Markettos on Mon Mar 11 16:10:09 2024
    On 11 Mar 2024 11:10:15 +0000 (GMT)
    Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    Theo Markettos wrote:
    The bounds have certain representation limits, because they're
    packing 192+ bits of information into a 128 bit space. This
    boils down to an alignment granularity: eg if you allocate a
    (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I
    can't remember what the rounding is at this size). malloc()
    should ensure it doesn't hand out that memory to somebody else; allocators typically do this anyway since they use slab
    allocators which round up the allocation to a certain number of
    slabs.

    So how do you "encode" a petaByte array ?? of megaByte structs in a capability ??

    You create a capability with petabyte-scale bounds. The precision of
    the bounds may be limited, which means that you can't ram something
    else right up against the end or beginning of the array if they
    aren't sufficiently aligned. This is in practice not a problem: slab allocators will round up your address before they allocate the next
    thing, and most OSes won't populate the rounded up space with pages
    anyway.

    When you take a pointer to an array element, then it has megabyte
    scale bounds and they can be represented with more precision. If
    your struct elements are of an arbitrary size and packed together at
    the byte level then you either have to live with the bounds giving
    rights to slightly more than a single struct element, or you decide
    that is unacceptable and pad the struct size up to the next
    representable size (just like regular non-packed structs enforce
    certain alignment), and pay a small memory overhead for that
    (<0.25%). That's a security decision you can make one way or another.

    Theo

    Your time stamp (most likely the +0000 part) confuses my Claws
    Mail newsreader. I wonder if others see a similar problem.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Michael S on Mon Mar 11 16:13:48 2024
    On Mon, 11 Mar 2024 16:10:09 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On 11 Mar 2024 11:10:15 +0000 (GMT)
    Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    Theo Markettos wrote:
    The bounds have certain representation limits, because they're packing 192+ bits of information into a 128 bit space. This
    boils down to an alignment granularity: eg if you allocate a
    (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I
    can't remember what the rounding is at this size). malloc()
    should ensure it doesn't hand out that memory to somebody else; allocators typically do this anyway since they use slab
    allocators which round up the allocation to a certain number of
    slabs.

    So how do you "encode" a petaByte array ?? of megaByte structs in
    a capability ??

    You create a capability with petabyte-scale bounds. The precision
    of the bounds may be limited, which means that you can't ram
    something else right up against the end or beginning of the array
    if they aren't sufficiently aligned. This is in practice not a
    problem: slab allocators will round up your address before they
    allocate the next thing, and most OSes won't populate the rounded
    up space with pages anyway.

    When you take a pointer to an array element, then it has megabyte
    scale bounds and they can be represented with more precision. If
    your struct elements are of an arbitrary size and packed together at
    the byte level then you either have to live with the bounds giving
    rights to slightly more than a single struct element, or you decide
    that is unacceptable and pad the struct size up to the next
    representable size (just like regular non-packed structs enforce
    certain alignment), and pay a small memory overhead for that
    (<0.25%). That's a security decision you can make one way or
    another.

    Theo

    Your time stamp (most likely the +0000 part) confuses my Claws
    Mail newsreader. I wonder if others see a similar problem.


    After further examination, it's unlikely that +0000 is the confusing
    part. More likely my newsreader does not understand the trailing (GMT).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Michael S on Mon Mar 11 14:50:03 2024
    Michael S <already5chosen@yahoo.com> writes:
    On 11 Mar 2024 11:10:15 +0000 (GMT)
    Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:

    MitchAlsup1 <mitchalsup@aol.com> wrote:
    Theo Markettos wrote:
    The bounds have certain representation limits, because they're
    packing 192+ bits of information into a 128 bit space. This
    boils down to an alignment granularity: eg if you allocate a
    (1MiB+1) byte buffer the bounds might be 1MiB+64 (or whatever, I
    can't remember what the rounding is at this size). malloc()
    should ensure it doesn't hand out that memory to somebody else;
    allocators typically do this anyway since they use slab
    allocators which round up the allocation to a certain number of
    slabs.

    So how do you "encode" a petaByte array ?? of megaByte structs in a
    capability ??

    You create a capability with petabyte-scale bounds. The precision of
    the bounds may be limited, which means that you can't ram something
    else right up against the end or beginning of the array if they
    aren't sufficiently aligned. This is in practice not a problem: slab
    allocators will round up your address before they allocate the next
    thing, and most OSes won't populate the rounded up space with pages
    anyway.

    When you take a pointer to an array element, then it has megabyte
    scale bounds and they can be represented with more precision. If
    your struct elements are of an arbitrary size and packed together at
    the byte level then you either have to live with the bounds giving
    rights to slightly more than a single struct element, or you decide
    that is unacceptable and pad the struct size up to the next
    representable size (just like regular non-packed structs enforce
    certain alignment), and pay a small memory overhead for that
    (<0.25%). That's a security decision you can make one way or another.

    Theo

    Your time stamp (most likely +0000 part) confuses my Claws
    Mail newsreader. I wonder if others see similar problem.


    xrn on linux is not confused (which is not surprising since
    linux stores time internally as GMT anyway).

    Date: 11 Mar 2024 11:10:15 +0000 (GMT)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Scott Lurndal on Mon Mar 11 17:18:18 2024
    On Mon, 11 Mar 2024 14:50:03 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On 11 Mar 2024 11:10:15 +0000 (GMT)
    Theo Markettos <theom+news@chiark.greenend.org.uk> wrote:
    <snip>
 
    Your time stamp (most likely the +0000 part) confuses my Claws
    Mail newsreader. I wonder if others see a similar problem.


    xrn on linux is not confused (which is not surprising since
    linux stores time internally as GMT anyway).

    Date: 11 Mar 2024 11:10:15 +0000 (GMT)

    The issue does not appear to have anything to do with the OS. It's all
    about parsing of the 'Date' header.
 
    For example, in your message it looks like:
    Date: Mon, 11 Mar 2024 14:50:03 GMT
    Claws Mail understands it.
 
    In the message of Tim Rentsch it looks like:
    Date: Mon, 11 Mar 2024 07:54:07 -0700
    Claws Mail understands it.
 
    In my messages the format is the same as in Tim's.
 
    In Theo's messages the header looks like a mix of yours and ours:
    Date: 11 Mar 2024 11:10:15 +0000 (GMT)

    The wonders of Postel's law.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to All on Tue Mar 12 19:11:08 2024
    BGB wrote:

    Don't you ever (EVER {E V E R}) cut anything that is no longer relevant ?????

    Though, partly reverting the logic for the changes to the bus messaging
    also did not fix the issue. Behavior is otherwise "rather weird".


    So, the bug hunt is being annoying.
    <snip>
    This is annoying...

    So is 8 pages of unnecessary and useless text.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Robert Finch on Wed Mar 13 15:53:10 2024
    Robert Finch wrote:

    Can capabilities be applied to address ranges?

    That is a major thing that they provide.

    Segmentation similar to the PowerPC 32-bit segmentation is being used in
    the current project, where the upper address bits select a segment
    register which provides more address bits. I would like to use the same
    descriptors for capabilities and for the address-range segmentation.

    How would you handle 2 billion Capabilities in a single application ??
 
    Each of which has a range of 2 GB ??? and each containing at
    least 1 M Capabilities ????

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Robert Finch on Wed Mar 13 18:47:12 2024
    Robert Finch wrote:

    On 2024-03-13 11:53 a.m., MitchAlsup1 wrote:
    Robert Finch wrote:

    Can capabilities be applied to address ranges?

    That is a major thing that they provide.

    Segmentation similar to the PowerPC 32-bit segmentation is being used
    in the current project, where the upper address bits select a segment
    register which provides more address bits. I would like to use the
    same descriptors for capabilities and for the address-range segmentation.

    How would you handle 2 billion Capabilities in a single application ??
 
    Each of which has a range of 2 GB ??? and each containing at
    least 1 M Capabilities ????

    I should have been a bit more clear maybe; it has taken time to gel in
    my head.
 
    PowerPC-32 has only 16 segment registers. I think these could be
    extended to capability registers in the same manner as proposed for
    the FS, GS registers in x64. I wonder if there is any value in doing so
    though, since the address is a constant. I think it should already be
    known if it would exceed the bounds. The segment registers simply tack
    on 24 bits to the left side of the remaining address bits to generate a
    52-bit virtual address. I think all a capability would do is provide a
    slightly different means to calculate the address.
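 
    A sketch of that translation, assuming the classic 16-register
    PowerPC-32 layout (a 32-register variant would take five upper bits
    instead of four):
 
        #include <stdint.h>
 
        /* EA[31:28] selects a segment register; its 24-bit field is
           prepended to the remaining 28 bits: 24 + 28 = 52-bit VA. */
        uint64_t ea_to_va(uint32_t ea, const uint32_t seg[16])
        {
            uint32_t field = seg[ea >> 28] & 0xFFFFFF;  /* 24-bit segment field */
            return ((uint64_t)field << 28) | (ea & 0x0FFFFFFF);
        }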

    In the past, capability machines wanted to use capabilities for all
    relocation and all protection. As long as this is the case, an
    application has an unbounded need for capabilities.
 
    You can grant this with limited capabilities (top 4-odd bits) only when
    you have a means to load a new capability into a known <capability> base
    register[i]. Since this is privileged data, the functionality of this
    instruction must be precisely specified and operate with access to
    GuestOS address space.....otherwise it is difficult to imagine how to
    add HyperVision on top of GuestOS supervision.

    {{Or do you intend to void Hypervisors?}}

    I have a couple of cores I can experiment with adding capabilities. For
    my current project there are 32 segment registers.

    ******

    I am wondering if the 'R' term in the CHERI Concentrate expansion
    calculation can be less than zero, or if it is a modulo value. It is
    shown as B[13:11] - 1. I am assuming it can go negative and is not modulo.
 
    How "open" is CHERI ? Can CHERI-based code be posted?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Robert Finch on Wed Mar 13 22:49:23 2024
    On Wed, 13 Mar 2024 18:43:59 -0400, Robert Finch wrote:

    I got the impression that with capabilities, processor modes may not be
    necessary. I think the distinction between hypervisor / supervisor may
    be lost. Not sure that is a good idea.

    It gets rid of the hierarchy, and replaces it with a matrix.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Thu Mar 14 00:01:50 2024
    On Wed, 13 Mar 2024 23:18:37 +0000, MitchAlsup1 wrote:

    Hypervisors are absolutely necessary if you want high RAS where a
    GuestOS may crash without taking the system down.

    Not really. Remember, the whole point about introducing memory protection
    into multitasking, multiuser OSes in the first place was precisely so that
    one program could crash without taking the system down.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Robert Finch on Wed Mar 13 23:18:37 2024
    Robert Finch wrote:



    In the past, capability machines wanted to use capabilities for all
    relocation and all protection. As long as this is the case, an
    application has an unbounded need for capabilities.

    It seems like it would have a lot of overhead, but it might be worth it
    for security.

    You can grant this with limited capabilities (top 4-odd bits) only when
    you have a means to load a new capability into a known <capability> base
    register[i]. Since this is privileged data, the functionality of this
    instruction must be precisely specified and operate with access to
    GuestOS address space.....otherwise it is difficult to imagine how to
    add HyperVision on top of GuestOS supervision.
    {{Or do you intend to void Hypervisors?}}

    I got the impression that with capabilities, processor modes may not be
    necessary. I think the distinction between hypervisor / supervisor may
    be lost. Not sure that is a good idea.

    Hypervisors are absolutely necessary if you want high RAS, where a
    GuestOS may crash without taking the system down. Does the GuestOS use
    capabilities supplied by the HyperVisor into which it has no
    visibility ?? just prescribed uses ??

    Consider an ECC error in a Capability and someone trying to use it.
    Is the error charged to the GuestOS, who owns and manages the capability ??
    or to the application, who is only following the prescribed uses of the
    capability ?? {{The "why shoot the messenger/innocent" problem.}}

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Thu Mar 14 00:11:55 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Wed, 13 Mar 2024 23:18:37 +0000, MitchAlsup1 wrote:

    Hypervisors are absolutely necessary if you want high RAS where a
    GuestOS may crash without taking the system down.

    Not really. Remember, the whole point about introducing memory protection into multitasking, multiuser OSes in the first place was precisely so that one program could crash without taking the system down.

    Actually really. And modern architectures also protect the guest
    OS from the hypervisor by providing secure enclaves (Intel) / realms (Arm).

    The architectural features supporting virtualization are designed to
    isolate guests from both the hypervisor and other guests.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Thu Mar 14 00:27:38 2024
    Lawrence D'Oliveiro wrote:

    On Wed, 13 Mar 2024 23:18:37 +0000, MitchAlsup1 wrote:

    Hypervisors are absolutely necessary if you want high RAS where a
    GuestOS may crash without taking the system down.

    Not really. Remember, the whole point about introducing memory protection into multitasking, multiuser OSes in the first place was precisely so that one program could crash without taking the system down.


    What happens to the non-HyperVised system when GuestOS goes down ??
    {{Guest OS is a program, is it not ??}}

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Thu Mar 14 01:59:58 2024
    On Thu, 14 Mar 2024 00:27:38 +0000, MitchAlsup1 wrote:

    What happens to the non-HyperVised system when GuestOS goes down ??

    Nothing. That’s what makes it a “guest”.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Thu Mar 14 01:59:29 2024
    On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:

    The architectural features supporting virtualization are designed to
    isolate guests from both the hypervisor and other guests.

    Providing an entire separate kernel for each VM is often unnecessary. If
    you need separation at the level of entire subsystems, as opposed to
    individual processes, then that’s what containers are for.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo Markettos@21:1/5 to Robert Finch on Thu Mar 14 12:12:16 2024
    Robert Finch <robfi680@gmail.com> wrote:
    I am wondering if the 'R' term in the CHERI Concentrate expansion
    calculation can be less than zero, or if it is a modulo value. It is
    shown as B[13:11] - 1. I am assuming it can go negative and is not modulo.

    I understand it is signed, so can go negative:

    https://github.com/CTSRD-CHERI/sail-cheri-riscv/blob/6e3613a2c46fb809e526b55c5c72acb041194ab8/src/cheri_cap_common.sail#L276
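 
    Concretely, the subtraction is ordinary signed arithmetic; a sketch of
    how the field reads (my paraphrase, not the Sail source itself):
 
        #include <stdint.h>
 
        /* B[13:11] is a 3-bit field; the "- 1" is plain signed
           subtraction, so R ranges over -1..6 rather than wrapping
           modulo 8. */
        static inline int cc_r_term(uint16_t B)   /* B: the 14-bit bottom field */
        {
            return (int)((B >> 11) & 0x7) - 1;
        }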

    How “open” is CHERI ? Can CHERI based code be posted?

    CHERI is as open as we can make it:

    The architecture specification is published: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf

    We have an agreement with Arm that 'capability essential IP' (ie IP which is fundamental to the architecture, rather than details of the implementation)
    is public and they won't patent it.

    The Morello architecture spec is published. How Arm do their implementation
    is up to them.

    The CHERI-RISC-V (and MIPS) architectures are published in the above architecture specification. The Sail formal models are open source: https://github.com/CTSRD-CHERI/sail-cheri-riscv

    We have an ongoing effort to standardise CHERI through the RISC-V standardisation process.

    CHERI software (compiler, toolchain, CheriBSD, application changes) and
    RISC-V (and MIPS) hardware artifacts (implementation of CHERI cores in RTL)
    are open source.


    There are certainly no problems with posting CHERI code - to be encouraged!

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo Markettos@21:1/5 to BGB on Thu Mar 14 12:37:32 2024
    BGB <cr88192@gmail.com> wrote:
    Presumably, in addition to the code, one needs some way for the code to
    be able to access its own ".data" and ".bss" sections when called.

    AIUI you derive a capability from PCC (the PC capability) that gives you
    access to your local 'captable', which then holds pointers to your other objects. The captable can be read-only but the capabilities inside it can
    be writable (ie pointers allow you to write to your globals etc).
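 
    A rough way to observe this from purecap C, assuming CHERI Clang's
    builtins (some_global is just a hypothetical stand-in):
 
        #include <stdio.h>
 
        extern int some_global;   /* reached via the captable */
 
        void show_caps(void)
        {
        #ifdef __CHERI_PURE_CAPABILITY__
            void *pcc = __builtin_cheri_program_counter_get();
            int  *gp  = &some_global;
            /* both are capabilities with their own bounds */
            printf("PCC    base=%#lx len=%#lx\n",
                   __builtin_cheri_base_get(pcc), __builtin_cheri_length_get(pcc));
            printf("global base=%#lx len=%#lx\n",
                   __builtin_cheri_base_get(gp), __builtin_cheri_length_get(gp));
        #endif
        }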

    Some options:
    PC-relative:
    Unclear if valid in this case.
    GOT:
    Table of pointers to things, loaded somehow.
    One example here being the ELF FDPIC ABI.
    Reloading a Global Pointer via a lookup table accessed via itself.
    This is what my ABI uses...

    I couldn't seem to find any technical descriptions of the CHERI/Morello
    ABI. I had made a guess that it might work similar to FDPIC, as this
    could be implemented without needing to use raw addresses (and seemed
    like a "best fit").

    This is a description of the linkage model for CHERI MIPS; I'm not aware of anything having changed significantly for RISC-V or Morello, although exact usage of registers etc will be different.

    https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/20190113-cheri-linkage.pdf

    This also describes the OS-facing ABI on CheriBSD: https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201904-asplos-cheriabi.pdf

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Theo Markettos on Thu Mar 14 16:45:57 2024
    Theo Markettos wrote:

    <snip>

    So, how does a CHERI machine do::
 
        void foo( int *i )
        {
            static int j;
            int *k = &j;
 
            bar( k, i );
        }
 
    And how does a CHERI machine implement fprintf( FILE *f, ... ) from
 
        void printf( ... )
        {
            fprintf( stdout, ... );
        }
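 
    In portable C that forwarding has to go through va_list, and AIUI a
    CHERI port keeps the same source, with the variadic area itself covered
    by a bounded capability. A sketch of the portable shape (hypothetical
    name, to avoid clashing with the real printf):
 
        #include <stdarg.h>
        #include <stdio.h>
 
        /* Forward variadic arguments to fprintf via vfprintf; on CHERI
           the va_list walks a capability-bounded argument area, but the
           C source is unchanged. */
        void my_printf( const char *fmt, ... )
        {
            va_list ap;
            va_start(ap, fmt);
            vfprintf(stdout, fmt, ap);
            va_end(ap);
        }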


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to BGB on Thu Mar 14 20:38:31 2024
    On Thu, 14 Mar 2024 03:20:57 -0500, BGB wrote:

    A capability effectively encodes 3 addresses:
    An upper bound, lower bound, and a target address.
    A segment descriptor generally only needs two:
    A base address, and a size.

    You need information on where the segment is located though, don’t you.
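 
    Roughly, the two shapes being contrasted (illustrative, uncompressed
    fields only; CHERI packs its three values into 128 bits, as discussed
    up-thread):
 
        #include <stdint.h>
 
        struct segment_desc { uint64_t base;  uint64_t size; };  /* two values */
        struct capability   { uint64_t lower; uint64_t upper;
                              uint64_t cursor; };                /* three values */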

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to BGB on Thu Mar 14 22:08:54 2024
    BGB wrote:

    I am guessing Bounds-Check-Enforce is more likely to have around a 30%

    My guess would be SQRT(30%) ~= 15%

    overhead, maybe more or less. But, this is likely also to be for code
    that is potentially hostile. But, then, one wants the security to be
    strong enough that there is no practical way for code to break out of
    the sandbox; though, if allowing for arbitrary machine code, then there
    is still the great potential Achilles heel that is the Global Pointer or
    GOT.

    Note:: GOT is not ST-able in My 66000 architecture.....You can LD it into
    a Register for accessing what it points at or you can LD it into IP and
    execute code over there. {No trampoline}

    Only sure way to avoid this is to not have any "potentially compromising" capabilities anywhere

    "within the graph of what is reachable from the hostile code" is redundant.

    and the main obvious way to do this is via the use of system calls.

    If operating solely at the C level, it is a little easier: One needs to
    make sure that there is no way for the code to get direct access to the Global Pointer or GOT or similar. An ABI based on FDPIC would be bad
    here, since it is within the reach of C code (under typical C behavior,
    UB notwithstanding) to be able to gain access to the GOT for an
    arbitrary function pointer.

    Application cannot ST to its GOT.
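 
    The reach BGB describes is easy to sketch, assuming a hypothetical
    FDPIC-style descriptor layout (real ABIs differ in detail):
 
        /* On such ABIs a C function pointer really points at a pair like
           this, so ordinary loads can fish the callee module's GOT base
           out of any function pointer handed across a trust boundary. */
        struct fdpic_desc { void *entry; void *got; };
 
        void *leak_got( void (*fn)(void) )
        {
            /* UB in portable C, but exactly what the ABI lays out */
            return ((struct fdpic_desc *)fn)->got;
        }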

    A big chunk of this would be overhead shared with the 128-bit ABI (which would have gone over entirely to 128-bit bounds-checked pointers), with
    a few new/additional overheads.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Thu Mar 14 22:11:41 2024
    Lawrence D'Oliveiro wrote:

    On Thu, 14 Mar 2024 00:27:38 +0000, MitchAlsup1 wrote:

    What happens to the non-HyperVised system when GuestOS goes down ??

    Nothing. That’s what makes it a “guest”.


    Ok, you are running RealOS and RealOS crashes/hangs/"does not fetch and execute instructions"--you say nothing happens--I guess you are correct in your wording but this is far from what anyone anticipates.

    There is no-one to take over.......and deal with the GuestOS crash.......

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Thu Mar 14 22:14:57 2024
    Lawrence D'Oliveiro wrote:

    On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:

    The architectural features supporting virtualization are designed to
    isolate guests from both the hypervisor and other guests.

    Providing an entire separate kernel for each VM is often unnecessary. If
    you need separation at the level of entire subsystems, as opposed to individual processes, then that’s what containers are for.


    If you are running k Linuxes under a single HyperVisor, you should be able
    to share all the Linux code after giving each of them their own VaS for data.

    Similarly, all library code used by the kernel should be shared uniformly across all users, too {{stdlib, libm, strlib, ...}} where each chunk of
    code gets its own static (and global) variables in scope.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Thu Mar 14 22:16:23 2024
    Lawrence D'Oliveiro wrote:

    On Wed, 13 Mar 2024 23:18:37 +0000, MitchAlsup1 wrote:

    Hypervisors are absolutely necessary if you want high RAS where a
    GuestOS may crash without taking the system down.

    Not really. Remember, the whole point about introducing memory protection into multitasking, multiuser OSes in the first place was precisely so that one program could crash without taking the system down.


    Exactly the same reason HyperVisors were introduced, so GuestOSs could crash without taking down the system. A GuestOS crash does take down the applications it happens to be running at the instant of crashing.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Thu Mar 14 22:47:33 2024
    On Thu, 14 Mar 2024 22:11:41 +0000, MitchAlsup1 wrote:

    There is no-one to take over.......and deal with the GuestOS
    crash.......

    You can have a management process in the host that watches for these sorts
    of events, easily enough.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Thu Mar 14 22:48:41 2024
    On Thu, 14 Mar 2024 22:14:57 +0000, MitchAlsup1 wrote:

    Lawrence D'Oliveiro wrote:

    On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:

    The architectural features supporting virtualization are designed to
    isolate guests from both the hypervisor and other guests.

    Providing an entire separate kernel for each VM is often unnecessary.
    If you need separation at the level of entire subsystems, as opposed to
    individual processes, then that’s what containers are for.

    If you are running k Linuxes under a single HyperVisor, you should be
    able to share all the Linux code after giving each of them their own VaS
    for data.

    Unnecessary to set up complete separation only to poke holes (particularly
    big holes) in it to share stuff. Simpler just to create a separation setup
    that only separates what needs to be separate.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Thu Mar 14 23:47:00 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    Lawrence D'Oliveiro wrote:

    On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:

    The architectural features supporting virtualization are designed to
    isolate guests from both the hypervisor and other guests.

    Providing an entire separate kernel for each VM is often unnecessary. If
    you need separation at the level of entire subsystems, as opposed to
    individual processes, then that’s what containers are for.


    If you are running k Linuxes under a single HyperVisor, you should be able
    to share all the Linux code after giving each of them their own VaS for data.

    Bad idea. Single point of failure. Impossible to update one without
    updating all. Linux does update code dynamically when loading and
    unloading kernel modules.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Chris M. Thomasson on Fri Mar 15 01:32:14 2024
    On Thu, 14 Mar 2024 17:21:29 -0700, Chris M. Thomasson wrote:

    On 3/14/2024 3:47 PM, Lawrence D'Oliveiro wrote:

    On Thu, 14 Mar 2024 22:11:41 +0000, MitchAlsup1 wrote:

    There is no-one to take over.......and deal with the GuestOS
    crash.......

    You can have a management process in the host that watches for these
    sorts of events, easily enough.

    Watchdog, tick tick... ;^)

    Event-driven would be more efficient and more responsive than periodic
    polling.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Fri Mar 15 14:39:52 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Thu, 14 Mar 2024 17:21:29 -0700, Chris M. Thomasson wrote:

    On 3/14/2024 3:47 PM, Lawrence D'Oliveiro wrote:

    On Thu, 14 Mar 2024 22:11:41 +0000, MitchAlsup1 wrote:

    There is no-one to take over.......and deal with the GuestOS
    crash.......

    You can have a management process in the host that watches for these
    sorts of events, easily enough.

    Watchdog, tick tick... ;^)

    Event-driven would be more efficient and more responsive than periodic polling.

    That assumes that an event can be generated, which may not be possible
    with a guest os crash (if, for example, it was in an infinite loop
    with interrupts disabled).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Fri Mar 15 21:14:49 2024
    On Fri, 15 Mar 2024 14:39:52 GMT, Scott Lurndal wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    Event-driven would be more efficient and more responsive than periodic >>polling.

    That assumes that an event can be generated, which may not be possible
    with a guest os crash (if, for example, it was in an infinite loop with interrupts disabled).

    It wouldn’t be a “guest OS”, it would be a “guest container”. Remember,
    the processes in the container are isolated from the host, but the host continues to have full visibility into the guest.

    For a container, guest termination is synonymous with termination of its guest-specific “init” (container-internal PID 1) process. A host watcher process can monitor several of these at once via the Linux pidfd mechanism <https://manpages.debian.org/2/pidfd_open.en.html>.
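 
    A minimal sketch of such a watcher for one guest (assuming Linux 5.3+;
    older glibc lacks a wrapper, so the raw syscall is used):
 
        #include <poll.h>
        #include <sys/syscall.h>
        #include <sys/types.h>
        #include <unistd.h>
 
        /* Block until the container's init process exits; the pidfd
           becomes readable on termination. Several guests can be watched
           at once by putting one pidfd per guest into the poll set. */
        int watch_init(pid_t init_pid)
        {
            int pidfd = (int)syscall(SYS_pidfd_open, init_pid, 0);
            if (pidfd < 0)
                return -1;
            struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
            int rc = poll(&pfd, 1, -1);
            close(pidfd);
            return rc < 0 ? -1 : 0;
        }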

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to Theo on Sun Mar 17 11:52:00 2024
    <https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f23245dace466297f20a0dbd22d371.pdf> wrote:

    Fontconfig's serialization code heavily relied on being able
    to create pointers from arbitrary pointer arithmetic, and this
    is not compatible with CHERI.

    Can you describe this in more detail? The library I work on manages its
    memory in a very detailed way. It asks the host application to allocate largeish blocks for it, and then subdivides the blocks itself, in many
    ways, creating lots of pointers to entities within them.

    That sounds a lot like "create pointers from arbitrary pointer
    arithmetic." It's somewhat like the way a C run-time library gets memory
    from the OS and manages the heap, only with specialisation for the sizes
    of memory blocks that are used in large numbers. This is done to avoid
    horrific heap fragmentation that would otherwise happen.
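 
    The CHERI-friendly version of such a sub-allocator would presumably
    narrow each carved-out pointer to its own bounds; a sketch, assuming
    CHERI Clang's bounds builtin:
 
        #include <stddef.h>
 
        /* Carve a sub-block out of a big host-allocated block. On CHERI
           the returned pointer is derived from the block's capability and
           then narrowed to just the carve-out; elsewhere it is a plain
           pointer addition. */
        void *carve(char *block, size_t offset, size_t len)
        {
            char *p = block + offset;
        #ifdef __CHERI_PURE_CAPABILITY__
            p = (char *)__builtin_cheri_bounds_set(p, len);
        #endif
            return (void *)p;
        }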

    Presumably, this will require support in the library, analogous to the
    way support is required in glibc?

    Presumably sizeof(void*) == 16? Do any pointer types remain 8-byte, or do
    they all grow to 16 bytes?

    Given that Morello will run non-CHERI ARM64 code, how are the transitions between CHERI and vanilla code handled? Must a process be one or the
    other?

    In article <Ory*7U4Ez@news.chiark.greenend.org.uk>, theom+news@chiark.greenend.org.uk (Theo) wrote:

    Should be - it's mostly making things play by the rules. Once they
    play by the rules then it means they will work the same (or less
    buggily) on a regular C platform.

    The above link describes the changes - a number being replacing
    'long' with intptr_t, some undefined behaviour, bad use of
    realloc().

    The things that look challenging on a skim of the report are both in the "language run-time" category.

    We have our own implementation of printf, which has a lot of machine-
    specific code in it. That's done so that it can be extended with new
    formatter codes at run-time: we add a few hundred formatters that know
    about lots of internal types, many of them large structures. That tells
    me that I'm going to have to treat the CHERI-enabled version of a
    platform as a distinct platform, but that was likely inevitable anyway.

    The interfaces that let our LISP interpreter in the test harness call C
    code, and vice versa, are also a potential problem.

    That realloc() bug was a beauty. It was a silly piece of code anyway, but
    would work on most flat-address-space machines without capabilities.

    Some of it was modernisation of old codebases (eg add C11 atomics),

    To replace GCC __sync_* intrinsics, I see. Probably a good idea anyway,
    and worth investigating.

    Do you want to compartmentalise that shared library, ie put in trust boundaries between the library and its caller?

    No, so things should be straightforward, provided the linker and run-time
    are willing to include the library in the same compartment as its caller.
    I presume that's possible?

    If you just want to run the shared library as-is, you can recompile
    it and get bounds checking etc.

    Compartmentalising the library's data reading is superficially attractive,
    but it's pretty complicated. Specifically, expanding the save format to
    the in-memory format can use large chunks of the library's code that are
    also used in doing operations on in-memory data in response to API calls.


    I believe Morello Linux is able to support console-mode apps - ie
    it has support for context switching and use of capabilities in userspace,
    with some support in glibc. I believe there is now a dynamic linker,
    but not sure of the status.

    To do porting work, I'd need X11 for debugging graphics, and some
    prospect of commercial demand for CHERI-enabled libraries. Here, the
    familiar chicken-and-egg problem with new architectures rears its head.

    That might be overcome if, for example, ARM Ltd was to adopt a
    CHERI-based architecture as ARMv10. But something like that would be
    needed to build commercial momentum. Without such a commitment, CHERI may
    well fade away.

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Scott Lurndal on Mon Mar 18 01:29:03 2024
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup1) writes:
    Lawrence D'Oliveiro wrote:

    On Thu, 14 Mar 2024 00:11:55 GMT, Scott Lurndal wrote:

    The architectural features supporting virtualization are designed to
    isolate guests from both the hypervisor and other guests.

    Providing an entire separate kernel for each VM is often unnecessary. If you need separation at the level of entire subsystems, as opposed to
    individual processes, then that’s what containers are for.


    If you are running k Linuxes under a single HyperVisor, you should be able to share all the Linux code after giving each of them their own VaS for data.

    Bad idea. Single point of failure. Impossible to update one without updating all. Linux does update code dynamically when loading and
    unloading kernel modules.

    I actually have a 4-level system:: HyperVisor is the only layer that is
    not allowed to crash (RISC-V calls this machine). Progressing towards
    less privilege is GuestHV, GuestOS, and Application. The Hypervisor
    provides only memory, timing, and device identification services.
    GuestHV provides what most would call the HyperVisor, GuestOS would be
    Linux, and everybody knows what an application is.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to mitchalsup@aol.com on Mon Mar 18 14:17:30 2024
    mitchalsup@aol.com (MitchAlsup1) writes:
    Scott Lurndal wrote:
    mitchalsup@aol.com (MitchAlsup1) writes:


    If you are running k Linuxes under a single HyperVisor, you should be able to share all the Linux code after giving each of them their own VaS for data.

    Bad idea. Single point of failure. Impossible to update one without
    updating all. Linux does update code dynamically when loading and
    unloading kernel modules.

    I actually have a 4-level system::

    That is completely orthogonal to the idea of sharing linux code between
    guests.

    ARMv8 has a similar 4-level (5, if you're counting the machine layer):

    - Machine (secure) (e.g. SMM)
    - Hypervisor (non-secure)
    - Nested Hypervisor (non-secure) [Optional, and not yet widely used]
    - Guest OS (or Bare metal if there is no hypervisor)
    - User mode

    HyperVisor is the only layer that is not
    allowed to crash (RISC-V calls this machine).

    Progressing towards less privilege
    is GuestHV, GuestOS, and Application. Hypervisor provides only memory, timing, and device identification services.

    Absent universal SR-IOV, the hypervisor also needs to manage shared
    I/O access (keyboard controller, graphics, disk and networking)
    and, with SR-IOV, creation and assignment of virtual functions.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)