Linear Address Spaces
=====================
By Poul-Henning Kamp
Communications of the ACM, October 2022, Vol. 65 No. 10, Pages 42-44, doi:10.1145/3561991
I recently bought an Apple computer with the new M1 CPU to supplement
the beastiarium known as Varnish Cache's continuous integration
cluster. I am a bit impressed that it goes head-to-head with the
s390x virtual machine borrowed from IBM, while never drawing more
than 25 watts, but other than that: Meh...
This is one disadvantage of being a systems programmer: You see up
close how each successive generation in an architecture has been
inflicted with yet another "extension," "accelerator," "cache,"
"look-aside buffer," or some other kind of "marchitecture," to the
point where the once-nice and orthogonal architecture is almost
obscured by the "improvements" that followed. It seems almost like a
law of nature:
Any successful computer architecture, under immense pressure to
"improve" while "remaining 100% compatible," will become a
complicated mess.
Show me somebody who calls the IBM S/360 a RISC design, and I will
show you somebody who works with the s390 instruction set today.
Or how about: The first rule of ARM is, "We don't talk about
Thumb-2."
A very special instance of this law happened when AMD created the
x86-64 instruction set to keep the x86 architecture alive after
Intel, the nominal owner of that architecture, had all but abandoned
it and gone full Second Systems Syndrome with the ill-fated Itanium
architecture.
Fundamentally, we now have both kinds of CPUs--ARM and x64--and they
both suffer from the same architectural problems. Take, for example,
the translation from linear virtual to linear physical addresses. Not
only have page table trees grown to a full handful of levels, but
there are also multiple page sizes, each of which comes with its own
handful of footnotes limiting usability and generality.
Why do we even have linear physical and virtual addresses in the
first place, when pretty much everything today is object-oriented?
Linear virtual addresses were made to be backward-compatible with
tiny computers with linear physical addresses but without virtual
memory. There are still linear virtual addresses that are
backward-compatible with computers that were backward-compatible with
computers that were...
Apart from the smallest microcontrollers, nobody sane uses linear
address spaces anymore, neither physical nor virtual. The very first
thing any real-time nucleus or operating system kernel does is
implement an abstract object store on top of the linear space. If the
machine has virtual memory support, it then tries to map from virtual
to physical as best it can, given five levels of page tables and all
that drags in with it.
Translating from linear virtual addresses to linear physical
addresses is slow and complicated, because 64 bits can address a lot
of memory.
Having a single linear map would be prohibitively expensive in terms
of memory for the map itself, so translations use a truncated tree
structure, but that adds a whole slew of new possible exceptions:
What if the page entry for the page directory entry for the page
entry for the exception handler for missing page entries is itself
empty?
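
A back-of-the-envelope sketch of why a single flat map is off the table. The 4 KiB page size and 8-byte entry size are assumptions (typical of x86-64, not figures from this column), and `flat_table_bytes` is a hypothetical helper name.

```python
# Rough size of one flat, single-level page table covering a full
# 64-bit virtual address space. Page and entry sizes are assumptions.
ADDRESS_BITS = 64
PAGE_SIZE = 4096    # assumed 4 KiB pages
ENTRY_SIZE = 8      # assumed 8-byte table entries


def flat_table_bytes(address_bits=ADDRESS_BITS, page_size=PAGE_SIZE,
                     entry_size=ENTRY_SIZE):
    """Bytes needed for a table with one entry per page."""
    pages = (1 << address_bits) // page_size
    return pages * entry_size


size = flat_table_bytes()
print(f"{size} bytes = {size // 2**50} PiB per address space")
```

Under those assumptions the table alone comes to 2^55 bytes, or 32 PiB, for every address space, which is why real designs fall back on a sparse tree.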
This is the land where "double fault exceptions" and "F00F
workarounds" originate. And with five levels of page tables, in the
ultimate worst case, it takes five memory accesses before the CPU
even knows where the data is.
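
To make that cost concrete, here is a minimal Python sketch of a five-level walk, with the page tables modeled as nested dictionaries. The 12-bit page offset and 9 index bits per level are assumptions borrowed from x86-64-style 4 KiB paging, and `split_address`, `map_page`, and `translate` are hypothetical names for illustration only.

```python
# Sketch of a five-level page-table walk over nested dicts.
# Assumed layout: 12-bit page offset, 9 index bits per level.
OFFSET_BITS = 12
INDEX_BITS = 9
LEVELS = 5


class PageFault(Exception):
    """Raised when a table entry on the walk is missing."""


def split_address(vaddr):
    """Return ([top-level index, ..., leaf index], page offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    indices = []
    for level in range(LEVELS - 1, -1, -1):
        shift = OFFSET_BITS + level * INDEX_BITS
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


def map_page(root, vaddr, frame):
    """Install a mapping, creating intermediate tables as needed."""
    indices, _ = split_address(vaddr)
    node = root
    for index in indices[:-1]:
        node = node.setdefault(index, {})
    node[indices[-1]] = frame


def translate(root, vaddr):
    """Walk the tree; return (physical address, number of table lookups)."""
    indices, offset = split_address(vaddr)
    node = root
    lookups = 0
    for depth, index in enumerate(indices):
        lookups += 1                    # one memory access per level
        if index not in node:
            raise PageFault(f"missing entry at level {LEVELS - depth}")
        node = node[index]
    # The leaf entry holds the physical frame number.
    return (node << OFFSET_BITS) | offset, lookups
```

Every successful translation in this model costs five lookups before the data itself is touched, and any one of them can fault. Real CPUs hide much of this behind translation caches, which the sketch deliberately leaves out.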
It doesn't have to be that way.
One of my volunteer activities for datamuseum.dk is writing a
software emulation of a unique and obscure computer: the Rational
R1000/s400. (It's OK; you can look it up. I'll wait, because until
one stood on our doorstep, I had never heard of it either.)
The R1000 has many interesting aspects, not the least of which is
that it was created by some very brilliant and experienced people who
were not afraid to boldly go. And they sure went: The instruction set
is Ada primitives, it operates on bit fields of any alignment, and
the data bus is 128 bits wide: 64 bits for the data and 64 bits for
the data's type. They also made it a four-CPU system, with all CPUs
operating in the same 64-bit global address space. It also needed a
good 1,000 amperes at five volts delivered to the backplane through a
dozen welding cables.
The global 64-bit address space is not linear; it is an object cache
addressed with an (object + offset) tuple, and if that page of the
object is not cached, a microcode trap will bring it in from disk.
In marketing materials, the object cache was sold as "memory boards,"
but in hardware, they contained a four-way associative cache, which
brilliantly hid the tag-RAM lookup during the row address strobe
(RAS) part of the DRAM memory cycle, so that it is precisely as fast
as a normal linear DRAM memory would have been.
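
The (object + offset) addressing and four-way associative lookup described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the page size, set count, FIFO eviction, read-only access, and the `ObjectCache` name are all invented for illustration, and none of it is taken from the actual R1000 hardware or microcode.

```python
# Toy object cache addressed by (object, offset) rather than a linear address.
# PAGE_SIZE, NUM_SETS, and the FIFO replacement policy are assumptions.
PAGE_SIZE = 1024
NUM_SETS = 64
WAYS = 4


class ObjectCache:
    def __init__(self, backing_store):
        # Hypothetical "disk": a dict mapping (object, page) to page bytes.
        self.backing_store = backing_store
        # Each set holds up to WAYS entries of the form (tag, data).
        self.sets = [[] for _ in range(NUM_SETS)]
        self.faults = 0

    def _lookup(self, obj, page):
        tag = (obj, page)
        ways = self.sets[hash(tag) % NUM_SETS]
        for entry_tag, data in ways:    # hardware would compare all tags at once
            if entry_tag == tag:
                return data
        # Miss: stands in for the trap that brings the page in from disk.
        self.faults += 1
        data = self.backing_store.get(tag, bytes(PAGE_SIZE))
        if len(ways) == WAYS:
            ways.pop(0)                 # evict the oldest entry in the set
        ways.append((tag, data))
        return data

    def read(self, obj, offset):
        """Read one byte of an object; there is no linear address anywhere."""
        page, within = divmod(offset, PAGE_SIZE)
        return self._lookup(obj, page)[within]
```

The point of the sketch is that the lookup key is the object identity plus a page number, so there is no intermediate linear address to translate, only a cache hit or a fault.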
State-of-the-art CPUs today can still address only approximately 57
bits of address space, using five levels of page tables, each level
successively and slowly sorting out another nine bits of the address.
The R1000 addresses 64 bits of address space instantly in every
single memory access. And before you tell me this is impossible: The
computer is in the next room, built with 74xx-TTL
(transistor-transistor logic) chips in the late 1980s. It worked back
then, and it still works today.
The R1000 was a solid commercial success, or I guess I should say
military success, because most customers were in the defense
industry, and the price was an eye-watering, $1 million a pop. It
also came with the first fully semantic IDE, but that is a story for
another day.
Given Ada's reputation for strong typing, handling the type
information in hardware rather than in the compiler may sound
strange, but there is a catch: Ada's variant data structures can make
it a pretty daunting task to figure out how big an object is and
where to find things in it. Handling data + type as a unit in
hardware makes that fast.
Not that type-checking in hardware is a bad idea. Quite the contrary:
The recent announcement by ARM that it has prototyped silicon with
its Morello implementation of Cambridge University's CHERI
architecture gives me great hope for better software quality in the
future.
Cut to the bone, CHERI makes pointers a different data type than
integers in hardware and prevents conversion between the two types.
Under CHERI, new valid pointers can be created only by derivation
from existing valid pointers, either by restricting the permissible
range or by restricting the permissions. If you try to create or
modify a pointer by any other means, it will no longer be a pointer,
because the hardware will have cleared the special bit in memory that
marked it as a valid pointer.
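
The derivation rule can be illustrated with a toy Python model. Everything here is a simplification for illustration: the field names, the two-bit permission encoding, and the `Capability`, `forge`, and `load` names are assumptions, not the CHERI instruction set or its real capability layout.

```python
# Toy model of CHERI-style capability derivation (not the real ISA).
from dataclasses import dataclass, replace

READ, WRITE = 1, 2


class CapabilityFault(Exception):
    """Raised when an invalid or insufficient capability is used."""


@dataclass(frozen=True)
class Capability:
    base: int
    length: int
    perms: int
    tag: bool = True    # models the "valid pointer" bit

    def restrict_range(self, base, length):
        """Derive a sub-range; asking for wider bounds clears the tag."""
        ok = bool(self.tag and length >= 0 and self.base <= base
                  and base + length <= self.base + self.length)
        return Capability(base, length, self.perms, tag=ok)

    def restrict_perms(self, mask):
        """Derive with fewer permissions; AND-ing can never add any."""
        return replace(self, perms=self.perms & mask)


def forge(cap, **fields):
    """Model overwriting pointer bits by other means: the tag is lost."""
    return replace(cap, tag=False, **fields)


def load(cap, memory, addr):
    """Read one byte through a capability, checking tag, perms, and bounds."""
    if not cap.tag:
        raise CapabilityFault("untagged value used as a pointer")
    if not cap.perms & READ:
        raise CapabilityFault("no read permission")
    if not cap.base <= addr < cap.base + cap.length:
        raise CapabilityFault("out of bounds")
    return memory[addr]
```

In this model a capability derived with `restrict_range(16, 16)` can read address 20 but faults on address 40, and anything produced by `forge` faults on first use, whatever bounds it claims.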
According to Microsoft Research, CHERI would have deterministically
detected and prevented a full 43% of the security problems reported
to the company in 2019. To put that number in perspective: The
National Highway Traffic Safety Administration reports that 47% of
the people killed in traffic accidents were not wearing seat belts.
As with mandatory seat belts, some people argue there would be no
need for CHERI if everyone "just used type-safe languages," or they
claim that the extra "capability" bits CHERI pointers carry make
their programs look fat.
I'm not having any of it.
The linear address space as a concept is unsafe at any speed, and it
badly needs mandatory CHERI seat belts. But even better would be to
get rid of linear address spaces entirely and go back to the future,
as successfully implemented in the Rational R1000 computer 30-plus
years ago.
Author
======
Poul-Henning Kamp spent more than a decade as one of the primary
developers of the FreeBSD operating system before creating the
Varnish HTTP Cache software, which around a fifth of all Web traffic
goes through at some point. He is an independent contractor; one of
his most recent projects was a supercomputer cluster to stop the
stars twinkling in the mirrors of ESO's new ELT (extremely large
telescope).
From: <https://web.archive.org/web/20220921202132/https://cacm.acm.org/magazines/2022/10/264852-linear-address-spaces/fulltext>