• indirection in old architectures

    From Anton Ertl@21:1/5 to All on Fri Dec 29 17:20:43 2023
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this
    feature.

    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.
    However, Prolog was first implemented in 1970, and it did not become a
    big thing until the 1980s (if then), so I doubt that this feature was implemented for Prolog.

    A use for a single indirection is the implementation of the memory
    management in the original MacOS: Each dynamically allocated memory
    block was referenced only from a single place (its handle), so that
    the block could be easily relocated. Only the address of the handle
    was freely passed around, and accessing the block then always required
    double indirection. MacOS was implemented on the 68000, which did not
    have the indirect bit; this demonstrates that the indirect bit is not
    necessary for that. Nevertheless, such a usage pattern might be seen
    as a reason to add the indirect bit. But is it enough?
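
For concreteness, the handle pattern in a minimal C sketch (names invented here, not the real Memory Manager API):

/* Hedged sketch of the handle pattern described above: clients hold a
   pointer to a master pointer, so the allocator can move the block by
   updating one word. Names are invented, not the real MacOS API. */
#include <stdlib.h>
#include <string.h>

typedef void **Handle;            /* address of the single master pointer */

Handle new_handle(size_t n) {
    Handle h = malloc(sizeof *h);
    if (h) *h = malloc(n);
    return h;
}

/* The memory manager may relocate the block during compaction;
   only the master pointer needs updating. */
void relocate(Handle h, size_t n) {
    void *moved = malloc(n);
    if (!moved) return;
    memcpy(moved, *h, n);         /* assumes the old block holds n bytes */
    free(*h);
    *h = moved;                   /* clients' handles stay valid */
}
/* Every access pays the double indirection: ((char *)*h)[i]. */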

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Anton Ertl on Fri Dec 29 19:04:56 2023
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    That's essentially accurate. The Burroughs medium systems
    operands were described by an operand address that included
    an 'address controller'. The address controller, a four-bit
    field, specified two characteristics of the address; the
    two-bit 'index' field contained the number of the index register
    (there were three) to be used when calculating the final
    address. The other two bits described how the data at the
final address should be treated by the processor:
0b00 Unsigned Numeric Data [UN] (BCD)
0b01 Signed Numeric Data [SN] (BCD, first digit 0b1100 = '+', 0b1101 = '-')
0b10 Unsigned Alphanumeric Data [UA] (EBCDIC)
0b11 Indirect Address [IA]

Consider the operand 053251: this described an unsigned
    numeric value starting at the address 53251 with no indexing.

    The operand 753251 described an address indexed by IX1
    and of the type 'indirect address' which points to another
    operand word (potentially resulting in infinite recursion,
    which was detected by an internal timer which would terminate
    the process when triggered).

    The actual operand data type was determined by the
    address controller of the first operand that isn't
    marked IA.
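
In rough C, the operand resolution described above might look like this (field widths, the memory model, and the loop limit standing in for the hardware timer are all assumptions):

/* Sketch of the medium-systems operand resolution described above.
   Field widths, memory model, and the loop limit are assumptions. */
#include <stdint.h>

enum { UN = 0, SN = 1, UA = 2, IA = 3 };     /* address-controller types */

typedef struct { int type; int ix; uint32_t addr; } Operand;

/* mem[] holds pre-decoded operand words; ix[0] is 0 ("no indexing"). */
uint32_t resolve(const Operand *mem, Operand op, const uint32_t ix[4],
                 int *type) {
    int steps = 0;
    for (;;) {
        uint32_t ea = op.addr + ix[op.ix];
        if (op.type != IA) { *type = op.type; return ea; }
        if (++steps > 10000) { *type = -1; return ea; } /* timer stand-in */
        op = mem[ea];            /* indirect: fetch another operand word */
    }
}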



    The major question I have is why these architectures have this
    feature.

    Primarily for flexibility in addressing without adding substantial
    hardware support.


    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.

    The aforementioned system ran mostly COBOL code (with some BPL;
    assemblers weren't generally provided to customers).


    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    Consider following a linked list to the final element as an
    example usage.

The aforementioned system also had an SLL (Search Linked List)
instruction that would test each element for one of several conditions
and terminate the indirection when the condition was true.
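
Roughly, in C (a guess at the semantics, not the actual instruction definition):

/* Rough C analogue of SLL as described: follow links, stop when an
   element satisfies the condition. Details here are guesses. */
typedef struct Node { struct Node *next; int key; } Node;

Node *search_linked_list(Node *p, int (*cond)(const Node *)) {
    while (p && !cond(p))
        p = p->next;
    return p;                     /* NULL: condition never became true */
}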


    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    I guess I don't understand the question. It's just a pointer in
    a linked list.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Fri Dec 29 19:36:00 2023
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    More or less. Indirect addressing was always controlled by a bit in
    the instruction. It was more common to have only a single level of
    indirect addressing, just controlled by that instruction bit.
    Multi-level wasn't much more useful and you had to have a way to break
    address loops.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    The CPU follows the indirect address chain to get the operand address
    and then does the operation. On the PDP-10, this stores into the
    word that FOO points to, perhaps after multiple indirections:

    MOVEM AC,@FOO

    while this stores into FOO itself:

    MOVEM AC,FOO

    The major question I have is why these architectures have this
    feature.

    Let's say you want to add up a list of numbers and your machine
    doesn't have any index registers. What else are you going to do?

    Indirect addressing was a big improvement over patching the
    instructions and index registers were too expensive for small
    machines. The IBM 70x mainframes had index registers, the early DEC
    PDP series didn't other than the mainframe-esque PDP-6 and -10. The
    PDP-11 mini was a complete rethink a decade after the PDP-1 with eight registers usable for indexing and no indirect addressing.

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    They were also useful for argument lists which were invariably in
    memory on machines without a lot of registers which was all of them
    before S/360 and the PDP-6. On many machines a Fortran subroutine call
    would leave the return address in an index register and the addresses
    of the arguments were in the words after the call. The routine would
    use something like @3(X) to get the third argument. Nobody other than
    maybe Lisp cared about reentrant or recursive code, and if the number
    of arguments in the call didn't match the number the routine expected
    and your program blew up, well, don't do that.
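
In effect the caller passes a vector of argument addresses; a loose C analogue of such a subroutine (names invented):

/* Loose C analogue of the convention above: the caller leaves the
   addresses of the arguments in memory after the call, and @3(X)
   becomes "indirect through the third address". Names are invented. */

/* Like FORTRAN's SUBROUTINE ADD(A,B,C): C = A + B, all by reference. */
void add(double *argv[]) {
    *argv[2] = *argv[0] + *argv[1];   /* one indirection per argument */
}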

    As you suggested, a lot of uses boiled down to providing a fixed
    address for something that can move, so instructions could indirect
    through that fixed address without having to load it into a register.

    For most purposes, index registers do indirection better, and now that everything has a lot of registers, you can use some of them for the fixed->movable stuff like the GOT in Unix/linux shared libraries.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Anton Ertl on Fri Dec 29 20:27:29 2023
    Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this
    feature.

    Solves the memory access problem {arrays, nested arrays, linked lists,...}
    The early machines had "insufficient" address generation means, and used indirection as a trick to get around their inefficient memory address mode.

    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.
    However, Prolog was first implemented in 1970, and it did not become a
    big thing until the 1980s (if then), so I doubt that this feature was implemented for Prolog.

Some of the indirection machines had the indirection bit located in the
container at the address generated; others had the indirection in
the address calculation. In the case of the PDP-10 there was a
time-out counter, and there were applications that worked fine up to a
particular size and then simply failed when the indirection watchdog
counter kept "going off".

    A use for a single indirection is the implementation of the memory
    management in the original MacOS: Each dynamically allocated memory
    block was referenced only from a single place (its handle), so that
    the block could be easily relocated. Only the address of the handle
    was freely passed around, and accessing the block then always required
    double indirection. MacOS was implemented on the 68000, which did not
    have the indirect bit; this demonstrates that the indirect bit is not necessary for that. Nevertheless, such a usage pattern might be seen
    as a reason to add the indirect bit. But is it enough?

    Two things: 1) the indirect bit is insufficient, 2) optimizing compilers
    got to the point they were better at dereferencing linked lists than
    the indirection machines were. {Reuse and all that rot.}

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    Arrays, matrixes, scatter, gather, lists, queues, stacks, arguments,....
    We did all sorts of infinite-indirect stuff in asm on the PDP-10 {KI}
    when programming at college.

    They went out of fashion when compilers got to the point they could
    hold the intermediate addresses in registers and short circuit the
    amount of indirection needed--improving performance due to accessing
    fewer memory locations.

    The large register files of RISC spelled their doom.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    In the machines where the indirection is at the instruction level, this
    was simple, in the machines where the indirection was at the target, it
    was more difficult.

    - anton

    Summary::

    First the architects thought registers were expensive.
{Many doubled down with OP-Mem ISAs.}
The architects endowed memory addressing with insufficient capabilities.
{Many to satisfy the OP-Mem and Mem-OP ISAs they had imposed upon themselves.}
Then they added indirection to make up for insufficient addressing.
    And then everyone waited until RISC showed up (1980) before realizing their error in register counts.
    {Along about this time, Compilers started getting good.}

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Fri Dec 29 21:59:25 2023
    According to MitchAlsup <mitchalsup@aol.com>:
Some of the indirection machines had the indirection bit located in the
container at the address generated; others had the indirection in
the address calculation. In the case of the PDP-10 there was a
time-out counter, and there were applications that worked fine up to a
particular size and then simply failed when the indirection watchdog
counter kept "going off".

    No, that's what the GE 635 did, a watchdog timer reset each time it
    started a new instruction. The PDP-6 and -10 could take an interrupt
    each time it calculated an address and would restart the instruction
    when the interrupt returned. This worked because unlike on the 635 the
    address calculation didn't change anything. (Well, except for the ILDB
    and IDPB instructions that needed the first part done flag. But I
    digress.)

    You could tell how long the time between clock interrupts was by
    making an ever longer indirect address chain and seeing where your
    program stalled. It wouldn't crash, it just stalled as the very long
    address chain kept being interrupted and restarted. I'm not being
    hypothetical here.

    Two things: 1) the indirect bit is insufficient, 2) optimizing compilers
    got to the point they were better at dereferencing linked lists than
    the indirection machines were. {Reuse and all that rot.}

    More importantly, index registers are a lot faster than indirect
    addressing and at least since the IBM 801, we have good algorithms to
    do register scheduling.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    In the machines where the indirection is at the instruction level, this
    was simple, in the machines where the indirection was at the target, it
    was more difficult.

    The indirection was always in the address word(s), not in the target.
    It didn't matter if it was a load or a store.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Joe Pfeiffer@21:1/5 to Anton Ertl on Sat Dec 30 12:26:02 2023
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this
    feature.

    I'll hazard a guess that once you've got the indirect bit out in memory,
    it's easier to just use the same logic on all memory reads than to only
    let it happen once.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Sat Dec 30 23:26:20 2023
    According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
    I'll hazard a guess that once you've got the indirect bit out in memory,
    it's easier to just use the same logic on all memory reads than to only
    let it happen once.

    That's not how indirect addressing worked.

    There was always a bit in the instruction to say to do indirection.

Sometimes that was it; sometimes, on machines where the word size was
bigger than the address size, it also looked at some other bit in the
indirect word to see whether to keep going. On the PDP-8, the words
were 12 bits and the addresses were 12 bits, so there was no room; they
couldn't have done multilevel indirection even if they had wanted to.

    As several of us noted, multilevel indirection needed something to
    break loops, while single level didn't. In my experience, multiple
    indirection wasn't very useful, I didn't miss it on the -8, and I
    can't recall using it other than as a gimmick on the PDP-10.





    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Quadibloc@21:1/5 to Anton Ertl on Sun Dec 31 08:00:14 2023
    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    So in order to access arrays and stuff like that, instead of doing surgery
    on the short address inside an instruction, you can simply store a full
    address in a word somewhere that points anywhere you would like.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    Let's assume we do have an architecture that supports multi-level
    indirection. So an instruction word looks like this:

    (i)(x)(opcode)(p)(address)

    and an address constant looks like this:

    (i)(x)(address)

    So in an address constant (some architectures that had index registers
    kept indirection) you could specify indexing too, but now the address was longer by the length of the opcode field.

    If the address inside an instruction is too short to handle all of memory
    (i.e. the word length is less than 24 bits) then you need a "page" bit in
    the instruction: 0 means page zero, shared by the whole program, 1 means
    the current page - the one the instruction is on.

Let's now say the instruction is a _store_ instruction. Then what? Well,
if the indirect bit is set, it acts like a *load* instruction to fetch
the effective address. It only stores at the point where indirection
ends - where the address is that of the actual location to do the storing
in, rather than the location of the effective address, which must be read,
not written.
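
As a sketch, with an invented bit layout, the store path reads every word in the chain and writes only the final location:

/* Sketch of the store behavior described above: the chain of address
   words is only read; the store happens where indirection ends.
   The bit layout here is invented for illustration. */
#include <stdint.h>

#define I_BIT   0x80000000u           /* indirect: keep following */
#define ADDR(w) ((w) & 0x0000FFFFu)   /* address field of a word */

void store(uint32_t *mem, int ibit, uint32_t a, uint32_t value) {
    while (ibit) {                    /* instruction's indirect bit first */
        uint32_t w = mem[a];          /* fetch the address constant */
        ibit = (w & I_BIT) != 0;      /* its own i-bit: continue? */
        a = ADDR(w);
    }
    mem[a] = value;                   /* write only the final location */
}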

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Vir Campestris@21:1/5 to John Levine on Sun Dec 31 17:40:53 2023
    On 30/12/2023 23:26, John Levine wrote:
    and I
    can't recall using it other than as a gimmick on the PDP-10.

    It's a very long time ago, but I'm sure I do recall seeing it used on a DECSystem10 for arrays of pointers for indirection.

    The fact that 40 years later I can remember the @ being used in
    assembler must mean something.

Modern machines don't like wasting space so much. On the '10 a word
pointed to was a 36-bit value with an 18-bit address in it, plus the
indirection bit. There was space for things like this.

    Andy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to MitchAlsup on Sun Dec 31 17:54:44 2023
    MitchAlsup <mitchalsup@aol.com> schrieb:
    Quadibloc wrote:

    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
feature that, when loading from an address, if a certain bit in the word is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    This is a better explanation than above. Instead of paying the high price needed for index registers, they use main memory as their index registers. {{A lot like building linked lists in FORTRAN 66}}.

    The PDP-10 had both a recursive indirect bit and index registers (aka
    memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Quadibloc on Sun Dec 31 17:16:35 2023
    Quadibloc wrote:

    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    This is a better explanation than above. Instead of paying the high price needed for index registers, they use main memory as their index registers.
    {{A lot like building linked lists in FORTRAN 66}}.

    So in order to access arrays and stuff like that, instead of doing surgery
    on the short address inside an instruction, you can simply store a full address in a word somewhere that points anywhere you would like.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    Let's assume we do have an architecture that supports multi-level indirection. So an instruction word looks like this:

    (i)(x)(opcode)(p)(address)

    and an address constant looks like this:

    (i)(x)(address)

    So in an address constant (some architectures that had index registers
    kept indirection) you could specify indexing too, but now the address was longer by the length of the opcode field.

    If the address inside an instruction is too short to handle all of memory (i.e. the word length is less than 24 bits) then you need a "page" bit in
    the instruction: 0 means page zero, shared by the whole program, 1 means
    the current page - the one the instruction is on.

    Going all PDP-8 on us now ??

Let's now say the instruction is a _store_ instruction. Then what? Well,
if the indirect bit is set, it acts like a *load* instruction to fetch the effective address. It only stores at the point where indirection ends - where the address is that of the actual location to do the storing
in, rather than the location of the effective address, which must be read, not written.

    John Savard

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to John Levine on Sun Dec 31 18:25:56 2023
    John Levine <johnl@taugh.com> writes:
    According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
I'll hazard a guess that once you've got the indirect bit out in memory, it's easier to just use the same logic on all memory reads than to only
    let it happen once.

    That's not how indirect addressing worked.

    There was always a bit in the instruction to say to do indirection.

    In our case (B3500 et alia), there was a bit per operand, so a three operand instruction could have all three addresses indirect. The processor treated
    the value at the indirect address as an operand address allowing infinite recursion (subject to a processor timer in case of loops).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Quadibloc on Sun Dec 31 18:28:07 2023
    Quadibloc <quadibloc@servername.invalid> writes:
    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    Not necessarily true. The B3500 had three index registers (special
locations in memory, not real registers). Later systems in the early
'80s added four more register-based index registers, but
continued to support indirect addressing.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Paul A. Clayton on Sun Dec 31 18:57:21 2023
    Paul A. Clayton wrote:

    On 12/29/23 2:36 PM, John Levine wrote:
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found.
    [snip]
    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?
    [snip]
    As you suggested, a lot of uses boiled down to providing a fixed
    address for something that can move, so instructions could indirect
    through that fixed address without having to load it into a register.

    Paged virtual memory as commonly implemented introduces one level
    of indirection at page (rather than word) granularity.
Virtualization systems using nested page tables introduce a second
level of indirection.

Hierarchical/multi-level page tables have multiple layers of
indirection: instead of a page table base pointer pointing to
a complete page table, it points to a typically page-sized array of
address and metadata entries, where each entry points to a similar
array, eventually reaching the PTE.
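
A generic radix-table walk makes the layering explicit; in this C sketch the four levels of nine index bits over 4 KiB pages are an assumption, not any particular ISA:

/* Illustrative radix page-table walk: each level is one indirection. */
#include <stdint.h>

#define LEVELS  4
#define BITS    9
#define PGSHIFT 12
#define PRESENT 1ull

typedef const uint64_t *(*phys_fn)(uint64_t pa);  /* PA -> table pointer */

uint64_t translate(const uint64_t *top, uint64_t va, phys_fn phys) {
    const uint64_t *table = top;
    for (int l = LEVELS - 1; ; l--) {
        uint64_t idx = (va >> (PGSHIFT + l * BITS)) & ((1ull << BITS) - 1);
        uint64_t pte = table[idx];
        if (!(pte & PRESENT)) return UINT64_MAX;   /* fault */
        if (l == 0)                                /* leaf PTE */
            return (pte & ~0xFFFull) | (va & 0xFFF);
        table = phys(pte & ~0xFFFull);             /* next indirection */
    }
}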

    Even with page table caching (and workloads that play well with
    this kind of virtual memory), this is not free but it can be
    "cheap enough". Using large pages for virtual-physical to physical translation can help a lot. Presumably having an OS bias placement
    of its translation table pages into large quasi-pages would help
    caching for VPA-to-PA, i.e., many VPAs used by the OS for paging
    would be in the same large page (e.g., 2MiB for x86).

    (Andy Glew had suggested using larger pages for intermediate nodes
    rather than limiting such to the last node in a hierarchical page
    table.

    I had been thinking that since my large-page translation tables have
    a count of the number of pages, that when forking off a new GuestOS
    that I would allocate the HyperVisor tables as a single 8GB large
    page, and when it needs more then switch to a more treeified page
    table. This leaves the second level of DRAM translation at 1 very
    cacheable and TLB-able PTE--dramatically reducing the table walking
    overhead.

    A single 8GB page mapping can allow access to one 8192B page up to
    1M 8192B pages. Guest OS page tables can map any of these 8192B pages
    to any virtual address it desires with permissions it desires.

This has the same level-reducing effect as huge pages that short-circuit the translation indirection at the end, but allows
    eviction and permission control at base-page size, with the
    consequent larger number of PTEs active if there is spatial
    locality at huge page granularity. Such merely assumes that
    locality potentially exists at the intermediate nodes rather than
    exclusively at the last node. Interestingly, with such a page
    table design one might consider having rather small pages; e.g., a
    perhaps insane 64-byte base page size (at least for the tables)
    would only provide 3 bits per level but each level could be
    flattened to provide 6, 9, 12, etc. bits. Such extreme flexibility
    may well not make sense, but it seems interesting to me.)


    For most purposes, index registers do indirection better, and now that
    everything has a lot of registers, you can use some of them for the
    fixed->movable stuff like the GOT in Unix/linux shared libraries.

    For x86-64 some of the segments can have non-zero bases, so these
    provide an additional index register ("indirection").

    This has more to do with 16 registers being insufficient than indirection (segmentation) being better.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Thomas Koenig on Sun Dec 31 18:59:45 2023
    Thomas Koenig wrote:

    MitchAlsup <mitchalsup@aol.com> schrieb:
    Quadibloc wrote:

    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
feature that, when loading from an address, if a certain bit in the word is set, the CPU uses that word as the address to access, and this
repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

The reason these architectures had that feature was because of a feature they _didn't_ have: an index register.

    This is a better explanation than above. Instead of paying the high price
needed for index registers, they use main memory as their index registers. {{A lot like building linked lists in FORTRAN 66}}.

    The PDP-10 had both a recursive indirect bit and index registers (aka
    memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

All of the PDP-10s at CMU had the register upgrade. {2×KI and 1×KL}
    I believe that most PDP-10 ever sold had the register upgrade.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Sun Dec 31 20:19:32 2023
    According to Thomas Koenig <tkoenig@netcologne.de>:
    The PDP-10 had both a recursive indirect bit and index registers (aka
    memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

    Yup. Each instruction had an 18 bit address, a four bit index register, and an indirect bit.
    It took the address, and added the contents of the right half of the index register if non-zero.
    If the indirect bit was off, that was the operand address. If the indirect bit was set, it
    fetched the word at that location and did the whole thing over again, including the indexing.

You could in principle create extremely complicated address chains but
    it was so confusing that nobody did.
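
That loop, as a C sketch (the 36-bit word right-justified in a 64-bit integer, with I, X, and Y in the bit positions described above):

/* Sketch of the PDP-10 effective-address loop John describes:
   Y = bits 18-35, X = bits 14-17, I = bit 13 of each word. */
#include <stdint.h>

#define Y(w) ((uint32_t)((w) & 0777777))   /* 18-bit address */
#define X(w) ((int)(((w) >> 18) & 017))    /* 4-bit index register */
#define I(w) ((int)(((w) >> 22) & 1))      /* indirect bit */

uint32_t ea(const uint64_t mem[], const uint64_t ac[16], uint64_t w) {
    for (;;) {
        uint32_t e = Y(w);
        if (X(w)) e = (e + Y(ac[X(w)])) & 0777777; /* add RH of index reg */
        if (!I(w)) return e;          /* no indirect: operand address */
        w = mem[e];                   /* fetch and do it all over again */
    }
}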

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to John Levine on Sun Dec 31 20:42:42 2023
    John Levine wrote:

    According to Thomas Koenig <tkoenig@netcologne.de>:
The PDP-10 had both a recursive indirect bit and index registers (aka memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

    Yup. Each instruction had an 18 bit address, a four bit index register, and an indirect bit.
    It took the address, and added the contents of the right half of the index register if non-zero.
    If the indirect bit was off, that was the operand address. If the indirect bit was set, it
    fetched the word at that location and did the whole thing over again, including the indexing.

You could in principle create extremely complicated address chains but
    it was so confusing that nobody did.

At CMU I used this a lot for things like symbol table searches.
What I did not use was the index-register part of the indirection (except
at the first level).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From sarr.blumson@alum.dartmouth.org@21:1/5 to John Levine on Mon Jan 1 20:31:28 2024
    John Levine <johnl@taugh.com> wrote:

    : More importantly, index registers are a lot faster than indirect
    : addressing and at least since the IBM 801, we have good algorithms to
    : do register scheduling.

    Once upon a time saving an instruction was a big deal; the 801, and
    RISC in general, was possible because memory got much cheaper.
Using index registers costs an extra instruction for loading the index
    register.

    Index registers were a scarce resource too (except for the Atlas) so
    keeping all your pointers in index registers wasn't a good option
    either.

    sarr`

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to sarr.blumson@alum.dartmouth.org on Thu Jan 4 01:36:35 2024
    sarr.blumson@alum.dartmouth.org wrote:

    John Levine <johnl@taugh.com> wrote:

    : More importantly, index registers are a lot faster than indirect
    : addressing and at least since the IBM 801, we have good algorithms to
    : do register scheduling.

    Once upon a time saving an instruction was a big deal; the 801, and
    RISC in general, was possible because memory got much cheaper.
Using index registers costs an extra instruction for loading the index register.

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions
    as VAX and at 6× the frequency for a 4× improvement in performance.

    Now, imagine a RISC ISA that only needs 1.1× as many instructions as
    VAX with no degradation WRT operating frequency.

    Index registers were a scarce resource too (except for the Atlas) so
    keeping all your pointers in index registers wasn't a good option
    either.

    sarr`

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Anton Ertl on Fri Jan 5 11:33:40 2024
    Anton Ertl wrote:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this
    feature.

    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.
    However, Prolog was first implemented in 1970, and it did not become a
    big thing until the 1980s (if then), so I doubt that this feature was implemented for Prolog.

    A use for a single indirection is the implementation of the memory
    management in the original MacOS: Each dynamically allocated memory
    block was referenced only from a single place (its handle), so that
    the block could be easily relocated. Only the address of the handle
    was freely passed around, and accessing the block then always required
    double indirection. MacOS was implemented on the 68000, which did not
    have the indirect bit; this demonstrates that the indirect bit is not necessary for that. Nevertheless, such a usage pattern might be seen
    as a reason to add the indirect bit. But is it enough?

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in
    between?

    - anton

    PDP-11 and VAX had multiple address modes with a single level of indirection. The VAX usage stats from 1984 show about 3% use on SPEC.

DG Nova had infinite indirection: if the Indirect bit was set in the instruction, then in the addressed word, if the msb was zero it was the address of the 16-bit data; if the msb was 1 it was the address of another address, looping until msb = 0.
I don't know how DG used it but, just guessing, because the Nova only had
4 registers it might have been to create a kind of virtual register set in memory.

    The best use I have for single level indirection is compilers & linkers.
    The compiler emits a variable reference without knowing if it is local
    to the linkage unit or imported from a DLL. Linker discovers it is a
    DLL export variable and changes the assigned variable to be a pointer
    to the imported value that is patched by the loader,
    and just flips the Indirect bit on the instruction.

    Doing the same thing without address indirection requires inserting
    extra LD instructions and having a spare register allocated to the
    linker to work with.
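
At the source level the difference looks roughly like this (names invented for illustration):

/* Sketch of the trick EricP describes: a DLL import is reached through
   one loader-patched pointer, where a local is reached directly.
   Names are invented; the pointer stands in for the flipped bit. */
int local_var = 42;                 /* resolved within the linkage unit */
static int imported_storage = 7;    /* stands in for the DLL's variable */
int *imported_var = &imported_storage;  /* the loader patches this */

int read_local(void)    { return local_var; }
int read_imported(void) { return *imported_var; }  /* the extra hop the
                                                      indirect bit hides */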

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Fri Jan 5 18:05:21 2024
    According to EricP <ThatWouldBeTelling@thevillage.com>:
PDP-11 and VAX had multiple address modes with a single level of indirection. The VAX usage stats from 1984 show about 3% use on SPEC.

    The main place the PDP-11 used indirect addressing was in @(PC)+ which
    was the idiom for absolute addressing. It fetched the next word in the instruction stream as an immediate via (PC)+ and then used it as an
address via indirection. The assembler let you write @#123 to generate
    that address mode and put the 123 in line.

    It was also useful for threaded code, where you had a register,
    typically R4, pointing at a list of routine addresses and dispatched
    with JMP @(R4)+

If you were feeling clever you could do this coroutine switch: JSR PC,@(SP)+

    That popped the top word off the stack, then pushed the current PC, then jumped to the address it had popped.
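
In C, the JMP @(R4)+ dispatch is roughly this (a loose analogue, not PDP-11 semantics):

/* Loose C analogue of the threaded-code dispatch above: a pointer
   playing the role of R4 walks a table of routine addresses, jumping
   indirect through each and post-incrementing (JMP @(R4)+ in spirit). */
#include <stdio.h>

typedef void (*routine)(void);

static void hello(void) { puts("hello"); }
static void world(void) { puts("world"); }

int main(void) {
    routine table[] = { hello, world, NULL };
    for (routine *r4 = table; *r4; )
        (*r4++)();                 /* fetch address, bump R4, jump */
    return 0;
}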

DG Nova had infinite indirection: if the Indirect bit was set in the instruction, then in the addressed word, if the msb was zero it was the address of the 16-bit data; if the msb was 1 it was the address of another address, looping until msb = 0.
I don't know how DG used it but, just guessing, because the Nova only had
4 registers it might have been to create a kind of virtual register set in memory.

    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to John Levine on Fri Jan 5 23:20:45 2024
    John Levine wrote:

    According to EricP <ThatWouldBeTelling@thevillage.com>:
PDP-11 and VAX had multiple address modes with a single level of indirection. The VAX usage stats from 1984 show about 3% use on SPEC.

    The main place the PDP-11 used indirect addressing was in @(PC)+ which
    was the idiom for absolute addressing. It fetched the next word in the instruction stream as an immediate via (PC)+ and then used it as an
address via indirection. The assembler let you write @#123 to generate
    that address mode and put the 123 in line.

    It was also useful for threaded code, where you had a register,
    typically R4, pointing at a list of routine addresses and dispatched
    with JMP @(R4)+

If you were feeling clever you could do this coroutine switch: JSR PC,@(SP)+

    That popped the top word off the stack, then pushed the current PC, then jumped
    to the address it had popped.

I used this in a real-time OS I developed at CMU to deal with laser power control.

Processes (no MMU or protection) would receive control via JSR PC,@(SP)+ and
return control with JSR PC,@(SP)+, at which time the OS would find the next thing to do and JSR PC,@(SP)+ all over again. Really lightweight context switching.

DG Nova had infinite indirection: if the Indirect bit was set in the instruction, then in the addressed word, if the msb was zero it was the address of the 16-bit data; if the msb was 1 it was the address of another address, looping until msb = 0.
I don't know how DG used it but, just guessing, because the Nova only had
4 registers it might have been to create a kind of virtual register set in memory.

    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jan 17 05:28:46 2024
    On Fri, 29 Dec 2023 19:04:56 GMT, Scott Lurndal wrote:

    The [Burroughs] system ran mostly COBOL code (with some BPL;
    assemblers weren't generally provided to customers).

    For an interesting reason: privilege protection was enforced in software,
    not hardware.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 08:14:34 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

    No, that seems correct: It needed 1.5 times as many instructions, so the
    6X frequency must be divided by 1.5 for a final speedup of 4X?

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Wed Jan 17 06:36:41 2024
    On Sun, 31 Dec 2023 17:54:44 -0000 (UTC), Thomas Koenig wrote:

    ... but I've never even come close to one of these
    machines).

    You could have one, or a software emulation of one, right in front of you,
    just a SIMH install away.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to MitchAlsup on Wed Jan 17 06:34:55 2024
    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 16:02:32 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Fri, 29 Dec 2023 19:04:56 GMT, Scott Lurndal wrote:

    [no assembler shipped to customers]

    The [Burroughs] system ran mostly COBOL code (with some BPL;
    assemblers weren't generally provided to customers).

    For an interesting reason: privilege protection was enforced in software,
    not hardware.

    Actually, that is not the case.

    Burroughs had multiple lines of mainframes: small, medium and large.

Small systems (B1700/B1800/B1900) had a writeable control store, and the instruction set
would be dynamically loaded when the application was scheduled.

Medium systems were BCD systems (B[234][5789]xx), descended from the original line
of ElectroData Datatron systems when Burroughs bought ElectroData
in the mid-1950s. Designed to efficiently run COBOL code. These
    are the systems I was referring to above. They had hardware
    enforced privilege protection.

Large systems (starting with the B5000/B5500) were stack systems
running ALGOL and ALGOL derivatives (DCALGOL, NEWP, etc.);
they also supported COBOL, Fortran, Basic, etc.

    The systems you are thinking about were the Large systems. And
    there were issues with that (a famous paper in the mid 1970s
    showed how to set the 'compiler' flag on any application allowing
    it to bypass security protections - put the application on a
    tape, load it on an IBM system, patch the executable header,
    and restore it on the Burroughs system).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 17:38:51 2024
    Lawrence D'Oliveiro wrote:

    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

    Performance is in millions of instructions per second.

If the instruction count were 1.0×, a 6× frequency would yield a 6× gain.

    So, since there were 1.5× as many instructions and 6× as many instructions per
    second, 6 / 1.5 = 4×

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 13:09:52 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such. But no one had actually measured it.
    When they finally did and found it was 0.5 MIPS they just changed to
    calling that "1 VUP" or "VAX-780 Units of Processing".

    This also showed up in the Dhrystone benchmarks:

    https://en.wikipedia.org/wiki/Dhrystone_Results

    "Another common representation of the Dhrystone benchmark is the
    DMIPS (Dhrystone MIPS) obtained when the Dhrystone score is divided
    by 1757 (the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine)."

    I suppose they should have changed that to DVUPS.

    Stanford MIPS (16 registers) in 1984 ran at 4 MHz with a 5 stage pipeline.
    The paper I'm looking at compares it to a 8 MHz 68000 and has
    Stanford MIPS averaging 5 times faster on their Pascal benchmark.

    The MIPS R2000 with 32 registers launched in 1986 at 8.3, 12.5 and 15 MHz.
    It supposedly could sustain 1 reg-reg ALU operation per clock.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to ThatWouldBeTelling@thevillage.com on Wed Jan 17 19:14:00 2024
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.

    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS. A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.

    When they finally did and found it was 0.5 MIPS they just changed to
    calling that "1 VUP" or "VAX-780 Units of Processing".

    Yeah, they got grief for the MIPS stuff.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to John Levine on Wed Jan 17 14:32:45 2024
    John Levine wrote:
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.

    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS.

    So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.

    And they define 1 VUP = 1 TOUP

    When they finally did and found it was 0.5 MIPS they just changed to
    calling that "1 VUP" or "VAX-780 Units of Processing".

    Yeah, they got grief for the MIPS stuff.

    One just has to be careful comparing clock MIPS and VUPS.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Wed Jan 17 19:55:11 2024
    According to EricP <ThatWouldBeTelling@thevillage.com>:
    John Levine wrote:
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.

    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS.

    So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.

    If you want. IBM mainframe MIPS was a well understood performance
    measure at the time. In the mid 1970s, there were a few IBM clones
    like Amdahl, but the other mainframe makers were already sinking into obscurity. I can't think of anyone else making a 32 bit byte
    addressable mainframe at the time that wasn't an IBM clone. I suppose
    there were the Interdata machines but they were minis and sold mostly
    for embedded realtime.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.

    And they define 1 VUP = 1 TOUP

    Yes, but a TOUP really was an IBM MIPS.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to John Levine on Wed Jan 17 15:19:05 2024
    John Levine wrote:
    According to EricP <ThatWouldBeTelling@thevillage.com>:
    John Levine wrote:
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.
    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS.
    So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.

    If you want. IBM mainframe MIPS was a well understood performance
    measure at the time. In the mid 1970s, there were a few IBM clones
    like Amdahl, but the other mainframe makers were already sinking into obscurity. I can't think of anyone else making a 32 bit byte
    addressable mainframe at the time that wasn't an IBM clone. I suppose
    there were the Interdata machines but they were minis and sold mostly
    for embedded realtime.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.
    And they define 1 VUP = 1 TOUP

    Yes, but a TOUP really was an IBM MIPS.

    Ok, but VAX-780 really was measured by DEC at 0.5 MIPS.
    So either the assumption that a VUP = TOUP was wrong
    or the assumption that a TOUP = MIPS was.

    See section 5 and table 8.

Characterization of Processor Performance in the VAX-11/780, 1984
http://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Wed Jan 17 20:27:45 2024
    According to EricP <ThatWouldBeTelling@thevillage.com>:
No, they knew how fast it was. It was about as fast as an IBM 370/158 which IBM rated at 1 MIPS.
So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.
    If you want. IBM mainframe MIPS was a well understood performance
    measure at the time. In the mid 1970s, there were a few IBM clones
    like Amdahl, but the other mainframe makers were already sinking into
    obscurity. I can't think of anyone else making a 32 bit byte
    addressable mainframe at the time that wasn't an IBM clone. I suppose
    there were the Interdata machines but they were minis and sold mostly
    for embedded realtime.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.
    And they define 1 VUP = 1 TOUP

    Yes, but a TOUP really was an IBM MIPS.

    Ok, but VAX-780 really was measured by DEC at 0.5 MIPS.
    So either the assumption that a VUP = TOUP was wrong
    or the assumption that a TOUP = MIPS was.

    As I think I said above. a million IBM instructions did about as much
    work as half a million VAX instructions. In that era, MIPS meant
    either a million IBM instructions, or as some wag put it, Meaningless Indication of Processor Speed.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Thomas Koenig on Wed Jan 17 18:14:06 2024
    Thomas Koenig wrote:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above. a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I was thinking the same thing. VAX address modes like auto-increment
    would be equivalent to 2 instructions for each operand and likely used
    in benchmarks.

VAX having 32-bit immediates and offsets and 64-bit float immediates
    per operand vs 370 having to build constants or load them.

    And POLY for transcendentals is one instruction.

    All of those would add clocks to the VAX instruction execute time
    but not its instruction count and MIPS.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to John Levine on Wed Jan 17 22:37:00 2024
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above. a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to EricP on Wed Jan 17 23:56:24 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    Thomas Koenig wrote:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above. a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I was thinking the same thing. VAX address modes like auto-increment
    would be equivalent to 2 instructions for each operand and likely used
    in benchmarks.

    MOVC3 and MOVC5, perhaps?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Thomas Koenig on Wed Jan 17 23:55:45 2024
    Thomas Koenig wrote:

    John Levine <johnl@taugh.com> schrieb:

As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

Fancy addressing modes {indirection, pre-decrement, post-increment,
constants, displacements, index, ADD-CMP-branch, CRC, bit manipulation, ...}
    You could say these contribute to most of the gain

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Thu Jan 18 00:24:38 2024
    On Wed, 17 Jan 2024 23:56:24 GMT, Scott Lurndal wrote:

    MOVC3 and MOVC5, perhaps?

    Interruptible instructions ... wot fun ...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to tkoenig@netcologne.de on Thu Jan 18 02:05:30 2024
    It appears that Thomas Koenig <tkoenig@netcologne.de> said:
    John Levine <johnl@taugh.com> schrieb:

As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I don't think anyone used the fancy addressing modes or complex
    instructions much. But here's an example. Let's say A, B, and C are
    floats in addressable memory and you want to do A = B + C

    370 code

    LE R0,B
    AE R0,C
    STE R0,A

    VAX code

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    Or that old Fortran favorite I = I + 1

    370 code

    L R1,I
    LA R2,1
    AR R1,R2
    ST R1,I

    VAX code

    INCL I

    or if you have a lousy optimizer

    ADDL2 #1,I

    or if you have a really lousy optimizer

    ADDL3 #1,I,I

    It's still one instruction rather than four.

    In 370 code you often also needed extra instructions to make data
    addressable since it had no direct addressing and address offsets in instructions were only 12 bits.


    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Thu Jan 18 04:51:09 2024
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    If those were register operands, that instruction would be 4 bytes.

    I think worst case, each operand could have an index register and a 4-byte offset (in addition to the operand specifier byte), for a maximum
    instruction length of 19 bytes.

    So, saying “just one instruction” may not sound as good as you think.

    Here’s an old example, from the VMS kernel itself. This instruction

    PUSHR #^M<R0,R1,R2,R3,R4,R5>

    pushes the first 6 registers onto the stack, and occupies just 2 bytes.
    Whereas this sequence

    PUSHL R5
    PUSHL R4
    PUSHL R3
    PUSHL R2
    PUSHL R1
    PUSHL R0

    does the equivalent thing, but takes up 2 × 6 = 12 bytes.

    Guess which is faster?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Thu Jan 18 10:08:47 2024
    Lawrence D'Oliveiro wrote:
    On Wed, 17 Jan 2024 23:56:24 GMT, Scott Lurndal wrote:

    MOVC3 and MOVC5, perhaps?

    Interruptible instructions ... wot fun ...

REP MOVS is the classic x86 example: since all register usage is fixed ((r)si, (r)di, (r)cx), the CPU can always accept an interrupt at any point;
it just needs to update those three registers and take the interrupt.

    When the instruction resumes, any remaining moves are performed.

    This was actually an early 8086/8088 bug: If you had multiple prefix
    bytes, like you would need if you were moving data to the Stack segment
instead of the Extra, and the encoding was REP SEGSS MOVS, then only the
    last prefix byte was remembered in the saved IP/PC value.

    I used to check for this bug by moving a block which was large enough
    that it took over 55ms, so that a timer interrupt was guaranteed:

    If the CX value wasn't zero after the instruction, then the bug had
    happened.
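
A rough C model of why such an instruction is restartable (a sketch with invented names, not the actual microarchitecture): all of the instruction's state lives in the three architectural registers, so the CPU can stop after any element, take the interrupt, and simply re-execute the instruction to finish. A correct implementation therefore always leaves CX holding the remaining count, which is exactly what the test above relies on.

#include <stddef.h>

struct movs_state {
    const char *si;   /* source pointer        */
    char       *di;   /* destination pointer   */
    size_t      cx;   /* elements left to move */
};

/* One "execution" of REP MOVSB; returns 1 if it stopped early for an
   interrupt. Calling it again resumes from the saved si/di/cx. */
static int rep_movsb(struct movs_state *s, int (*irq_pending)(void))
{
    while (s->cx) {
        *s->di++ = *s->si++;   /* move one element...          */
        s->cx--;               /* ...then commit the new state */
        if (irq_pending())
            return 1;          /* interrupt taken; not done    */
    }
    return 0;                  /* finished: cx is zero, unless the
                                  old 8086 prefix bug bit you   */
}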

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Lawrence D'Oliveiro on Thu Jan 18 09:39:55 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    If those were register operands, that instruction would be 4 bytes.

    I think worst case, each operand could have an index register and a 4-byte offset (in addition to the operand specifier byte), for a maximum
    instruction length of 19 bytes.

The longest instruction I think might be an ADDH3 with two H-format
16-byte float immediates and an indexed destination with a 4-byte offset.

    That should be something like 2 opcode, 1 opspec, 16 imm,
    1 opspec, 16 imm, 1 opspec, 4 imm, 1 index = 42 bytes.

(Yes, it's a silly instruction, but legal.)
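
Adding up those byte counts (the counts are from the post above; the layout comments are my reading of them):

/* 2 opcode + (1 opspec + 16 imm) + (1 opspec + 16 imm)
   + 1 index byte + (1 opspec + 4 disp) */
int vax_worst_case_bytes(void)
{
    return 2 + (1 + 16) + (1 + 16) + 1 + (1 + 4);   /* = 42 */
}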


    So, saying “just one instruction” may not sound as good as you think.

    Here’s an old example, from the VMS kernel itself. This instruction

    PUSHR #^M<R0,R1,R2,R3,R4,R5>

    pushes the first 6 registers onto the stack, and occupies just 2 bytes. Whereas this sequence

    PUSHL R5
    PUSHL R4
    PUSHL R3
    PUSHL R2
    PUSHL R1
    PUSHL R0

    does the equivalent thing, but takes up 2 × 6 = 12 bytes.

    Guess which is faster?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to John Levine on Thu Jan 18 09:19:44 2024
    John Levine wrote:
    It appears that Thomas Koenig <tkoenig@netcologne.de> said:
    John Levine <johnl@taugh.com> schrieb:

As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.
    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I don't think anyone used the fancy addressing modes or complex
    instructions much. But here's an example. Let's say A, B, and C are
    floats in addressable memory and you want to do A = B + C

    370 code

    LE R0,B
    AE R0,C
    STE R0,A

    VAX code

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    Or that old Fortran favorite I = I + 1

    370 code

    L R1,I
    LA R2,1
    AR R1,R2
    ST R1,I

    VAX code

    INCL I

    or if you have a lousy optimizer

    ADDL2 #1,I

    or if you have a really lousy optimizer

    ADDL3 #1,I,I

    It's still one instruction rather than four.

    In 370 code you often also needed extra instructions to make data
    addressable since it had no direct addressing and address offsets in instructions were only 12 bits.

VAX Fortran77 could optimize a DO loop array index to an autoincrement;
I think they called it strength reduction of loop induction variables.

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD (rB)+, (rA)+
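
In C terms the transformation looks roughly like this (a sketch of the idea, not actual compiler output):

/* Before: the induction variable i indexes both arrays. */
void vadd(double *a, const double *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i] + b[i];
}

/* After strength reduction: i is gone, replaced by two pointers
   that step through the arrays, the moral equivalent of
   ADDD (rB)+,(rA)+ inside a count-down loop. */
void vadd_reduced(double *a, const double *b, int n)
{
    while (n-- > 0)
        *a++ += *b++;
}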

VAX usage stats for the Basic, Bliss, Cobol, Fortran, Pascal, and PL1
compilers show a usage frequency per operand specifier of ~4% for autoincrement and ~7% for index, except that Basic has 17% for autoincrement.
There is almost no usage of deferred addressing (the address of the address of the data).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to EricP on Thu Jan 18 16:24:13 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    The MIPS R2000 with 32 registers launched in 1986 at 8.3, 12.5 and 15 MHz.
    It supposedly could sustain 1 reg-reg ALU operation per clock.

    It could do at most one instruction per clock, and it certainly needed
    to branch at some point, so no sustained 1/clock ALU instructions.
Also, a useful program would want to load or store at some point, so
even fewer ALU instructions. And with cache misses, also fewer than 1
IPC.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Thu Jan 18 16:31:29 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    If those were register operands, that instruction would be 4 bytes.

I think worst case, each operand could have an index register and a 4-byte offset (in addition to the operand specifier byte), for a maximum
    instruction length of 19 bytes.

    So, saying “just one instruction” may not sound as good as you think.

    I wasn't saying they were always better, just pointing out that there
    were straightforward reasons that 500K VAX instructions could do the
    same work as 1M 370 instructions.

    Considering that the 370 is still alive and the VAX died decades ago,
    it should be evident that instruction count isn't a very useful
    metric across architectures.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Thomas Koenig on Thu Jan 18 16:35:45 2024
    Thomas Koenig <tkoenig@netcologne.de> writes:
    John Levine <johnl@taugh.com> schrieb:

As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    It's interesting that these are the features you are thinking of,
    especially because the IBM 801 research and the RISC research showed
    that fancy addressing modes are rarely used. Table 4 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    %
    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    for a total of 6.6% of the operand specifiers; there are about 1.5
    operand specifiers per instruction (Table 3), so that's ~0.1 operand
    specifier with a fancy addressing mode per instruction.

    Back to why S/360 has more instructions than VAX, John Levine gave a
    good answer.

    One aspect (partially addressed by John Levine, but not discussed
    explicitly) is that the VAX is a three-address machine, while S/360 is
    a two-address machine, so the S/360 occasionally needs reg-reg moves
    where VAX does not. Plus, S/360 usually requires one of its two
    operands to be a register, so in some cases an additional load is
    necessary on the S/360 that is not needed on the VAX.

Among the complex VAX instructions, CALL/RET and multi-register push
and pop constitute 3.22% of the instructions according to Table 1 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    I expect that these correspond to multiple instructions on the S/360.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to Anton Ertl on Thu Jan 18 13:07:52 2024
    Anton Ertl wrote:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    John Levine <johnl@taugh.com> schrieb:

As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.
    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    It's interesting that these are the features you are thinking of,
    especially because the IBM 801 research and the RISC research showed
    that fancy addressing modes are rarely used. Table 4 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    %
    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    for a total of 6.6% of the operand specifiers; there are about 1.5
    operand specifiers per instruction (Table 3), so that's ~0.1 operand specifier with a fancy addressing mode per instruction.

    Back to why S/360 has more instructions than VAX, John Levine gave a
    good answer.

    One aspect (partially addressed by John Levine, but not discussed
    explicitly) is that the VAX is a three-address machine, while S/360 is
    a two-address machine, so the S/360 occasionally needs reg-reg moves
    where VAX does not. Plus, S/360 usually requires one of its two
    operands to be a register, so in some cases an additional load is
    necessary on the S/360 that is not needed on the VAX.

Among the complex VAX instructions, CALL/RET and multi-register push
and pop constitute 3.22% of the instructions according to Table 1 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    I expect that these correspond to multiple instructions on the S/360.

    - anton

    There is also a different paper with slightly different stats that,
amongst other things, shows address mode usage by compiled language.

    A Case Study of VAX-11 Instruction Set Usage For Compiler Execution
    Wiecek, 1982
    https://dl.acm.org/doi/pdf/10.1145/960120.801841

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Thu Jan 18 19:08:47 2024
According to Anton Ertl <anton@mips.complang.tuwien.ac.at>: <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    %
    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    That's not entirely fair. The VAX has an immediate address mode that
    could encode constant values from 0 to 63. Both papers said it was
about 15%, so it was definitely a success. The 370 had sort of a split personality, a shotgun marriage of a register scientific machine
and a memory-to-memory commercial machine. There were a bunch of
instructions with immediate operands, but they all were a one-byte
immediate and a memory location. Hence the extra LA instructions to
    get immediates into registers.
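
For the integer case, the short-literal encoding is simple enough to sketch in C (the 0x00..0x3F specifier range is from the VAX operand-specifier format; the helper name is made up):

#include <assert.h>
#include <stdint.h>

/* Operand specifier bytes 0x00..0x3F are short literals: the top two
   bits are 00 and the low six bits are the value, so constants 0..63
   cost no bytes beyond the specifier itself. */
static uint8_t vax_short_literal(unsigned value)
{
    assert(value <= 63);     /* larger constants need immediate mode */
    return (uint8_t)value;   /* bits 7:6 = 00, bits 5:0 = value      */
}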

    Both papers said the index mode, which added a scaled register to an
address computed any other way, was about 6%, which was higher than I
    would have expected. The 370 has a similar base+displacement+index
    which I hear is almost never used.

Among the complex VAX instructions, CALL/RET and multi-register push
and pop constitute 3.22% of the instructions according to Table 1 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>.
    I expect that these correspond to multiple instructions on the S/360.

The VAX had an all-singing, all-dancing CALLS/RET that saved registers
and set up a stack frame, and a simple JSB/RSB that just pushed the
return address and jumped. CALLS was extremely slow and did far
    more than was usually needed so for the most part it was only used
    for inter-module calls that had to use the official calling sequence,
    and JSB for everything else.

    The VAX instruction set was overoptimized for code size and a
simplistic idea of easy programming, which meant, among other things,
    that a fancy instruction was often slower than the equivalent sequence
    of simple instructions, and a lot of the fancy instructions weren't
    used very much.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Thu Jan 18 20:56:08 2024
    On Thu, 18 Jan 2024 16:24:13 GMT, Anton Ertl wrote:

    [MIPS] could do at most one instruction per clock, and it certainly
    needed to branch at some point, so no sustained 1/clock ALU
    instructions.

    But it also had delayed branches, so perhaps it could sustain that rate
    across a taken branch?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Thu Jan 18 20:55:15 2024
    On Thu, 18 Jan 2024 19:08:47 -0000 (UTC), John Levine wrote:

    The 370 had sort of a split personality, a shotgun marriage of a
    register scientific machine and a memory-to-memory commercial machine.

    That pins things down quite narrowly as to when it came into being,
    doesn’t it? Up to about that point, “scientific” and “business” computing
    were considered to be separate worlds, needing their own hardware and
    software, and never the twain shall meet.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to EricP on Thu Jan 18 21:01:19 2024
    On Thu, 18 Jan 2024 09:19:44 -0500, EricP wrote:

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD (rB)+, (rA)+

    ... set up rA, rB, rI ...
    BRB $9000
    $1000:
    ADDD (rB)+, (rA)+
    $9000:
    SOBGEQ rI, $1000

Why use SOBGEQ with the branch instead of SOBGTR? That way, if N =
0, the loop body never executes at all.
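
The same loop-inversion pattern in C, mirroring the VAX code above (hypothetical function, for readers who don't think in MACRO):

void vadd(double *a, const double *b, long n)
{
    goto test;                    /* BRB  $9000        */
body:
    *a++ += *b++;                 /* ADDD (rB)+, (rA)+ */
test:
    if (--n >= 0) goto body;      /* SOBGEQ rI, $1000  */
}                                 /* n == 0 never enters the body */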

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Thu Jan 18 22:16:11 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 19:08:47 -0000 (UTC), John Levine wrote:

    The 370 had sort of a split personality, a shotgun marriage of a
    register scientific machine and a memory-to-memory commercial machine.

    That pins things down quite narrowly as to when it came into being,
    doesn’t it? Up to about that point, “scientific” and “business” computing
    were considered to be separate worlds, needing their own hardware and >software, and never the twain shall meet.

    Yes, the whole point of S/360 was to produce a unified architecture that IBM could sell to all of their customers.

    It may have been a shotgun marriage, but it's been a very long lasting one.

    You can still run most S/360 application code unmodified on the latest zSeries.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Thu Jan 18 22:19:04 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 09:19:44 -0500, EricP wrote:

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD (rB)+, (rA)+

    ... set up rA, rB, rI ...
    BRB $9000
    $1000:
    ADDD (rB)+, (rA)+
    $9000:
    SOBGEQ rI, $1000

Why use SOBGEQ with the branch instead of SOBGTR? That way, if N =
0, the loop body never executes at all.

    Ah, that must have been a Fortran 77 or later DO loop. In Fortran 66 the
    loop usually ran once regardless.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Fri Jan 19 07:28:57 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 18 Jan 2024 09:19:44 -0500, EricP wrote:

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD (rB)+, (rA)+

    ... set up rA, rB, rI ...
    BRB $9000
    $1000:
    ADDD (rB)+, (rA)+
    $9000:
    SOBGEQ rI, $1000

Why use SOBGEQ with the branch instead of SOBGTR? That way, if N =
0, the loop body never executes at all.

    This is the kind of tiny loop body where I would have considered
    replacing the initial BRB $9000 with a dummy instruction (like a compare
    reg with immediate) where the immediate value contained the ADDD loop body.

    This assumes of course that such a dummy opcode would (on average) be
    faster than a taken forward branch!

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Levine on Fri Jan 19 08:39:16 2024
    John Levine <johnl@taugh.com> writes:
According to Anton Ertl <anton@mips.complang.tuwien.ac.at>: <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    %
    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    That's not entirely fair. The VAX has an immediate address mode that
    could encode constant values from 0 to 63. Both papers said it was
about 15%, so it was definitely a success. The 370 had sort of a split personality, a shotgun marriage of a register scientific machine
and a memory-to-memory commercial machine. There were a bunch of instructions with immediate operands, but they all were a one-byte
    immediate and a memory location. Hence the extra LA instructions to
    get immediates into registers.

    So this advantage of the VAX over S/360 was not a "fancy" addressing
    mode, but the immediate addressing mode that S/360 does not have, but
    that all RISCs have, even MIPS, Alpha and RISC-V (except that these architectures define addi/addiu as separate instructions). VAX has
    "short literal", as you explain (15.8% of the operands) as well as
    "immediate" (2.4% of the operands). With 1.5 operands per
instruction, that alone is a factor of 1.27 more instructions for S/360
    than for VAX.
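
A back-of-the-envelope check of that factor, using the paper's numbers (the one-extra-S/360-instruction-per-literal assumption is mine):

#include <stdio.h>

int main(void)
{
    double short_lit    = 0.158;  /* fraction of VAX operand specifiers */
    double immediate    = 0.024;  /* ditto                              */
    double ops_per_insn = 1.5;    /* operand specifiers per instruction */

    /* Assume each literal/immediate operand costs S/360 roughly one
       extra instruction (an LA, or a load from a literal pool). */
    double extra = (short_lit + immediate) * ops_per_insn;
    printf("S/360:VAX instruction ratio ~ %.2f\n", 1.0 + extra);
    return 0;                     /* prints ~1.27 */
}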

Among the complex VAX instructions, CALL/RET and multi-register push
and pop constitute 3.22% of the instructions according to Table 1 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>. I expect that these correspond to multiple instructions on the S/360.

The VAX had an all-singing, all-dancing CALLS/RET that saved registers
and set up a stack frame, and a simple JSB/RSB that just pushed the
    return address and jumped. CALLS was extremely slow and did far
    more than was usually needed so for the most part it was only used
    for inter-module calls that had to use the official calling sequence,
    and JSB for everything else.

    That probably depends on the compiler. Table 2 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> lists 4.5% "subroutine call and return", and 2.4% "procedure call and
    return"; I assume the latter is the all-singing all-dancing CALL and
    RET instruction; the missing 0.82% to the 3.22% mentioned in Table 1
    is probably the multi-register push and pop instructions.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to John Levine on Fri Jan 19 16:23:09 2024
    On Thu, 18 Jan 2024 16:31:29 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than
    3.

    If those were register operands, that instruction would be 4 bytes.

    I think worst case, each operand could have an index register and a
    4-byte offset (in addition to the operand specifier byte), for a
    maximum instruction length of 19 bytes.

So, saying “just one instruction” may not sound as good as you think.

    I wasn't saying they were always better, just pointing out that there
    were straightforward reasons that 500K VAX instructions could do the
    same work as 1M 370 instructions.

    Considering that the 370 is still alive and the VAX died decades ago,
    it should be evident that instruction count isn't a very useful
    metric across architectures.


    That's not totally fair.
S/360 keeps reinventing itself. VAX could have done the same, but voluntarily refused.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to John Levine on Fri Jan 19 16:40:19 2024
    On Thu, 18 Jan 2024 22:16:11 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 19:08:47 -0000 (UTC), John Levine wrote:

    The 370 had sort of a split personality, a shotgun marriage of a
    register scientific machine and a memory-to-memory commercial
    machine.

That pins things down quite narrowly as to when it came into being, doesn’t it? Up to about that point, “scientific” and “business” computing were considered to be separate worlds, needing their own hardware and software, and never the twain shall
meet.

    Yes, the whole point of S/360 was to produce a unified architecture
    that IBM could sell to all of their customers.

    It may have been a shotgun marriage, but it's been a very long
    lasting one.


    Was it?
Being a younger observer from the outside, my impression is that in the
1st World people stopped using S/360 descendants for "heavy" scientific calculations around 1980. In other parts of the World it lasted a few
years longer, but still no longer than 1990. Use of IBM mainframes for
CAD continued well into the 90s and maybe even into this century, but CAD
    is not what people called "scientific computing" back when S/360 was
    conceived.


    You can still run most S/360 application code unmodified on the
    latest zSeries.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Fri Jan 19 16:59:33 2024
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    So this advantage of the VAX over S/360 was not a "fancy" addressing
    mode, but the immediate addressing mode that S/360 does not have, but
that all RISCs have, even MIPS, Alpha and RISC-V (except that these architectures define addi/addiu as separate instructions). VAX has
"short literal", as you explain (15.8% of the operands) as well as "immediate" (2.4% of the operands). With 1.5 operands per
instruction, that alone is a factor of 1.27 more instructions for S/360
than for VAX.

Looks that way. IBM apparently noticed it too since S/390 added 16-bit immediate load, compare, add, subtract, and multiply, and zSeries
    added immediate everything, such as add immediate to memory.

That probably depends on the compiler. Table 2 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> lists 4.5% "subroutine call and return", and 2.4% "procedure call and return"; I assume the latter is the all-singing all-dancing CALL and
    RET instruction; the missing 0.82% to the 3.22% mentioned in Table 1
    is probably the multi-register push and pop instructions.

    Sounds right. I'm surprised the procedure call numbers were so high.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Michael S on Fri Jan 19 16:22:22 2024
    Michael S <already5chosen@yahoo.com> writes:
Being a younger observer from the outside, my impression is that in the
1st World people stopped using S/360 descendants for "heavy" scientific calculations around 1980. In other parts of the World it lasted a few
    years longer, but still no longer than 1990.

    Meanwhile, in my part of the third world (Austria) politicians praised themselves for buying a supercomputer from IBM. Searching for it, I
    find <https://services.phaidra.univie.ac.at/api/object/o:573/get>, and
    on page 2 it tells me that the inauguration of the supercomputer IBM
    3090-400E VF (with two vector processors) happened on March 7, 1989.
    That project was originally limited to two years, but a contract
signed on 1992-03-19 extended the run-time and upgraded the hardware to
a 6-processor ES/9000 720VF; that extension also included 20
RS/6000-550s, and they found out that the cumulative computing power
far exceeded that of the vector computer.
    was uninstalled in January 1995.

    After the RS/6000 cluster they used an Alpha cluster from 1995 to
    2001, and this was replaced in 2001 with a PC-based Linux cluster
    (inaugurated on January 28, 2002) consisting of 160 nodes with an
    Athlon XP 1700+ and 1GB RAM each.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Michael S on Fri Jan 19 16:43:30 2024
    Michael S <already5chosen@yahoo.com> writes:
S/360 keeps reinventing itself. VAX could have done the same, but voluntarily refused.

    Voluntarily? The VAX 9000 project cost DEC billions. Ok, one can
imagine an alternative history where DEC had decided to avoid
    switching to MIPS and Alpha, and where they would have followed up the
    NVAX (which seems to be pipelined, but not superscalar, i.e., like the
    486) with eventually an OoO implementation, and from then on might
    have had an easier time competing with RISCs.

    The question is how many customers would have defected to RISC-based
    systems in the meantime, and if DEC could have survived competition
    from ever more capable PCs that eliminated the RISC workstation
    market and the RISC server market.

IBM z and i survive because of a legacy of system-specific software
(written in assembly or using other system-specific features), for which
    the additional hardware cost is an acceptable price for being able to
    continue to use this software.

    Many VAX customers were flexible enough to switch to something else
when VAX was no longer competitive (that's why DEC did the MIPS-based DECstations), so I doubt that DEC would have survived in the
    alternative history I outlined, at least as a significant manufacturer
    rather than a niche manufacturer like Unisys.

    One interesting aspect is that NVAX was only released in 1991, while
    the 486 was released in 1989, and the MIPS R2000 in 1986, so the VAX instruction set did have a cost.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Fri Jan 19 18:28:18 2024
    According to Michael S <already5chosen@yahoo.com>:
    It may have been a shotgun marriage, but it's been a very long
    lasting one.

    Was it?
Being a younger observer from the outside, my impression is that in the
1st World people stopped using S/360 descendants for "heavy" scientific calculations around 1980. ...

    It was earlier than that. The 370/195 was IBM's last attempt to build
    a supercomputer, introduced in 1970 and never sold very well. They
added vector options on later machines, which someone must use, since
    they're still on zSeries, but they've never been competitive for
    pure computing.

    The point of a mainframe is that it has a balance between CPU and I/O.
    A PDP-8 had a much faster CPU than a 360/30, but the 360 had an I/O
    channel that connected card readers and printers and tapes and disks
    so it could do data processing work that nobody did on a PDP-8. A
PDP-8 could also connect to those but each needed an expensive I/O
    interface to attach to the 8's simple I/O bus, so hardly anyone did.

    Mainframes are also designed to be very reliable and maintainable. A
modern mainframe has dozens of CPUs, some of which are only doing
    maintenance oversight and others of which are hot spares that can
    substitute for a failed processor in the middle of an instruction
    stream. They're also designed so the vendor can do maintenance and
replace subsystems while the system is running. People expect them to
    remain up and running constantly for years at a time.

    Apropos another comment that the 360 has evolved but the Vax didn't,
    that is certainly true, since zSeries is about 70% new stuff and
    30% 360 stuff, but the 360 was a much better place to build from.
    It is much easier to build a fast 360 than a fast Vax because
    the instruction set, even with all the zSeries additions, is
    more regular and amenable to pipelining.

    The worst mistake they made from a performance point of view is
    that the architecture says an instruction can modify the next
    instruction and it is supposed to work. (Back in the 1960s
    on machines with 8K of RAM that was not totally silly.) But
    even that hardly matters since the vast majority of code
    runs out of read-only pages where you can't do that.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to Michael S on Fri Jan 19 19:58:00 2024
    In article <20240119164019.0000374e@yahoo.com>, already5chosen@yahoo.com (Michael S) wrote:

Being a younger observer from the outside, my impression is that in
the 1st world people stopped using S/360 descendants for "heavy"
    scientific calculations around 1980.

    Yup. VAXes and other superminis got you a lot more CPU per dollar.

Use of IBM mainframes for CAD continued well into the 90s and maybe
    even into this century, but CAD is not what people called
    "scientific computing" back when S/360 was conceived.

    Some aspects of it are, but many are not. CAD has very uneven processor
    usage: vast demands for brief periods when regenerating views or models,
    then very little while the designer thinks and adds to the model. Running
    this on a time-shared machine is frustrating, because when a few
    designers need a lot of CPU at the same time, it gets very slow.
    Individual machines keep the designers happier.

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to Anton Ertl on Fri Jan 19 19:58:00 2024
    In article <2024Jan19.174330@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Voluntarily? The VAX 9000 project cost DEC billions. Ok, one can
imagine an alternative history where DEC had decided to avoid
    switching to MIPS and Alpha, and where they would have followed up
    the NVAX (which seems to be pipelined, but not superscalar, i.e., like
    the 486) with eventually an OoO implementation, and from then on might
    have had an easier time competing with RISCs.

    The timeline doesn't work. DEC decided to adopt MIPS in 1989, because
they were losing market share worryingly quickly. NVAX was released in
    1991, and they'd have had real trouble developing it without the cash
    from MIPS-based systems.

    <https://en.wikipedia.org/wiki/DEC_Alpha#PRISM>

    They opted for Alpha because they felt VAX had enough overheads that it
    would always be at a disadvantage compared to RISC chips. That is less
    obvious now, but that's because of the huge amounts of money that have
    gone into x86 development over the last thirty years. DEC's market for
    VAX systems was much smaller than the market for x86 in 1995-2010.

    <https://en.wikipedia.org/wiki/DEC_Alpha#RISCy_VAX>

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to John Dallman on Fri Jan 19 20:22:56 2024
    jgd@cix.co.uk (John Dallman) writes:
In article <20240119164019.0000374e@yahoo.com>, already5chosen@yahoo.com (Michael S) wrote:

Being a younger observer from the outside, my impression is that in
the 1st world people stopped using S/360 descendants for "heavy"
    scientific calculations around 1980.

    Yup. VAXes and other superminis got you a lot more CPU per dollar.

Use of IBM mainframes for CAD continued well into the 90s and maybe
    even into this century, but CAD is not what people called
    "scientific computing" back when S/360 was conceived.

Some aspects of it are, but many are not. CAD has very uneven processor usage: vast demands for brief periods when regenerating views or models,
then very little while the designer thinks and adds to the model. Running this on a time-shared machine is frustrating, because when a few
    designers need a lot of CPU at the same time, it gets very slow.
    Individual machines keep the designers happier.

    Modern chip development (RTL/Verilog) environments offload the
compute- and I/O-bound jobs to a compute grid with thousands of nodes;
even the visualization jobs use X11 tunnelling to get back to
    the workstation display when examining waves, for example.

    When you're dealing with billions of gates on a single chip.....

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to John Levine on Fri Jan 19 15:29:50 2024
    John Levine wrote:
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    So this advantage of the VAX over S/360 was not a "fancy" addressing
    mode, but the immediate addressing mode that S/360 does not have, but
    that all RISCs have, even MIPS, Alpha and RISC-V (except that these
    architectures define addi/addiu as separate instructions). VAX has
    "short literal", as you explain (15.8% of the operands) as well as
    "immediate" (2.4% of the operands). With 1.5 operands per
    instruction, that alone is a factor 1.27 more instructions for S/360
    than for VAX.

Looks that way. IBM apparently noticed it too since S/390 added 16-bit immediate load, compare, add, subtract, and multiply, and zSeries
    added immediate everything, such as add immediate to memory.

    That probably depends on the compiler. Table 2 of
    <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> >> lists 4.5% "subroutine call and return", and 2.4% "procedure call and
    return"; I assume the latter is the all-singing all-dancing CALL and
    RET instruction; the missing 0.82% to the 3.22% mentioned in Table 1
    is probably the multi-register push and pop instructions.

    Sounds right. I'm surprised the procedure call numbers were so high.

I found a set of LINPACK performance results for many different CPUs
from 1983 by Argonne National Laboratory, including a 370/158 (they don't
say which model) and a 780. The results show both the execution time and
MFLOPS, so that removes the variability due to the definition of "instruction".

    Dongarra has many versions of this paper over the years.
    This is just the one from 1983.

    Performance of Various Computers Using Standard Linear Equations Software
    in a Fortran Environment, Dongarra, 1983 https://dl.acm.org/doi/pdf/10.1145/859551.859555

    For double precision the 158 running compiled code is about 50%
faster than a 780 running "coded BLAS" (hand-coded assembler)
    and about 2 times faster than a 780 for compiled code.

    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Fri Jan 19 20:51:31 2024
    On Thu, 18 Jan 2024 16:31:29 -0000 (UTC), John Levine wrote:

    Considering that the 370 is still alive and the VAX died decades ago,
    it should be evident that instruction count isn't a very useful metric
    across architectures.

    The 360/370/xx/3090/yy/zSeries line only survives because of business “legacy” deployments. It was never a performance-oriented architecture (witness the trouncing by CDC). It is long obsolete, and those deployments
    are dwindling, if not circling the plughole.

    VAX was the next step forward in the “supermini” and later “workstation”
    categories, and these were definitely about price-performance. So when
    other better technologies came along, they rendered it obsolete, fairly quickly.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Fri Jan 19 20:58:16 2024
    On Fri, 19 Jan 2024 18:28:18 -0000 (UTC), John Levine wrote:

    The point of a mainframe is that it has a balance between CPU and I/O.

    The point of a mainframe was that the CPU was expensive. So a lot of
    effort went into complex I/O controllers that could perform chains of
    multiple transfers before having to come back to the CPU to ask for more
    work.
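
What's being described is a channel program: a chain of channel command words that the I/O controller executes on its own, interrupting the CPU only when the whole chain completes. A toy C sketch of the idea (types and names invented for illustration):

#include <stdint.h>
#include <stdio.h>

struct ccw {             /* one "channel command word"          */
    uint8_t  cmd;        /* read, write, seek, ...              */
    void    *buf;        /* data area for the transfer          */
    uint16_t count;      /* bytes to transfer                   */
    int      chain;      /* nonzero: fetch next CCW, no CPU help */
};

/* The channel walks the chain itself; the CPU gets one interrupt
   at the end instead of one per transfer. */
static void run_channel_program(const struct ccw *p)
{
    for (;; p++) {
        printf("cmd %u: %u bytes\n", (unsigned)p->cmd, (unsigned)p->count);
        if (!p->chain)
            break;       /* end of chain: now interrupt the CPU */
    }
}

int main(void)
{
    char a[80], b[80];
    struct ccw prog[] = {
        { 0x02, a, sizeof a, 1 },   /* read one record          */
        { 0x02, b, sizeof b, 0 },   /* read another, then stop  */
    };
    run_channel_program(prog);
    return 0;
}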

    Such an architecture tends to prioritize high throughput over low latency. Which made it unsuitable for this newfangled “interactive timesharing”
    that began to be popular with the new hardware and software coming from companies like DEC, DG etc.

    Mainframes are also designed to be very reliable and maintainable.

    They did it in a very expensive way, though. Think how Google manages reliability and maintainability today: by having a cluster of half a
    million servers (maybe more by now), each built from the cheapest parts in
    all ways but one--the power supply.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to EricP on Fri Jan 19 16:17:32 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run cdc6600 rain benchmark on
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for
    national lab that was looking at getting 70 for a compute farm (sort of
    the leading edge of the coming cluster supercomputing tsunami). I also
    ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode and the integrated channel microcode; a 3031 was two 158 engines, one with just
    the 370 microcode and a 2nd with just the integrated channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Sat Jan 20 02:38:36 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Fri, 19 Jan 2024 18:28:18 -0000 (UTC), John Levine wrote:

    The point of a mainframe is that it has a balance between CPU and I/O.

    The point of a mainframe was that the CPU was expensive. So a lot of
effort went into complex I/O controllers that could perform chains of multiple transfers before having to come back to the CPU to ask for more work.

On high-end machines, not so much on small ones. On the 360/30, the same
    microcode engine ran the CPU and the channel. When the channel was
    working hard, the CPU pretty much stopped.

    Such an architecture tends to prioritize high throughput over low latency.

    Yup.

Which made it unsuitable for this newfangled “interactive timesharing” that began to be popular with the new hardware and software coming from companies like DEC, DG etc.

Depended on what model of interaction you wanted. If you wanted the computer to respond to each character, DEC machines were good at that since they were designed to do realtime stuff. If you wanted to do line-at-a-time or screen-at-a-time
interaction, mainframes did that just fine. In 1964 SABRE ran on
    two IBM 7090s and provided snappy responses to 1500 terminals across the U.S.

I used CP/67 in the early 1970s and it also worked quite well, with fast response
in line-at-a-time mode.
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Dallman on Sat Jan 20 09:10:00 2024
    jgd@cix.co.uk (John Dallman) writes:
In article <2024Jan19.174330@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Voluntarily? The VAX 9000 project cost DEC billions. Ok, one can
imagine an alternative history where DEC had decided to avoid
    switching to MIPS and Alpha, and where they would have followed up
    the NVAX (which seems to be pipelined, but not superscalar, i.e., like
    the 486) with eventually an OoO implementation, and from then on might
    have had an easier time competing with RISCs.

    The timeline doesn't work. DEC decided to adopt MIPS in 1989, because
they were losing market share worryingly quickly. NVAX was released in
    1991, and they'd have had real trouble developing it without the cash
    from MIPS-based systems.

    I forgot that in this alternative reality DEC would have killed the
VAX 9000 project early, leaving them lots of cash for developing
    NVAX. Still, it could easily have been that they would have lost
    customers to the RISC competition until they finally managed to do the
    OoO-VAX.

    Would they have gotten those customers back, or would they have lost
    to IA-32/AMD64 anyway? Probably the latter, unless they found a
    business model that allowed them to milk the customer base that was
    tied to VAX while at the same time being cheap enough to compete with
    Intel. They tried to go for that on the Alpha: they used firmware for
    market segmentation between VMS/Digital OSF/1 on the one hand and
    Linux/Windows on the other; and they also offered some relatively
    cheap boards, e.g. with the 21164PC, but those were probably too
    limited to be successful.

    That is less
    obvious now, but that's because of the huge amounts of money that have
    gone into x86 development over the last thirty years. DEC's market for
    VAX systems was much smaller than the market for x86 in 1995-2010.

    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed in
    the market, the relevant timespan was 1995-2005. Intel dropped the
    64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

VAX would have been extended to 64 bits some time in the early 1990s
    in the alternative timeline, and DEC would have been tempted to use
    the 64-bit extension for market segmentation, which again could have
    resulted into DEC painting itself into a niche.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to Anton Ertl on Sat Jan 20 16:25:00 2024
    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    jgd@cix.co.uk (John Dallman) writes:
    The timeline doesn't work. DEC decided to adopt MIPS in 1989,
because they were losing market share worryingly quickly.
    NVAX was released in 1991, and they'd have had real trouble
    developing it without the cash from MIPS-based systems.

    I forgot that in this alternative reality DEC would have killed the
VAX 9000 project early, leaving them lots of cash for developing
    NVAX. Still, it could easily have been that they would have lost
    customers to the RISC competition until they finally managed to do
    the OoO-VAX.

    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed
    in the market, the relevant timespan was 1995-2005. Intel dropped
    the 64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

    This requires DEC to take notice of those papers and start developing OoO
    quite quickly. They did not do that historically, and they seem to have
    been confident that their way of working would carry on being effective,
    until RISC demonstrated otherwise. This is the timeframe where IBM gave
    up on building mainframes with competitive compute power, and settled for
    them being capable data-movers.

If DEC go OoO and build an OoO MicroVAX CPU by about 1988, they can get somewhere. The MicroVAX 78032 of 1985 was 125K transistors; the 80386 was
275K transistors the same year, and the 80486 was 1.2M transistors in 1989,
    so the transistor budget could be there.

    Would they have gotten those customers back, or would they have lost
    to IA-32/AMD64 anyway? Probably the latter, unless they found a
    business model that allowed them to milk the customer base that was
    tied to VAX while at the same time being cheap enough to compete
    with Intel.

    I had experience from two different market segments of dealing with DEC.
    In the early 1990s, I was working for a company based around MS-DOS
    software. That was running pretty fast on 486 and Pentium machines. We
had contact with DEC because one of our large customers had DEC as their primary IT supplier, and one of our managers had bought a DEC PC from a company that realised he was ignorant and unloaded obsolete hardware on
    him at high prices.

    If you weren't a major DEC customer, they were hell to deal with. They
    just didn't do things, even after agreeing to do so. They charged
ludicrous prices for minor things. We needed a replacement key for the anti-tamper lock on the DEC PC, because the chap had lost it. They were free,
    but the delivery charge was about $60, by cab. Getting them to just post
    it took a lengthy argument.

    Getting a replacement Pentium for one that had the FDIV bug required
    compiling a log of weeks of broken promises from the parts centre and
    faxing it to DEC's personnel department, asking for it to be placed on
    the relevant manager's file and considered at his next performance review.
    We couldn't just get one from Intel: the necessary heat sink was
    permanently bonded to the old chip, so we needed a new one with DEC's
    specific heatsink.

    At the customer who had DEC as an IT supplier, DEC staff didn't know
    anything about PCs or MS-DOS. They only knew VMS, which seemed weird and
    arcane to us, but the DEC staff were sure it was infinitely superior, and
    could not explain why. They really did not make DEC seem attractive as a supplier.

    Then I changed jobs in 1995 to a company that supplied software for VAX
    VMS, Alpha VMS, OSF/1 on Alpha and Windows on Alpha. Dealing with DEC
    from there was much better. They were capable, helpful and efficient. But
    they still didn't understand PCs, and Windows NT was effective at running complex software and was far cheaper and more attractive to PC users than
    VMS.

    The OoO VAX alternate history changes a lot of things. It means PRISM
    doesn't start, and the multiple-personality OS concept that became MICA
    may or may not happen. The lack of a PRISM+MICA cancellation means Dave
    Cutler probably doesn't move to Microsoft, and then Windows NT doesn't
    happen, at least not in the same way.

    The Mac still causes a shift to GUIs. If DEC can come up with, or buy in,
    a good one then they may do very well, and Microsoft may not become
    nearly so important. That would reduce the importance of Intel, which
    might mean IA-64 never happens.

    They tried to go for that on the Alpha: they used firmware for
    market segmentation between VMS/Digital OSF/1 on the one hand and Linux/Windows on the other; and they also offered some relatively
    cheap boards, e.g. with the 21164PC, but those were probably too
    limited to be successful.

    Producing software for Alpha Windows was reasonably straightforward, if
    you had well-behaved software written in a HLL that there were compilers
    for. This meant that people who were coming down from the Unix world
    didn't have much trouble. Going upwards from the MS-DOS/Windows world was harder: you couldn't hit the hardware, you had to rewrite any assembler
    code, and FX!32 wasn't quite as good as it was cracked up to be. Alpha
    Windows software was worth producing until about 1998, when its
    performance advantage evaporated.

    VAX would have been extended to 64 bits some times in the early
    1990s in the alternative timeline, and DEC would have been tempted
    to use the 64-bit extension for market segmentation, which again
    could have resulted into DEC painting itself into a niche.

    Yup. Really, you have to get the traditional DEC management to all retire before 1990, and the new management need to be brave /and/ lucky.

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Dallman on Sat Jan 20 18:15:43 2024
    jgd@cix.co.uk (John Dallman) writes:
In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed
    in the market, the relevant timespan was 1995-2005. Intel dropped
    the 64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

This requires DEC to take notice of those papers and start developing OoO quite quickly.
    ...
If DEC go OoO and build an OoO MicroVAX CPU by about 1988, they can get somewhere. The MicroVAX 78032 of 1985 was 125K transistors; the 80386 was 275K transistors the same year, and the 80486 was 1.2M transistors in 1989,
    so the transistor budget could be there.

Not in a single chip. The CPU die of the Pentium Pro had 5.5M
transistors and was available in 1995. Nobody else was much earlier
on OoO, even with the RISC advantage. If DEC had picked up the HPS
ideas and invented what's missing from there, they might have had the
OoO VAX as a multi-chip thing in the early 1990s, and maybe gotten it
on a single chip by 1995. Still, its performance in the early 1990s
would have been great, so it could have won back customers.

Yup. Really, you have to get the traditional DEC management to all retire before 1990, and the new management need to be brave /and/ lucky.

    Yes, you would basically need to have a whole bunch of managers and
    tech team leaders take a time machine from, say, today, so they know
    where to go, and they still would need to make and enforce good
    decisions to make the company succeed in the long term rather than
    painting itself into a corner by maximizing short-term revenue.

Your story about your experiences with DEC reminds me of one statement I
once read: DEC buys X, and the result is DEC. Compaq buys DEC, and the
result is DEC (as in, the DEC attitude won over the Compaq attitude).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Sat Jan 20 19:50:30 2024
    According to John Dallman <jgd@cix.co.uk>:
Yup. Really, you have to get the traditional DEC management to all retire before 1990, and the new management need to be brave /and/ lucky.

    DEC never really understood what business they were in. They had a pretty good run selling hardware that was cheap and reliable, with software that was adequate.
    But more often than not it was used with other software, Compuserve's system and Tenex on the -10, and Unix on the -11 and Vax.

    That worked fine while minicomputers were the cheapest way to do small scale computing. Once micros came in, they weren't able to produce chips that competed on their own (as opposed to being slightly cheaper versions of
    their minis) and they deluded themselves that they could lock people in
    with VMS the way IBM did with DOS and OS and AS/400.



    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Anton Ertl on Sat Jan 20 22:19:51 2024
    On Sat, 20 Jan 2024 18:15:43 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    You story about your experiences with DEC remind me of one statement I
    once read: DEC buy X, and the result is DEC. Compaq buys DEC, and the
    result is DEC (as in, the DEC attitude won over the Compaq attitude).

    - anton

But later on HP bought Compaq, and eventually the computing side of the
business became indistinguishable from Compaq. Both the DEC and HP parts
have already dissolved. The ex-SGI side is still hanging on, but likely not for long.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Dallman on Sat Jan 20 21:33:11 2024
    On Sat, 20 Jan 2024 16:25 +0000 (GMT Standard Time), John Dallman wrote:

    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    ... Dave Cutler probably doesn't move to Microsoft, and then Windows NT doesn't happen, at least not in the same way.

    Imagine if it hadn’t been created by a Unix-hater. But then, Microsoft had already divested themselves of Xenix by then, hadn’t they? So they
    probably didn’t have anyone left who understood the value of Unix.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Michael S on Sat Jan 20 21:50:59 2024
    Michael S <already5chosen@yahoo.com> schrieb:

Being a younger observer from the outside, my impression is that in the
1st World people stopped using S/360 descendants for "heavy" scientific calculations around 1980.

I certainly used /360 descendants (Siemens 7881, then IBM
3090) for scientific work, but the latter was also often used
as the front end for the (also S/360 compatible) Fujitsu VP.
Hmm... looking around a bit, the IBM 3090 I worked on had 150
MFlops with its vector facility. That was not too bad when it
was purchased in 1989, but the workstations purchased soon after
eclipsed it in computing power for the individual user, and the
vector computers (Fujitsu VP in Karlsruhe) also did so. The IBM
3090 was used mainly as a front end to the VP.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Sat Jan 20 21:36:37 2024
    On Sat, 20 Jan 2024 19:50:30 -0000 (UTC), John Levine wrote:

    DEC never really understood what business they were in.

They were a company run by engineers, selling to engineers and others
who understood technical stuff. That was a great business model from the introduction of the PDP-1 in 1959 up to the coming of RISC and the IBM PC in the mid-1980s. That was a pretty good run, but then you have to start thinking
about remaking yourself. Which they had trouble doing.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lynn Wheeler on Sun Jan 21 15:06:43 2024
    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run cdc6600 rain benchmark on
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for
national lab that was looking at getting 70 of them for a compute farm (sort
    of the leading edge of the coming cluster supercomputing tsunami). I
    also ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode
    and the integrated channel microcode; a 3031 was two 158 engines, one
    with just the 370 microcode and a 2nd with just the integrated
    channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.


Did I read it right? A brand-new mid-range IBM mainframe barely matched
a 15-year-old CDC machine that was 10 years out of production?
That sounds quite embarrassing.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Sun Jan 21 15:19:22 2024
    On Sat, 20 Jan 2024 21:33:11 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Sat, 20 Jan 2024 16:25 +0000 (GMT Standard Time), John Dallman
    wrote:

    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    ... Dave Cutler probably doesn't move to Microsoft, and then
    Windows NT doesn't happen, at least not in the same way.

    Imagine if it hadn’t been created by a Unix-hater. But then,
    Microsoft had already divested themselves of Xenix by then, hadn’t
    they? So they probably didn’t have anyone left who understood the
    value of Unix.

I see nothing wrong in DC being a Unix hater.
Much, much worse is that he didn't understand that it was not the 1970s
any more and that in the 1990s plug&play support was a necessity,
including "hot" plug&play.
Because of that blind spot, the Win9x line, created by people who did
understand the value of plug&play (Brad Silverberg? I can't find much
info about the lead 9x architects on the Net), but very problematic
otherwise, lasted for much longer than it should have.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From EricP@21:1/5 to John Dallman on Sun Jan 21 08:43:30 2024
    John Dallman wrote:
    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    jgd@cix.co.uk (John Dallman) writes:
The timeline doesn't work. DEC decided to adopt MIPS in 1989,
because they were losing market share worryingly quickly.
NVAX was released in 1991, and they'd have had real trouble
developing it without the cash from MIPS-based systems.
I forgot that in this alternative reality DEC would have killed the
VAX 9000 project early, leaving them lots of cash for developing
NVAX. Still, it could easily have been that they would have lost
customers to the RISC competition until they finally managed to do
the OoO-VAX.

    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed
    in the market, the relevant timespan was 1995-2005. Intel dropped
    the 64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

    This requires DEC to take notice of those papers and start developing OoO quite quickly. They did not do that historically, and they seem to have
    been confident that their way of working would carry on being effective, until RISC demonstrated otherwise. This is the timeframe where IBM gave
    up on building mainframes with competitive compute power, and settled for them being capable data-movers.

If DEC go OoO and build an OoO Micro-VAX CPU by about 1988, they can get somewhere. The MicroVAX 78032 of 1985 was 125K transistors; the 80386 was 275K transistors the same year, the 80486 was 1.2M transistors in 1989,
so the transistor budget could be there.

There was also the CVAX in 1986: 134,000 transistors (out of 180,000 sites), 2 um CMOS, 3 layers of interconnect, 90 ns clock, and an internal 1 kB 2-way set-associative cache. The separate FPU coprocessor chip had 65,000 transistors.

But these were only available in systems like the 6240: quad SMP processors,
256 kB L2 cache, up to 256 MB main memory, and up to 6 high-speed IO buses,
in multiple cabinets.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Michael S on Sun Jan 21 16:30:36 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:
    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs
    ...
Did I read it right? A brand-new mid-range IBM mainframe barely matched
a 15-year-old CDC machine that was 10 years out of production?
That sounds quite embarrassing.

    That depends on the price, and there are also properties like size,
    power consumption and cooling requirements. IBM mainframes were not
    designed for HPC (with a few exceptions); if you wanted that, you
    would have bought a Cray-1 in 1979 when the 4341 appeared.

    There is also the thing about IBM's market: Amdahl said about the (high-performance) ACS-360 <https://people.computing.clemson.edu/~mark/acs_end.html>:

    |Yes, but the company decided not to build it because it would have
    |destroyed the pricing structures. In the first place, it would have
    |forced them to make higher-end machines. But with IBM's pricing
    |structure, the market disappeared by the time performance got to a
    |certain level. Any machine above that in performance or price could
    |only lose money.

    The ACS-360 was cancelled for that reason.

    Also, remember that these were not the 1990s with their extreme
    advances every year; instead, the performance advances were quite a
    bit slower, just like we have seen in the last two decades. And if
    you compare a 2023-vintage Rock 5B (with Cortex-A76 like the Raspi5)
    with a 2008-vintage Core 2 Duo E8400 PC, the Rock 5B is slightly
slower when running LaTeX, but it's also much cheaper, smaller,
    consumes much less power and actually works without a cooler (but we
    provided one nonetheless; the Raspi 5 SoC is made in a less advanced
    process and needs more cooling).

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Sun Jan 21 21:27:23 2024
    On Sun, 21 Jan 2024 15:06:43 +0200, Michael S wrote:

Did I read it right? A brand-new mid-range IBM mainframe barely matched
a 15-year-old CDC machine that was 10 years out of production?
That sounds quite embarrassing.

    A minute’s silence for the hardware legend that was Seymour Cray.

    And a minute’s jeering at IBM’s FUD campaign to try to put CDC out of business.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Sun Jan 21 21:28:13 2024
    On Sun, 21 Jan 2024 15:19:22 +0200, Michael S wrote:

    On Sat, 20 Jan 2024 21:33:11 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Sat, 20 Jan 2024 16:25 +0000 (GMT Standard Time), John Dallman
    wrote:

    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>,
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    ... Dave Cutler probably doesn't move to Microsoft, and then Windows
    NT doesn't happen, at least not in the same way.

    Imagine if it hadn’t been created by a Unix-hater. But then, Microsoft
    had already divested themselves of Xenix by then, hadn’t they? So they
    probably didn’t have anyone left who understood the value of Unix.

I see nothing wrong in DC being a Unix hater.

    WSL might not have been necessary. Microsoft would not now be struggling
    to offer some semblance of Linux compatibility.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Michael S on Sun Jan 21 21:26:37 2024
    Michael S wrote:

    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run cdc6600 rain benchmark on
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for
national lab that was looking at getting 70 of them for a compute farm (sort
    of the leading edge of the coming cluster supercomputing tsunami). I
    also ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode
    and the integrated channel microcode; a 3031 was two 158 engines, one
    with just the 370 microcode and a 2nd with just the integrated
    channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.


Did I read it right? A brand-new mid-range IBM mainframe barely matched
a 15-year-old CDC machine that was 10 years out of production?
That sounds quite embarrassing.


    Target market for 4341 was not scientific computing, either.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to mitchalsup@aol.com on Sun Jan 21 21:51:56 2024
    MitchAlsup1 <mitchalsup@aol.com> schrieb:
    Michael S wrote:

    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run cdc6600 rain benchmark on
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for
national lab that was looking at getting 70 of them for a compute farm (sort
    of the leading edge of the coming cluster supercomputing tsunami). I
    also ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode
    and the integrated channel microcode; a 3031 was two 158 engines, one
    with just the 370 microcode and a 2nd with just the integrated
    channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.


Did I read it right? A brand-new mid-range IBM mainframe barely matched
a 15-year-old CDC machine that was 10 years out of production?
That sounds quite embarrassing.


    Target market for 4341 was not scientific computing, either.

    And yet, people used IBM mainframes for scientific computing...

    For example, the IBM 4361 had, as an optional feature, the maximum
    precision scalar product developed by the University of Karlsruhe.

    Not sure why they went to IBM with it, maybe DEC would have been
    a better choice. Then again, the people at the computer center
    in Karlsruhe were very mainframe-oriented...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Sun Jan 21 22:01:34 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sun, 21 Jan 2024 15:06:43 +0200, Michael S wrote:

Did I read it right? A brand-new mid-range IBM mainframe barely matched
a 15-year-old CDC machine that was 10 years out of production?
That sounds quite embarrassing.

    A minute’s silence for the hardware legend that was Seymour Cray.

He was a friend of my Godfather (who lived in Chippewa Falls), right around
the time I first had access to a computer (1974, B5500). I didn't
realize who he was until much later, however, and never had a chance to
discuss computers with him.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Sun Jan 21 23:54:01 2024
    On Sun, 21 Jan 2024 21:51:56 -0000 (UTC), Thomas Koenig wrote:

    Not sure why they went to IBM with it, maybe DEC would have been a
    better choice. Then again, the people at the computer center in
    Karlsruhe were very mainframe-oriented...

    There seemed to be a lot of people like that, who only knew IBM and saw
    the whole world through IBM lenses. To the rest of us, IBM’s way of doing things just seemed overcomplicated, unwieldy, inflexible ... and
    expensive.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lynn Wheeler@21:1/5 to Michael S on Sun Jan 21 16:37:40 2024
    Michael S <already5chosen@yahoo.com> writes:
Did I read it right? A brand-new mid-range IBM mainframe barely matched
a 15-year-old CDC machine that was 10 years out of production?
That sounds quite embarrassing.

the national lab was looking at getting 70 of them because of
price/performance ... sort of the leading edge of the coming cluster
scale-up supercomputing tsunami.

    decade later had project originally HA/6000 for NYTimes to move their
    newspaper system (ATEX) off (DEC) VaxCluster to RS/6000. I rename it
    HA/CMP when I start doing technical/scientific cluster scale-up with
    national labs and commercial cluster scale-up with RDBMS vendors
(Oracle, Sybase, Informix, Ingres). Early Jan1992, meeting with Oracle
CEO, who is told 16-way cluster mid-92 and 128-way cluster
year-end 92. However, end of Jan1992, cluster scaleup is transferred for
announce as IBM supercomputer (for technical/scientific *ONLY*, possibly because of the commercial cluster scaleup "threat") and we are told we
    couldn't work on anything with more than four processors (we leave IBM a
    few months later). A couple weeks later, IBM (cluster) supercomputer
    group in the press (pg8) https://archive.org/details/sim_computerworld_1992-02-17_26_7

    First half 80s, IBM 4300s sold into the same mid-range market as VAX and
    in about the same numbers for single and small unit orders ... big
    difference was large companies ordering hundreds of 4300s at a time for
    placing out in departmental areas (sort of the leading edge of the
coming distributed computing tsunami).

    old archived post with vax sales, sliced and diced by model, year,
    us/non-us
    http://www.garlic.com/~lynn/2002f.html#0

2nd half of the 80s, the mid-range market was moving to workstations and large PC servers ... affecting both VAX and 4300s

    --
    virtualization experience starting Jan1968, online at home since Mar1970

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Mon Jan 22 02:46:05 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Sun, 21 Jan 2024 21:51:56 -0000 (UTC), Thomas Koenig wrote:

    Not sure why they went to IBM with it, maybe DEC would have been a
    better choice. Then again, the people at the computer center in
    Karlsruhe were very mainframe-oriented...

    There seemed to be a lot of people like that, who only knew IBM and saw
    the whole world through IBM lenses. ...

    IBM has a big development lab in Boeblingen which is about an hour from Karlsruhe.

    At that time DEC had no labs outside the United States.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Mon Jan 22 03:21:21 2024
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from Karlsruhe.

    At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Mon Jan 22 09:59:24 2024
    Lawrence D'Oliveiro wrote:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from
    Karlsruhe.

    At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    My uni lecturer had a favorite:

    IBM's patent for a zero-time sorting chip.

It was basically a DMA-style memory device that was set up as a big
ladder of comparators so that it could do a parallel bubble sort:

As each new item arrived it would be compared with the current top, and
the loser would be pushed down to the next ladder level, replacing the
item which had at the same time lost the comparison at that level.

    By the time all items had been loaded, the top would be the overall
    winner, right?

    You would then reverse the direction, while keeping the comparators
    active, so now you would stream out perfectly sorted items.
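
    A minimal C model of that ladder (the names here are invented; the
    real chip would do all the level comparisons in parallel, which the
    sequential push-down loop below only simulates):

        #include <stdio.h>

        #define LEVELS 8               /* capacity of the hypothetical chip */

        static int ladder[LEVELS];     /* ladder[0] is the top of the ladder */
        static int count = 0;

        /* Load one item: at each level, the winner stays put and the
           loser is pushed down one rung. */
        static void load(int item)
        {
            for (int i = 0; i < count; i++) {
                if (item > ladder[i]) {
                    int tmp = ladder[i];    /* loser of this comparison */
                    ladder[i] = item;       /* winner takes this level  */
                    item = tmp;             /* loser continues downward */
                }
            }
            ladder[count++] = item;
        }

        /* Reverse the direction: the items stream back out perfectly
           sorted, largest first. */
        static int unload(void)
        {
            int top = ladder[0];
            for (int i = 1; i < count; i++)
                ladder[i - 1] = ladder[i];
            count--;
            return top;
        }

        int main(void)
        {
            int data[] = {3, 5, 4, 1, 2};
            for (int i = 0; i < 5; i++)
                load(data[i]);
            while (count > 0)
                printf("%d\n", unload());   /* prints 5 4 3 2 1 */
            return 0;
        }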

    The real problem is of course that this is effectively very expensive
    memory, and as soon as you ran out of space in the chip you would have
    to fall back on multi-way merge between separate runs of chip-size chunks.

    In pretty much every conceivable real-world situation you would much
    rather have 10x more real memory and apply indexing to any data you
    might want to retrieve quickly in some sorted order and/or sort it on
    demand.

    Terje

    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Vir Campestris@21:1/5 to John Levine on Mon Jan 22 11:58:28 2024
    On 05/01/2024 18:05, John Levine wrote:
    According to EricP <ThatWouldBeTelling@thevillage.com>:

    <snip>

DG Nova had infinite indirection - if the Indirect bit was set in the
instruction, the address word was fetched and examined: if the msb of the address was zero
then it was the address of the 16-bit data; if the msb of the address was 1
then it was the address of another address, looping until msb = 0.
I don't know how DG used it but, just guessing, because the Nova only had
4 registers it might have been to create a kind of virtual register set in memory.

    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.

As has been mentioned elsewhere recently, DEC did exactly this on the PDP-10.
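
    For concreteness, the Nova-style chain quoted above boils down to a
    loop like this (a sketch with invented names, not actual DG code):

        #include <stdint.h>

        static uint16_t mem[1 << 15];   /* stand-in for 32K words of core */

        /* While the msb of the address word is set, the low 15 bits
           point at another address word; once it is clear, they are
           the final address. A real machine would also have to cope
           with endless chains. */
        static uint16_t resolve(uint16_t word)
        {
            while (word & 0x8000)
                word = mem[word & 0x7FFF];
            return word;
        }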

    Andy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Mon Jan 22 16:42:50 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from
    Karlsruhe.

At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Mon Jan 22 16:46:45 2024
    According to Vir Campestris <vir.campestris@invalid.invalid>:
    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.

As has been mentioned elsewhere recently, DEC did exactly this on the PDP-10.

    It was more complicated than that on the PDP-6/10. At each stage it not
    only did indirection, it could also add in an index register. I can sort
    of imagine how one might use all that for dynamically allocated array
    rows but I never saw more than two levels in practice and never saw
    indexing in indirect words.

In their defense, the addressing was very consistent: start with the instruction word and keep indexing and indirecting until you come up
with the address.
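
    A sketch of that loop in C, assuming the usual PDP-10 word layout
    (bit 13 = indirect bit I, bits 14-17 = index field X, bits 18-35 =
    address Y); memory[] and regs[] are stand-ins for illustration:

        #include <stdint.h>

        static uint64_t memory[1 << 18];   /* 36-bit words, right-justified */
        static uint64_t regs[16];          /* index registers */

        /* Start with the instruction word; keep indexing and
           indirecting until the effective address falls out. */
        static uint32_t effective_address(uint64_t word)
        {
            for (;;) {
                uint32_t y = word & 0777777;       /* 18-bit address Y */
                uint32_t x = (word >> 18) & 017;   /* index field X    */
                uint32_t i = (word >> 22) & 1;     /* indirect bit I   */

                if (x)                             /* index first      */
                    y = (y + (uint32_t)regs[x]) & 0777777;
                if (!i)
                    return y;                      /* done             */
                word = memory[y];                  /* indirect: fetch
                                                      and repeat       */
            }
        }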

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to John Levine on Mon Jan 22 17:42:19 2024
    John Levine <johnl@taugh.com> schrieb:
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from
    Karlsruhe.

At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    Or you want to be able to use them at a later date, so nobody else
    can patent that particular invention.

This has led to some patents being filed in Luxembourg only, for example.

Another method, which is getting harder in the age of search
engines, is the "secret" publication: publishing it somewhere
it is unlikely to be found, such as the (non-existent)
"Acta Physical Mongolica".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to John Levine on Mon Jan 22 17:48:38 2024
    John Levine <johnl@taugh.com> writes:
    [unbounded indirection:]
    It was more complicated than that on the PDP-6/10. At each stage it not
    only did indirection, it could also add in an index register. I can sort
    of imagine how one might use all that for dynamically allocated array
    rows but I never saw more than two levels in practice and never saw
    indexing in indirect words.

The implementation of a logic variable is a parent-pointer tree where
you follow the parent pointers until you are at the root
    (which is a free variable or instantiated to a value). The automatic
    unbounded indirection of the PDP-6/10 and Nova appears to be ideal for
    that. And actually the most influential Prolog for quite a number of
    years was DEC-10 Prolog; I don't know if it used that feature, but I
    would be surprised if it did not. Still, Prolog could be implemented
    on architectures without that feature.
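
    For illustration, a dereference loop of the kind a Prolog
    implementation needs (the cell layout and names are invented; real
    systems such as WAM-based ones use tagged words):

        typedef struct cell {
            enum { REF, INT /* , STRUCT, ... */ } tag;
            union {
                struct cell *ref;   /* REF: bound target; an unbound
                                       variable points to itself      */
                long         ival;  /* INT: an instantiated value     */
            } u;
        } cell;

        /* Follow the parent pointers to the root: either an unbound
           variable (a self-referencing REF) or a value. This is the
           loop that the unbounded-indirection feature discussed above
           would fold into a single memory access. */
        static cell *deref(cell *c)
        {
            while (c->tag == REF && c->u.ref != c)
                c = c->u.ref;
            return c;
        }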

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Lawrence D'Oliveiro on Mon Jan 22 17:37:23 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Sun, 21 Jan 2024 21:51:56 -0000 (UTC), Thomas Koenig wrote:

    Not sure why they went to IBM with it, maybe DEC would have been a
    better choice. Then again, the people at the computer center in
    Karlsruhe were very mainframe-oriented...

    There seemed to be a lot of people like that, who only knew IBM and saw
    the whole world through IBM lenses. To the rest of us, IBM’s way of doing things just seemed overcomplicated, unwieldy, inflexible ... and
    expensive.

    That wasn't the case here.

    The mainframe they had at the computer center before was a UNIVAC
    (don't know which model, it was decommissioned before I started
    on the Siemens/Fujitsu mainframe there), and they had a Cyber 205.

    So, maybe more mainframe-oriented, but not necessarily IBM.
    But then again, the 4361 was not really a mainframe.

But the proximity of Karlsruhe to Böblingen (which John
L. mentioned) might well have been a factor. It is entirely
plausible that contacts existed, for example from students who
started to work there.

    And, googling around for a bit, I find that the 4361 was indeed
    developed at Böblingen. This probably settles it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to John Levine on Mon Jan 22 19:03:33 2024
    John Levine wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from
    Karlsruhe.

At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

See, you cannot sue me for patent infringement, I am only doing what MY
patent on that matter allows.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Mon Jan 22 19:22:16 2024
    According to MitchAlsup1 <mitchalsup@aol.com>:
    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

See, you cannot sue me for patent infringement, I am only doing what MY patent on that matter allows.

    It's more than that. If someone threatens IBM with a patent suit, IBM's
    usual response is that they have 100,000 patents in their portfolio, so
    they're pretty sure that if they look, they will find something that
    the other party is doing that looks like one of those patents and
    will countersue. Patent suits are very expensive and IBM has
    deep pockets.

    Big companies often avoid this by cross-licensing, I won't sue you for
    anything in our pile of patents if you won't sue me for anything in
    your pile.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to mitchalsup@aol.com on Mon Jan 22 20:32:10 2024
    MitchAlsup1 <mitchalsup@aol.com> schrieb:
    John Levine wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

IBM has a big development lab in Boeblingen which is about an hour from Karlsruhe.

At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

See, you cannot sue me for patent infringement, I am only doing what MY patent on that matter allows.

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner
    any rights to use the invention that he would not have otherwise.

    It is perfectly possible, if undesirable for the patent holder,
    to be dependent on some other patent. It is also possible, if
    rarer, for two patents to block each other, so nobody can use
    the invention. This can then be a reason to negotiate, or
(in extreme cases) to litigate.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to John Levine on Mon Jan 22 20:32:30 2024
    John Levine wrote:

    According to MitchAlsup1 <mitchalsup@aol.com>:
    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

See, you cannot sue me for patent infringement, I am only doing what MY patent on that matter allows.

    It's more than that. If someone threatens IBM with a patent suit, IBM's usual response is that they have 100,000 patents in their portfolio, so they're pretty sure that if they look, they will find something that
    the other party is doing that looks like one of those patents and
    will countersue. Patent suits are very expensive and IBM has
    deep pockets.

    Big companies often avoid this by cross-licensing, I won't sue you for anything in our pile of patents if you won't sue me for anything in
    your pile.

    Yes, I used the small (startup) patent model.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Mon Jan 22 20:37:48 2024
    On Mon, 22 Jan 2024 16:42:50 -0000 (UTC), John Levine wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:

At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    Here’s one notorious one: they had a patent on the use of bit-flipping to produce a flashing cursor on a text terminal. And they sued other terminal vendors for copying this idea.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Mon Jan 22 20:41:18 2024
    On Mon, 22 Jan 2024 20:32:10 -0000 (UTC), Thomas Koenig wrote:

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner any rights to use the invention that he would not have otherwise.

    And more than that, you don’t actually need to prove your idea works
    before you can get a patent on it. I think the legal term is “reduce to practice”, which basically means “write up a plausible-sounding
    description of how it *might* work”.

    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Mon Jan 22 23:12:26 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    And more than that, you don’t actually need to prove your idea works
before you can get a patent on it. I think the legal term is “reduce to practice”, which basically means “write up a plausible-sounding description of how it *might* work”.

    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

In the US at least, you are supposed to have reduced your invention to practice, although it is obvious that many patentees haven't.

The patent office is allowed to ask for a working model of any
invention. Back in the 1800s they got models for everything (a
fabulous collection that was sadly destroyed first by fires and later
by auction). Now they don't, except for perpetual motion machines.



    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Lawrence D'Oliveiro on Tue Jan 23 06:50:05 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Mon, 22 Jan 2024 20:32:10 -0000 (UTC), Thomas Koenig wrote:

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner any
    rights to use the invention that he would not have otherwise.

    And more than that, you don’t actually need to prove your idea works
    before you can get a patent on it. I think the legal term is “reduce to practice”, which basically means “write up a plausible-sounding description of how it *might* work”.

    § 21 of the German Patent Law states (deepl-assisted translation,
    IANAL)

    (1) The patent shall be revoked (§ 61) if it is found that

    [...]

    2. the patent does not disclose the invention so clearly and completely
    that a person skilled in the art can carry it out,

    To avoid insufficient disclosure, people (including myself, I have to
    admit) now put a _lot_ of details into patents, which makes the patents
    much longer than previously, and more painful to write and to read.


    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

    A patent has to be industrially applicable (§1), and (§5)

    An invention is considered to be industrially applicable if
    its subject matter can be made or used in any industrial field,
    including agriculture.

    Something that does not work can clearly not be used in an
    industrial field.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup1@21:1/5 to Thomas Koenig on Tue Jan 23 21:13:29 2024
    Thomas Koenig wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Mon, 22 Jan 2024 20:32:10 -0000 (UTC), Thomas Koenig wrote:

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner any
    rights to use the invention that he would not have otherwise.

    And more than that, you don’t actually need to prove your idea works
    before you can get a patent on it. I think the legal term is “reduce to
    practice”, which basically means “write up a plausible-sounding
    description of how it *might* work”.

    § 21 of the German Patent Law states (deepl-assisted translation,
    IANAL)

    (1) The patent shall be revoked (§ 61) if it is found that

    [...]

    2. the patent does not disclose the invention so clearly and completely
    that a person skilled in the art can carry it out,

    To avoid insufficient disclosure, people (including myself, I have to
    admit) now put a _lot_ of details into patents, which makes the patents
    much longer than previously, and more painful to write and to read.


    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

    A patent has to be industrially applicable (§1), and (§5)

    An invention is considered to be industrially applicable if
    its subject matter can be made or used in any industrial field,
    including agriculture.

    Something that does not work can clearly not be used in an
    industrial field.

    When multiple patents arrive at the patent office contemporaneously,
    and they all describe essentially the same mechanism or algorithm::
    they should ALL be denied as something "obvious to one skilled in the
    art".

    Yet, the opposite happens.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Tue Jan 23 22:03:24 2024
    On Tue, 23 Jan 2024 21:13:29 +0000, MitchAlsup1 wrote:

    When multiple patents arrive at the patent office contemporaneously, and
    they all describe essentially the same mechanism or algorithm:: they
    should ALL be denied as something "obvious to one skilled in the art".

    Yet, the opposite happens.

Worse than that, if evidence comes to light of “prior art”, that is,
use/disclosure of the patented techniques prior to the patent registration,
that should invalidate the patent. Yet, in the US at least, this turns out
to be very hard.

Case in point: the NewEgg patent, which was just an application of
Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to testify
that he had come up with the idea decades before. Yet the jury were
unconvinced, and let the patent stand.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Lawrence D'Oliveiro on Wed Jan 24 06:52:25 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Tue, 23 Jan 2024 21:13:29 +0000, MitchAlsup1 wrote:

    When multiple patents arrive at the patent office contemporaneously, and
    they all describe essentially the same mechanism or algorithm:: they
    should ALL be denied as something "obvious to one skilled in the art".

    Yet, the opposite happens.

Worse than that, if evidence comes to light of “prior art”, that is, use/disclosure of the patented techniques prior to the patent registration,
    that should invalidate the patent. Yet, in the US at least, this turns out
    to be very hard.

    It is then a matter for the opposition division to decide, then the
    board of appeal, then the patent courts (at least that is the EPO
    procedure).

Case in point: the NewEgg patent, which was just an application of Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to testify
that he had come up with the idea decades before. Yet the jury were unconvinced, and let the patent stand.

    Was this before or after the US followed the rest of the world by
    allowing opposition proceedings? Having such a case go straight
    to a jury is somewhat problematic...

    But "came up with the idea" is not prior art if it isn't disclosed.
    If he didn't have a publication, or slides from a presentation,
    that does not count.

    Had he said "It was an obvious application that anybody working
    in the field would have thought of with half a brain", that would
    have been a strong argument for lack of inventive step.

    But lack of inventive step is tricky...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Levine@21:1/5 to All on Wed Jan 24 14:54:54 2024
    According to Thomas Koenig <tkoenig@netcologne.de>:
Case in point: the NewEgg patent, which was just an application of Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to testify
that he had come up with the idea decades before. Yet the jury were
unconvinced, and let the patent stand.

    Was this before or after the US followed the rest of the world by
    allowing opposition proceedings? Having such a case go straight
    to a jury is somewhat problematic...

    It was in 2013. Since 2012 the US has had inter partes review, where
    you can have the Patent Trial and Appeal Board review a patent to see
    if it's not novel. That case was filed under the old rules, and was in
    Marshall TX, a rural corner of Texas with a judge notoriously friendly
    to patent trolls.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to johnl@taugh.com on Wed Jan 24 11:33:47 2024
    On Wed, 24 Jan 2024 14:54:54 -0000 (UTC), John Levine
    <johnl@taugh.com> wrote:

    According to Thomas Koenig <tkoenig@netcologne.de>:
Case in point: the NewEgg patent, which was just an application of Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to testify that he had come up with the idea decades before. Yet the jury were
unconvinced, and let the patent stand.

    Was this before or after the US followed the rest of the world by
    allowing opposition proceedings? Having such a case go straight
    to a jury is somewhat problematic...

    It was in 2013. Since 2012 the US has had inter partes review, where
    you can have the Patent Trial and Appeal Board review a patent to see
if it's not novel. That case was filed under the old rules, and was in Marshall TX, a rural corner of Texas with a judge notoriously friendly
    to patent trolls.

    If it went to trial in 2013, the case was brought LONG before that,
    and would have been governed by the rules in force when it started.
    Patents are litigated in federal courts where the wait for a trial
    typically is 3..4 years.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)