• indirection in old architectures

    From Anton Ertl@21:1/5 to All on Fri Dec 29 17:20:43 2023
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.
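    To make the mechanism concrete, here is a toy Python model of the behaviour described above. The 18-bit word, the position of the indirect flag, and the `max_steps` guard are assumptions for illustration. They are not the layout of any particular machine.

```python
# Toy model of multi-level indirection. Assumed (hypothetical) layout:
# an 18-bit word whose top bit is the "indirect" flag and whose low
# 17 bits are an address.
INDIRECT_BIT = 1 << 17
ADDR_MASK = INDIRECT_BIT - 1


def resolve(memory, addr, max_steps=64):
    """Follow indirect words from addr; return the address of the operand.

    max_steps stands in for whatever loop-breaking mechanism a real
    machine used (timer, counter, interruptible address calculation).
    """
    for _ in range(max_steps):
        word = memory[addr]
        if not word & INDIRECT_BIT:
            return addr              # no flag: this word is the operand
        addr = word & ADDR_MASK      # flag set: the word is a pointer
    raise RuntimeError("indirection chain too long (possible loop)")


memory = {0o100: INDIRECT_BIT | 0o200, 0o200: INDIRECT_BIT | 0o300, 0o300: 42}
operand_addr = resolve(memory, 0o100)
print(oct(operand_addr), memory[operand_addr])  # 0o300 42
```

    The model also exposes a limitation: any data word that happens to have the flag set would be mistaken for a pointer.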

    The major question I have is why these architectures have this feature.

    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.
    However, Prolog was first implemented in 1970, and it did not become a
    big thing until the 1980s (if then), so I doubt that this feature was implemented for Prolog.

    A use for a single indirection is the implementation of the memory
    management in the original MacOS: Each dynamically allocated memory
    block was referenced only from a single place (its handle), so that
    the block could be easily relocated. Only the address of the handle
    was freely passed around, and accessing the block then always required
    double indirection. MacOS was implemented on the 68000, which did not
    have the indirect bit; this demonstrates that the indirect bit is not
    necessary for that. Nevertheless, such a usage pattern might be seen
    as a reason to add the indirect bit. But is it enough?
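    The handle scheme needs no hardware support, which is the point made above about the 68000. The following is a simplified Python model. Here a handle is just an index into a table of master pointers, whereas the real Memory Manager used actual pointers to master pointers. The class and method names are invented for illustration.

```python
# Simplified, hypothetical model of handle-based allocation in the style of
# the classic Mac OS Memory Manager. A handle identifies a "master pointer";
# only the master pointer records where the block currently is.
class Heap:
    def __init__(self, size):
        self.memory = [0] * size
        self.master_pointers = []      # handle -> current block address

    def new_handle(self, block_addr):
        self.master_pointers.append(block_addr)
        return len(self.master_pointers) - 1

    def read(self, handle, offset=0):
        # Double indirection: handle -> master pointer -> block contents.
        return self.memory[self.master_pointers[handle] + offset]

    def move_block(self, handle, new_addr, length):
        # Compaction: copy the block, then update the one master pointer.
        # Every holder of the handle sees the new location automatically.
        old_addr = self.master_pointers[handle]
        self.memory[new_addr:new_addr + length] = self.memory[old_addr:old_addr + length]
        self.master_pointers[handle] = new_addr


heap = Heap(16)
heap.memory[8:11] = [10, 20, 30]
h = heap.new_handle(8)
heap.move_block(h, 0, 3)
print(heap.read(h, 2))  # 30
```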

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    - anton
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Anton Ertl on Fri Dec 29 19:04:56 2023
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    That's essentially accurate. The Burroughs medium systems
    operands were described by an operand address that included
    an 'address controller'. The address controller, a four-bit
    field, specified two characteristics of the address; the
    two-bit 'index' field contained the number of the index register
    (there were three) to be used when calculating the final
    address. The other two bits described how the data at the
    final address should be treated by the processor:
    0b00 Unsigned Numeric Data [UN] (BCD)
    0b01 Signed Numeric Data [SN] (BCD, first digit 0b1100 = '+', 0b1101 = '-').
    0b10 Unsigned Alphanumeric Data [UA] (EBCDIC)
    0b11 Indirect Address [IA]

    Consider the operand 053251: this described an unsigned
    numeric value starting at the address 53251 with no indexing.

    The operand 753251 described an address indexed by IX1
    and of the type 'indirect address' which points to another
    operand word (potentially resulting in infinite recursion,
    which was detected by an internal timer which would terminate
    the process when triggered).

    The actual operand data type was determined by the
    address controller of the first operand that isn't
    marked IA.
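    Here is a short Python sketch of how such an operand might be decoded and followed. The digit layout is inferred from the two examples above and should be treated as an assumption, as should the `max_steps` stand-in for the internal timer and the plain integer addition used for indexing.

```python
# Hypothetical decoder for the operand format described above. Assumptions:
# an operand is six digits; the first is the 4-bit address controller with
# the index-register number in its high two bits and the data type in its
# low two bits (inferred from "753251" = IX1 + IA); the rest is a decimal
# address.
TYPES = {0b00: "UN", 0b01: "SN", 0b10: "UA", 0b11: "IA"}


def decode_operand(operand):
    controller = int(operand[0], 16)
    index = controller >> 2               # 0 = no indexing, 1..3 = IX1..IX3
    kind = TYPES[controller & 0b11]
    return index, kind, int(operand[1:])


def resolve_operand(memory, operand, index_regs, max_steps=100):
    """Follow IA operands until one is not IA; max_steps stands in for the timer."""
    for _ in range(max_steps):
        index, kind, address = decode_operand(operand)
        if index:
            address += index_regs[index]
        if kind != "IA":
            return kind, address          # type comes from the first non-IA operand
        operand = memory[address]         # fetch the next operand word
    raise TimeoutError("indirect address loop")


print(decode_operand("053251"))  # (0, 'UN', 53251)
print(decode_operand("753251"))  # (1, 'IA', 53251)
```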

    The major question I have is why these architectures have this feature.

    Primarily for flexibility in addressing without adding substantial
    hardware support.

    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.

    The aforementioned system ran mostly COBOL code (with some BPL;
    assemblers weren't generally provided to customers).

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    Consider following a linked list to the final element as an
    example usage.

    The aforementioned system also had a SLL (Search Linked List)
    that would test each element for one of several conditions
    and terminate the indirection when the condition was true.
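    Both usages can be sketched in Python as one loop over link words: following a list to its last element, and stopping at the first element that meets a test. The element layout (a link plus a value) and the condition interface are invented for illustration. The actual SLL instruction formats are not described here.

```python
# Hypothetical sketch of a search-linked-list operation.
def search_linked_list(memory, head, condition, max_steps=10_000):
    """Follow links from head until condition(link, value) holds.

    memory maps an address to a (link, value) pair; a link of None ends
    the list. Returns the matching element's address, or None.
    """
    addr = head
    for _ in range(max_steps):
        if addr is None:
            return None                   # ran off the end of the list
        link, value = memory[addr]
        if condition(link, value):
            return addr                   # stop the indirection here
        addr = link                       # otherwise follow the link
    raise TimeoutError("list too long or circular")


memory = {100: (200, 5), 200: (300, 12), 300: (None, 40)}
print(search_linked_list(memory, 100, lambda link, v: v > 10))       # 200
print(search_linked_list(memory, 100, lambda link, v: link is None)) # 300 (final element)
```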

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    I guess I don't understand the question. It's just a pointer in
    a linked list.

  • From John Levine@21:1/5 to All on Fri Dec 29 19:36:00 2023
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    More or less. Indirect addressing was always controlled by a bit in
    the instruction. It was more common to have only a single level of
    indirect addressing, just controlled by that instruction bit.
    Multi-level wasn't much more useful and you had to have a way to break
    address loops.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    The CPU follows the indirect address chain to get the operand address
    and then does the operation. On the PDP-10, this stores into the
    word that FOO points to, perhaps after multiple indirections:


    while this stores into FOO itself:


    The major question I have is why these architectures have this feature.

    Let's say you want to add up a list of numbers and your machine
    doesn't have any index registers. What else are you going to do?

    Indirect addressing was a big improvement over patching the
    instructions and index registers were too expensive for small
    machines. The IBM 70x mainframes had index registers; the early DEC
    PDP series didn't, other than the mainframe-esque PDP-6 and -10. The
    PDP-11 mini was a complete rethink a decade after the PDP-1, with eight registers usable for indexing and only single-level (deferred) indirection.
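    The alternative to patching instructions can be sketched in Python on a hypothetical accumulator machine. The loop keeps the address of the current element in an ordinary memory word and reaches the data through one level of indirection. The machine model is an illustrative assumption, not any specific DEC or IBM design.

```python
# Sketch of summing an array with no index registers. The address of the
# current element lives in a memory word (ptr_addr); each step adds
# indirectly through it, then increments that word in memory. Neither the
# instruction nor any register is modified.
def sum_via_indirect_pointer(memory, ptr_addr, count):
    acc = 0
    for _ in range(count):
        acc += memory[memory[ptr_addr]]   # add, indirect through the pointer word
        memory[ptr_addr] += 1             # bump the pointer word in memory
    return acc


memory = [0] * 32
memory[20:24] = [3, 1, 4, 1]    # the list to add up
memory[5] = 20                  # pointer word: address of the first element
print(sum_via_indirect_pointer(memory, 5, 4))  # 9
```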

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    They were also useful for argument lists, which were invariably in
    memory on machines without a lot of registers, which was all of them
    before S/360 and the PDP-6. On many machines a Fortran subroutine call
    would leave the return address in an index register and the addresses
    of the arguments were in the words after the call. The routine would
    use something like @3(X) to get the third argument. Nobody other than
    maybe Lisp cared about reentrant or recursive code, and if the number
    of arguments in the call didn't match the number the routine expected
    and your program blew up, well, don't do that.
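    The argument-list convention can be sketched in Python. The layout is an assumption chosen to match the @3(X) example: X points at the call, and the argument addresses sit in the following words. Actual conventions differed between machines and compilers.

```python
# Sketch of the @n(X) argument-fetch idiom. Assumed convention: index
# register X holds the address of the call, and the word at X+n holds the
# address of the n-th argument.
def arg_address(memory, x, n):
    return memory[x + n]                         # n(X): the pointer word after the call


def fetch_arg(memory, x, n):
    return memory[arg_address(memory, x, n)]     # @n(X): load through it


def store_arg(memory, x, n, value):
    # Call by reference: storing through the pointer updates the caller's variable.
    memory[arg_address(memory, x, n)] = value


memory = {500: 0,                                # placeholder for the call instruction
          501: 1000, 502: 1001, 503: 1002,       # argument addresses
          1000: 7, 1001: 8, 1002: 9}             # the caller's variables
print(fetch_arg(memory, 500, 3))   # 9
store_arg(memory, 500, 1, 70)
print(memory[1000])                # 70
```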

    As you suggested, a lot of uses boiled down to providing a fixed
    address for something that can move, so instructions could indirect
    through that fixed address without having to load it into a register.

    For most purposes, index registers do indirection better, and now that everything has a lot of registers, you can use some of them for the fixed->movable stuff like the GOT in Unix/Linux shared libraries.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From MitchAlsup@21:1/5 to Anton Ertl on Fri Dec 29 20:27:29 2023
    Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    Solves the memory access problem {arrays, nested arrays, linked lists,...}
    The early machines had "insufficient" address generation means, and used indirection as a trick to get around their inefficient memory address mode.

    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.
    However, Prolog was first implemented in 1970, and it did not become a
    big thing until the 1980s (if then), so I doubt that this feature was implemented for Prolog.

    Some of the indirection machines had indirection-bit located in the
    container at the address generated, others had the indirection in
    the address calculation. In the case of the PDP-10 there was a time-
    out counter and there were applications that worked fine up to a
    particular size, and then simply failed when the indirection watchdog
    counter kept "going off".

    A use for a single indirection is the implementation of the memory
    management in the original MacOS: Each dynamically allocated memory
    block was referenced only from a single place (its handle), so that
    the block could be easily relocated. Only the address of the handle
    was freely passed around, and accessing the block then always required
    double indirection. MacOS was implemented on the 68000, which did not
    have the indirect bit; this demonstrates that the indirect bit is not necessary for that. Nevertheless, such a usage pattern might be seen
    as a reason to add the indirect bit. But is it enough?

    Two things: 1) the indirect bit is insufficient, 2) optimizing compilers
    got to the point they were better at dereferencing linked lists than
    the indirection machines were. {Reuse and all that rot.}

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    Arrays, matrices, scatter, gather, lists, queues, stacks, arguments,....
    We did all sorts of infinite-indirect stuff in asm on the PDP-10 {KI}
    when programming at college.

    They went out of fashion when compilers got to the point they could
    hold the intermediate addresses in registers and short circuit the
    amount of indirection needed--improving performance due to accessing
    fewer memory locations.

    The large register files of RISC spelled their doom.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    In the machines where the indirection is at the instruction level, this
    was simple; in the machines where the indirection was at the target, it
    was more difficult.



    First the architects thought registers were expensive.
    {Many doubled down by OP-Mem ISAs.}
    The architects endowed memory addressing with insufficient capabilities.
    {Many to satisfy the OP-Mem and Mem-OP ISA they had imposed upon themselves}
    Then they added indirection to make up for insufficient addressing.
    And then everyone waited until RISC showed up (1980) before realizing their error in register counts.
    {Along about this time, Compilers started getting good.}

  • From John Levine@21:1/5 to All on Fri Dec 29 21:59:25 2023
    According to MitchAlsup <mitchalsup@aol.com>:
    Some of the indirection machines had indirection-bit located in the
    container at the address generated, others had the indirection in
    the address calculation. In the case of the PDP-10 there was a time-
    out counter and there were applications that worked fine up to a
    particular size, and then simply failed when the indirection watchdog
    counter kept "going off".

    No, that's what the GE 635 did: a watchdog timer reset each time it
    started a new instruction. The PDP-6 and -10 could take an interrupt
    each time they calculated an address and would restart the instruction
    when the interrupt returned. This worked because unlike on the 635 the
    address calculation didn't change anything. (Well, except for the ILDB
    and IDPB instructions that needed the first part done flag. But I

    You could tell how long the time between clock interrupts was by
    making an ever longer indirect address chain and seeing where your
    program stalled. It wouldn't crash, it just stalled as the very long
    address chain kept being interrupted and restarted. I'm not being
    hypothetical here.
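    The stall can be reproduced in a few lines of Python. The one-tick-per-level cost and the fixed interrupt interval are simplifying assumptions, not PDP-10 timings. The threshold at which the chain stops completing is the interrupt interval, which is what the experiment measures.

```python
# Sketch of why a long indirect chain stalls instead of crashing on a
# machine that restarts address calculation after each interrupt.
# Assumptions: each indirection level takes one clock tick, an interrupt
# arrives every `interval` ticks, and interrupted work is simply redone.
def restarts_until_done(chain_length, interval, max_attempts=1000):
    """Return how many restarts were needed, or None if it never finishes."""
    clock = 0
    for restarts in range(max_attempts):
        steps = 0
        while steps < chain_length:
            clock += 1
            steps += 1
            if clock % interval == 0 and steps < chain_length:
                break                  # interrupted: throw away partial work
        else:
            return restarts            # chain resolved between interrupts
    return None                        # stalls: never completes


print(restarts_until_done(5, 10))    # 0
print(restarts_until_done(11, 10))   # None
```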

    Two things: 1) the indirect bit is insufficient, 2) optimizing compilers
    got to the point they were better at dereferencing linked lists than
    the indirection machines were. {Reuse and all that rot.}

    More importantly, index registers are a lot faster than indirect
    addressing and at least since the IBM 801, we have good algorithms to
    do register scheduling.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    In the machines where the indirection is at the instruction level, this
    was simple; in the machines where the indirection was at the target, it
    was more difficult.

    The indirection was always in the address word(s), not in the target.
    It didn't matter if it was a load or a store.

  • From Joe Pfeiffer@21:1/5 to Anton Ertl on Sat Dec 30 12:26:02 2023
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    I'll hazard a guess that once you've got the indirect bit out in memory,
    it's easier to just use the same logic on all memory reads than to only
    let it happen once.

  • From John Levine@21:1/5 to All on Sat Dec 30 23:26:20 2023
    According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
    I'll hazard a guess that once you've got the indirect bit out in memory,
    it's easier to just use the same logic on all memory reads than to only
    let it happen once.

    That's not how indirect addressing worked.

    There was always a bit in the instruction to say to do indirection.

    Sometimes that was it; sometimes, on machines where the word size was
    bigger than the address size, it also looked at some other bit in the
    indirect word to see whether to keep going. On the PDP-8, the words
    were 12 bits and the addresses were 12 bits, so there was no room; they
    couldn't have done multilevel indirect if they wanted to.

    As several of us noted, multilevel indirection needed something to
    break loops, while single level didn't. In my experience, multiple
    indirection wasn't very useful, I didn't miss it on the -8, and I
    can't recall using it other than as a gimmick on the PDP-10.

  • From Quadibloc@21:1/5 to Anton Ertl on Sun Dec 31 08:00:14 2023
    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    So in order to access arrays and stuff like that, instead of doing surgery
    on the short address inside an instruction, you can simply store a full
    address in a word somewhere that points anywhere you would like.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    Let's assume we do have an architecture that supports multi-level
    indirection. So an instruction word looks like this:


    and an address constant looks like this:


    So in an address constant (some architectures that had index registers
    kept indirection) you could specify indexing too, but now the address was longer by the length of the opcode field.

    If the address inside an instruction is too short to handle all of memory
    (i.e. the word length is less than 24 bits) then you need a "page" bit in
    the instruction: 0 means page zero, shared by the whole program, 1 means
    the current page - the one the instruction is on.
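    The PDP-8 is the classic example of this page-bit scheme. The Python sketch below computes its effective address for memory-reference instructions. It uses a 7-bit offset and a page bit (0 = page zero, 1 = current page). It also has an indirect bit that allows exactly one level, since a 12-bit indirect word has no room for another flag. Locations 0010-0017 (octal) auto-increment when used indirectly. This is a simplified model that ignores extended memory (the instruction and data field registers).

```python
def pdp8_effective_address(instr, pc, memory):
    """Effective address of a PDP-8 memory-reference instruction.

    instr is a 12-bit instruction, pc the instruction's own address, and
    memory a list of 4096 12-bit words (auto-index locations are updated).
    """
    offset = instr & 0o177              # 7-bit address within a 128-word page
    if instr & 0o200:                   # page bit 1: current page
        addr = (pc & 0o7600) | offset
    else:                               # page bit 0: page zero
        addr = offset
    if instr & 0o400:                   # indirect bit: exactly one level
        if 0o10 <= addr <= 0o17:        # auto-index: increment before use
            memory[addr] = (memory[addr] + 1) & 0o7777
        addr = memory[addr]             # full 12-bit address, no further flag
    return addr


memory = [0] * 4096
memory[0o20] = 0o3000
print(oct(pdp8_effective_address(0o1234, 0o2000, memory)))  # 0o2034 (TAD, current page)
print(oct(pdp8_effective_address(0o1420, 0o2000, memory)))  # 0o3000 (TAD I, via location 20)
```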

    Let's now say the instruction is a _store_ instruction. Then what? Well,
    if the indirect bit is set, it acts like a *load* instruction, to fetch and load the effective address. It only stores at the point where indirection
    ends - where the address is now of the actual location to do the storing
    in, rather than the location of the effective address, which must be read,
    not written.
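    Here is a Python sketch of that store sequence. It assumes a 24-bit address word whose top bit is the indirect flag. The chain words are only read. To change one of them (the first word in the chain, say), you store to it directly with the indirect bit off, exactly as for any other data.

```python
# Assumed layout: 24-bit address words, top bit = "keep indirecting".
INDIRECT = 1 << 23
ADDR_MASK = INDIRECT - 1


def effective_address(memory, addr, indirect, max_levels=32):
    """Resolve the operand address; every chain word is read, never written."""
    levels = 0
    while indirect:
        if levels == max_levels:
            raise RuntimeError("too many levels of indirection")
        word = memory[addr]                           # a load, even for a store
        addr, indirect = word & ADDR_MASK, bool(word & INDIRECT)
        levels += 1
    return addr


def store(memory, value, addr, indirect):
    # The only write happens once, at the end of the chain.
    memory[effective_address(memory, addr, indirect)] = value


memory = {10: INDIRECT | 20, 20: 30, 30: 0}
store(memory, 99, 10, indirect=True)    # follows 10 -> 20 -> 30, writes 30
store(memory, 40, 20, indirect=False)   # rewrites the chain word at 20 itself
print(memory[30], memory[20])           # 99 40
```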

    John Savard

  • From Vir Campestris@21:1/5 to John Levine on Sun Dec 31 17:40:53 2023
    On 30/12/2023 23:26, John Levine wrote:
    and I
    can't recall using it other than as a gimmick on the PDP-10.

    It's a very long time ago, but I'm sure I do recall seeing it used on a DECsystem-10 for arrays of pointers for indirection.

    The fact that 40 years later I can remember the @ being used in
    assembler must mean something.

    Modern machines don't like wasting space so much. On the '10, the word
    an address pointed to was a 36-bit value with an 18-bit address in it,
    plus the indirection bit. There was space for things like this.


  • From Thomas Koenig@21:1/5 to MitchAlsup on Sun Dec 31 17:54:44 2023
    MitchAlsup <mitchalsup@aol.com> schrieb:
    Quadibloc wrote:

    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    This is a better explanation than above. Instead of paying the high price needed for index registers, they use main memory as their index registers. {{A lot like building linked lists in FORTRAN 66}}.

    The PDP-10 had both a recursive indirect bit and index registers (aka
    memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

  • From MitchAlsup@21:1/5 to Quadibloc on Sun Dec 31 17:16:35 2023
    Quadibloc wrote:

    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    This is a better explanation than above. Instead of paying the high price needed for index registers, they use main memory as their index registers.
    {{A lot like building linked lists in FORTRAN 66}}.

    So in order to access arrays and stuff like that, instead of doing surgery
    on the short address inside an instruction, you can simply store a full address in a word somewhere that points anywhere you would like.

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    Let's assume we do have an architecture that supports multi-level indirection. So an instruction word looks like this:


    and an address constant looks like this:


    So in an address constant (some architectures that had index registers
    kept indirection) you could specify indexing too, but now the address was longer by the length of the opcode field.

    If the address inside an instruction is too short to handle all of memory (i.e. the word length is less than 24 bits) then you need a "page" bit in
    the instruction: 0 means page zero, shared by the whole program, 1 means
    the current page - the one the instruction is on.

    Going all PDP-8 on us now ??

    Let's now say the instruction is a _store_ instruction. Then what? Well,
    if the indirect bit is set, it acts like a *load* instruction, to fetch and load the effective address. It only stores at the point where indirection ends - where the address is now of the actual location to do the storing
    in, rather than the location of the effective address, which must be read, not written.

    John Savard

  • From Scott Lurndal@21:1/5 to John Levine on Sun Dec 31 18:25:56 2023
    John Levine <johnl@taugh.com> writes:
    According to Joe Pfeiffer <pfeiffer@cs.nmsu.edu>:
    I'll hazard a guess that once you've got the indirect bit out in memory,
    it's easier to just use the same logic on all memory reads than to only
    let it happen once.

    That's not how indirect addressing worked.

    There was always a bit in the instruction to say to do indirection.

    In our case (B3500 et alia), there was a bit per operand, so a three-operand instruction could have all three addresses indirect. The processor treated
    the value at the indirect address as an operand address, allowing infinite recursion (subject to a processor timer in case of loops).

  • From Scott Lurndal@21:1/5 to Quadibloc on Sun Dec 31 18:28:07 2023
    Quadibloc <quadibloc@servername.invalid> writes:
    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    Not necessarily true. The B3500 had three index registers (special
    locations in memory, not real registers). Later systems in the early
    80's added an additional four register-based index registers, but
    continued to support indirect addressing.

  • From MitchAlsup@21:1/5 to Paul A. Clayton on Sun Dec 31 18:57:21 2023
    Paul A. Clayton wrote:

    On 12/29/23 2:36 PM, John Levine wrote:
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found.
    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?
    As you suggested, a lot of uses boiled down to providing a fixed
    address for something that can move, so instructions could indirect
    through that fixed address without having to load it into a register.

    Paged virtual memory as commonly implemented introduces one level
    of indirection at page (rather than word) granularity.
    Virtualization systems using nested page tables introduce a second

    Hierarchical/multi-level page tables have multiple layers of
    indirection where instead of a page table base pointer pointing to
    a complete page table it points to a typically-page-sized array of
    address and metadata entries where each entry points to a similar
    array eventually reaching the PTE.
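    Viewed as indirection, a page-table walk is a fixed-depth pointer chase. The Python sketch below uses x86-64-like assumptions: 4 KiB pages and four levels of 512 entries. The table representation is simplified for illustration and ignores large pages, permissions, and TLBs.

```python
# Sketch of a hierarchical page-table walk as repeated indirection.
# Tables are modelled as dicts {index: (next_base, flags)}; real entries pack
# the address and flag bits into one 64-bit word.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
LEVELS = 4
VALID = 1


def walk(tables, root, vaddr):
    """Translate vaddr, chasing one pointer per level; None if unmapped."""
    base = root
    for level in reversed(range(LEVELS)):
        shift = PAGE_SHIFT + level * BITS_PER_LEVEL
        index = (vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1)
        next_base, flags = tables[base].get(index, (0, 0))
        if not flags & VALID:
            return None                  # no mapping at this level
        base = next_base                 # one more level of indirection
    return base | (vaddr & ((1 << PAGE_SHIFT) - 1))


vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
tables = {
    0x1000: {1: (0x2000, VALID)},
    0x2000: {2: (0x3000, VALID)},
    0x3000: {3: (0x4000, VALID)},
    0x4000: {4: (0x9000, VALID)},    # last level: physical page frame
}
print(hex(walk(tables, 0x1000, vaddr)))  # 0x9123
```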

    Even with page table caching (and workloads that play well with
    this kind of virtual memory), this is not free but it can be
    "cheap enough". Using large pages for virtual-physical to physical translation can help a lot. Presumably having an OS bias placement
    of its translation table pages into large quasi-pages would help
    caching for VPA-to-PA, i.e., many VPAs used by the OS for paging
    would be in the same large page (e.g., 2MiB for x86).

    (Andy Glew had suggested using larger pages for intermediate nodes
    rather than limiting such to the last node in a hierarchical page

    I had been thinking that since my large-page translation tables have
    a count of the number of pages, when forking off a new GuestOS
    I would allocate the HyperVisor tables as a single 8GB large
    page, and when it needs more, switch to a more treeified page
    table. This leaves the second level of DRAM translation at 1 very
    cacheable and TLB-able PTE, dramatically reducing the table walking.
    A single 8GB page mapping can allow access to one 8192B page up to
    1M 8192B pages. Guest OS page tables can map any of these 8192B pages
    to any virtual address it desires with permissions it desires.

    This has the same level-reducing effect of huge pages that short-circuit the translation indirection at the end but allows
    eviction and permission control at base-page size, with the
    consequent larger number of PTEs active if there is spatial
    locality at huge page granularity. Such merely assumes that
    locality potentially exists at the intermediate nodes rather than
    exclusively at the last node. Interestingly, with such a page
    table design one might consider having rather small pages; e.g., a
    perhaps insane 64-byte base page size (at least for the tables)
    would only provide 3 bits per level but each level could be
    flattened to provide 6, 9, 12, etc. bits. Such extreme flexibility
    may well not make sense, but it seems interesting to me.)
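    A small Python helper can check the arithmetic behind two sizing remarks above. One is that a single 8GB page covers up to 1M 8192-byte pages. The other is that a 64-byte base page gives 3 bits per level. The 8-byte entry size and the 48-bit virtual address used in the example are assumptions, not figures from the thread.

```python
# Back-of-the-envelope page-table geometry. Assumption: each table occupies
# exactly one page and entries are 8 bytes (the usual 64-bit PTE size).
def log2_exact(n):
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def bits_per_level(page_bytes, entry_bytes=8):
    return log2_exact(page_bytes // entry_bytes)


def levels_needed(va_bits, page_bytes, entry_bytes=8):
    index_bits = va_bits - log2_exact(page_bytes)   # bits left after the page offset
    per_level = bits_per_level(page_bytes, entry_bytes)
    return -(-index_bits // per_level)              # ceiling division


print(bits_per_level(64), bits_per_level(4096), bits_per_level(8192))  # 3 9 10
print(levels_needed(48, 4096), levels_needed(48, 64))                  # 4 14
print((8 << 30) // 8192)   # 1048576 base pages under one 8GB page
```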

    For most purposes, index registers do indirection better, and now that
    everything has a lot of registers, you can use some of them for the
    fixed->movable stuff like the GOT in Unix/linux shared libraries.

    For x86-64 some of the segments can have non-zero bases, so these
    provide an additional index register ("indirection").

    This has more to do with 16 registers being insufficient than indirection (segmentation) being better.

  • From MitchAlsup@21:1/5 to Thomas Koenig on Sun Dec 31 18:59:45 2023
    Thomas Koenig wrote:

    MitchAlsup <mitchalsup@aol.com> schrieb:
    Quadibloc wrote:

    On Fri, 29 Dec 2023 17:20:43 +0000, Anton Ertl wrote:

    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the word
    is set, the CPU uses that word as the address to access, and this
    repeats until a word without this bit is found. At least that's how I
    understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    No doubt this answer has already been given.

    The reason these architectures had that feature was because of a feature
    they _didn't_ have: an index register.

    This is a better explanation than above. Instead of paying the high price
    needed for index registers, they use main memory as their index registers.
    {{A lot like building linked lists in FORTRAN 66}}.

    The PDP-10 had both a recursive indirect bit and index registers (aka
    memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

    All of the PDP-10s at CMU had the register upgrade. {2×KI and 1×KL}
    I believe that most PDP-10s ever sold had the register upgrade.

  • From John Levine@21:1/5 to All on Sun Dec 31 20:19:32 2023
    According to Thomas Koenig <tkoenig@netcologne.de>:
    The PDP-10 had both a recursive indirect bit and index registers (aka
    memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

    Yup. Each instruction had an 18 bit address, a four bit index register, and an indirect bit.
    It took the address, and added the contents of the right half of the index register if non-zero.
    If the indirect bit was off, that was the operand address. If the indirect bit was set, it
    fetched the word at that location and did the whole thing over again, including the indexing.

    You could in principle create extremely complicated address chains but
    it was so confusing that nobody did.
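    That description translates almost line for line into Python. The sketch assumes the standard 36-bit word layout and ignores the later KL10 extended addressing. The `max_levels` cap exists only so the example terminates.

```python
# Sketch of the PDP-10 effective-address loop (18-bit addressing only).
# Instruction and indirect words share the layout I = bit 13, X = bits 14-17,
# Y = bits 18-35, with bit 0 the most significant of 36. Index registers are
# the accumulators, i.e. memory locations 1-15.
HALF = 0o777777


def pdp10_effective_address(memory, word, max_levels=1000):
    for _ in range(max_levels):     # the real machine had no cap, just interrupts
        y = word & HALF
        x = (word >> 18) & 0o17
        indirect = (word >> 22) & 1
        if x:
            y = (y + (memory[x] & HALF)) & HALF   # add right half of index register
        if not indirect:
            return y
        word = memory[y]            # fetch the indirect word and start over
    raise RuntimeError("address calculation did not terminate")


I_BIT = 1 << 22
memory = {2: (0o777 << 18) | 5,            # AC2: left half ignored, right half 5
          0o100: (2 << 18) | 0o200}        # indirect word: Y=200, X=2, I=0
print(oct(pdp10_effective_address(memory, I_BIT | 0o100)))   # 0o205
```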

  • From MitchAlsup@21:1/5 to John Levine on Sun Dec 31 20:42:42 2023
    John Levine wrote:

    According to Thomas Koenig <tkoenig@netcologne.de>:
    The PDP-10 had both a recursive indirect bit and index registers (aka
    memory locations 1 to 15), if I remember the manuals correctly
    (I did a bit of reading, but I've never even come close to one of
    these machines).

    Yup. Each instruction had an 18 bit address, a four bit index register, and an indirect bit.
    It took the address, and added the contents of the right half of the index register if non-zero.
    If the indirect bit was off, that was the operand address. If the indirect bit was set, it
    fetched the word at that location and did the whole thing over again, including the indexing.

    You could in principle create extremely complicated address chains but
    it was so confusing that nobody did.

    At CMU I used this a lot for things like symbol table searches.
    What I did not use was the index register stuff of the indirection (except
    at the first level).

  • From sarr.blumson@alum.dartmouth.org@21:1/5 to John Levine on Mon Jan 1 20:31:28 2024
    John Levine <johnl@taugh.com> wrote:

    : More importantly, index registers are a lot faster than indirect
    : addressing and at least since the IBM 801, we have good algorithms to
    : do register scheduling.

    Once upon a time saving an instruction was a big deal; the 801, and
    RISC in general, was possible because memory got much cheaper.
    Using index registers costs an extra instruction for loading the index register.

    Index registers were a scarce resource too (except for the Atlas) so
    keeping all your pointers in index registers wasn't a good option.


  • From MitchAlsup@21:1/5 to sarr.blumson@alum.dartmouth.org on Thu Jan 4 01:36:35 2024
    sarr.blumson@alum.dartmouth.org wrote:

    John Levine <johnl@taugh.com> wrote:

    : More importantly, index registers are a lot faster than indirect
    : addressing and at least since the IBM 801, we have good algorithms to
    : do register scheduling.

    Once upon a time saving an instruction was a big deal; the 801, and
    RISC in general, was possible because memory got much cheaper.
    Using index registers costs an extra instruction for loading the index register.

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions
    as VAX and at 6× the frequency for a 4× improvement in performance.

    Now, imagine a RISC ISA that only needs 1.1× as many instructions as
    VAX with no degradation WRT operating frequency.

    Index registers were a scarce resource too (except for the Atlas) so
    keeping all your pointers in index registers wasn't a good option.


  • From EricP@21:1/5 to Anton Ertl on Fri Jan 5 11:33:40 2024
    Anton Ertl wrote:
    Some (many?) architectures of the 1960s (earlier? later?) have the
    feature that, when loading from an address, if a certain bit in the
    word is set, the CPU uses that word as the address to access, and this repeats until a word without this bit is found. At least that's how I understand the descriptions of this feature.

    The major question I have is why these architectures have this feature.

    The only use I can come up with for the arbitrarily repeated
    indirection is the implementation of logic variables in Prolog.
    However, Prolog was first implemented in 1970, and it did not become a
    big thing until the 1980s (if then), so I doubt that this feature was implemented for Prolog.

    A use for a single indirection is the implementation of the memory
    management in the original MacOS: Each dynamically allocated memory
    block was referenced only from a single place (its handle), so that
    the block could be easily relocated. Only the address of the handle
    was freely passed around, and accessing the block then always required
    double indirection. MacOS was implemented on the 68000, which did not
    have the indirect bit; this demonstrates that the indirect bit is not necessary for that. Nevertheless, such a usage pattern might be seen
    as a reason to add the indirect bit. But is it enough?

    Were there any other usage patterns? What happened to them when the
    indirect bit went out of fashion?

    One other question is how the indirect bit works with stores. How do
    you change the first word in the chain, the last one, or any word in between?

    - anton

    PDP-11 and VAX had multiple address modes with a single level of indirection. The VAX usage stats from 1984 show about 3% use on SPEC.

    DG Nova had infinite indirection: if the Indirect bit was set in the instruction, the operand address pointed at an address word. If the msb of that address word was zero, it was the address of the 16-bit data; if the msb was 1, it was the address of another address word, looping until msb = 0.
    I don't know how DG used it, but, just guessing, because the Nova only had
    4 registers it might have been used to create a kind of virtual register set in memory.
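    A small Python sketch of the chain EricP describes, assuming 16-bit words whose msb is the "another address follows" flag and whose low 15 bits are the address. It starts from an effective address that has already been computed from the instruction and ignores the Nova's addressing modes and other details of the real machine; the names and the step limit are hypothetical.

```python
INDIRECT = 0x8000   # msb of an address word: "this is the address of another address"
ADDR_MASK = 0x7FFF  # low 15 bits: the address itself


def resolve(ea, memory, indirect, max_steps=1 << 15):
    """Return the address of the 16-bit operand.

    ea is the effective address already computed from the instruction;
    indirect is the instruction's Indirect bit.
    """
    if not indirect:
        return ea & ADDR_MASK
    word = memory[ea & ADDR_MASK]  # the instruction pointed at an address word
    for _ in range(max_steps):
        if not word & INDIRECT:
            return word & ADDR_MASK       # msb = 0: this is the data address
        word = memory[word & ADDR_MASK]   # msb = 1: follow the chain
    raise RuntimeError("indirection loop")


memory = {0o100: INDIRECT | 0o200, 0o200: 0o300, 0o300: 42}
addr = resolve(0o100, memory, indirect=True)
print(oct(addr), memory[addr])  # 0o300 42
```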

    The best use I have for single level indirection is compilers & linkers.
    The compiler emits a variable reference without knowing if it is local
    to the linkage unit or imported from a DLL. Linker discovers it is a
    DLL export variable and changes the assigned variable to be a pointer
    to the imported value that is patched by the loader,
    and just flips the Indirect bit on the instruction.

    Doing the same thing without address indirection requires inserting
    extra LD instructions and having a spare register allocated to the
    linker to work with.
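    The linker trick can be modelled in a few lines of Python. This is a hypothetical sketch: `Load`, `link`, `execute`, the import-slot set and the addresses are all made up for illustration, and only one level of indirection is modelled. The point it shows is that flipping the bit leaves the instruction count unchanged, whereas without it the linker would have to insert an extra load and find a spare register.

```python
from dataclasses import dataclass


@dataclass
class Load:
    """A load instruction: an address field plus a one-level Indirect bit."""
    address: int
    indirect: bool = False


def execute(instr, memory):
    """Return the loaded value, following one level of indirection if the bit is set."""
    ea = instr.address
    if instr.indirect:
        ea = memory[ea]  # the cell holds the address of the real variable
    return memory[ea]


def link(instr, import_slots):
    """Hypothetical linker fix-up: if the variable turned out to be imported,
    leave the address field alone and just flip the Indirect bit."""
    if instr.address in import_slots:
        instr.indirect = True
    return instr


memory = {100: 7, 200: 0, 500: 99}  # 100: local variable, 200: import slot, 500: DLL variable
import_slots = {200}
memory[200] = 500  # the loader patches the slot with the imported variable's address

local = link(Load(100), import_slots)
imported = link(Load(200), import_slots)
print(execute(local, memory), execute(imported, memory))  # 7 99
```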

  • From John Levine@21:1/5 to All on Fri Jan 5 18:05:21 2024
    According to EricP <ThatWouldBeTelling@thevillage.com>:
    PDP-11 and VAX had multiple address modes with a single level of indirection. The VAX usage stats from 1984 show about 3% use on SPEC.

    The main place the PDP-11 used indirect addressing was in @(PC)+ which
    was the idiom for absolute addressing. It fetched the next word in the instruction stream as an immediate via (PC)+ and then used it as an
    address via indirection. The assembler let you write @#123 to generate
    that address mode and put the 123 in line.
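    A minimal Python sketch of why @(PC)+ gives absolute addressing while (PC)+ gives an immediate, assuming word operands and modelling only addressing modes 2 and 3; the helper name and the memory contents are made up for illustration.

```python
def operand_address(mode, reg, regs, mem):
    """Effective address for PDP-11 word operands in modes 2 and 3 only.

    regs: list of 8 registers (regs[7] is the PC); mem: dict of word address -> word.
    """
    if mode == 2:    # (Rn)+  : Rn points at the operand itself
        ea = regs[reg]
    elif mode == 3:  # @(Rn)+ : Rn points at a word holding the operand's address
        ea = mem[regs[reg]]
    else:
        raise ValueError("only modes 2 and 3 are sketched here")
    regs[reg] = (regs[reg] + 2) & 0xFFFF  # autoincrement by one word
    return ea


PC = 7
mem = {0o1002: 0o123, 0o123: 0o4567}  # in-line word 123 follows the opcode word at 1000

regs = [0] * 8
regs[PC] = 0o1002                        # PC is already past the opcode word
ea = operand_address(3, PC, regs, mem)   # @#123, i.e. @(PC)+
print(oct(ea), oct(regs[PC]))            # 0o123 0o1004

regs[PC] = 0o1002
ea = operand_address(2, PC, regs, mem)   # #123, i.e. (PC)+: the in-line word is the operand
print(oct(mem[ea]))                      # 0o123
```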

    It was also useful for threaded code, where you had a register,
    typically R4, pointing at a list of routine addresses and dispatched
    with JMP @(R4)+
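    A minimal Python model of that threaded-code idiom. Python functions stand in for routine addresses, and the dispatch loop stands in for the JMP @(R4)+ each routine would end with, since Python has no tail jumps; the class and the routine names `lit`, `add` and `halt` are hypothetical.

```python
class ThreadedVM:
    """Direct-threaded code: the 'thread' is a list of routine addresses
    (here Python functions) interleaved with in-line operands."""

    def __init__(self, thread):
        self.thread = thread
        self.r4 = 0          # thread pointer, R4 in the PDP-11 idiom
        self.stack = []
        self.running = True

    def fetch(self):
        """The @(R4)+ part: take the word R4 points at and advance R4."""
        word = self.thread[self.r4]
        self.r4 += 1
        return word

    def run(self):
        # Each real routine would end in JMP @(R4)+; this loop plays that role.
        while self.running:
            routine = self.fetch()
            routine(self)
        return self.stack


def lit(vm):
    vm.stack.append(vm.fetch())  # in-line operand, also fetched with (R4)+


def add(vm):
    b = vm.stack.pop()
    vm.stack[-1] += b


def halt(vm):
    vm.running = False


print(ThreadedVM([lit, 2, lit, 3, add, halt]).run())  # [5]
```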

    If you were feeling clever, you could do this coroutine switch: JSR PC,@(SP)+

    That popped the top word off the stack, then pushed the current PC, then jumped to the address it had popped.
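    The coroutine switch can be checked with a tiny Python model. This sketch assumes the instruction is one word long, so the pushed return address is the address just after it, and uses the end of a list as the top of the stack; the function name and addresses are hypothetical.

```python
def jsr_pc_via_sp(return_pc, stack):
    """JSR PC,@(SP)+ on a model machine: pop the destination, push the return PC, jump.

    Returns the new PC.
    """
    target = stack.pop()     # @(SP)+: destination address comes off the stack
    stack.append(return_pc)  # JSR PC,...: push the old PC
    return target            # ...and jump


# Two coroutines ping-pong using the same instruction.
stack = [0o2000]                          # B's resume address is on the stack
pc = jsr_pc_via_sp(0o1002, stack)         # A switches from its instruction at 1000
print(oct(pc), [oct(a) for a in stack])   # 0o2000 ['0o1002']
pc = jsr_pc_via_sp(0o2012, stack)         # B switches from its instruction at 2010
print(oct(pc), [oct(a) for a in stack])   # 0o1002 ['0o2012']
```

    Each side's resume point is left on the stack for the other, so the same single instruction both saves and restores.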

    DG Nova had infinite indirection: if the Indirect bit was set in the instruction, the operand address pointed at an address word. If the msb of that address word was zero, it was the address of the 16-bit data; if the msb was 1, it was the address of another address word, looping until msb = 0.
    I don't know how DG used it, but, just guessing, because the Nova only had
    4 registers it might have been used to create a kind of virtual register set in memory.

    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.


  • From MitchAlsup@21:1/5 to John Levine on Fri Jan 5 23:20:45 2024
    John Levine wrote:

    According to EricP <ThatWouldBeTelling@thevillage.com>:
    PDP-11 and VAX had multiple address modes with a single level of indirection. The VAX usage stats from 1984 show about 3% use on SPEC.

    The main place the PDP-11 used indirect addressing was in @(PC)+ which
    was the idiom for absolute addressing. It fetched the next word in the instruction stream as an immediate via (PC)+ and then used it as an
    address via indirection. The assembler let you write @#123 to generate
    that address mode and put the 123 in line.

    It was also useful for threaded code, where you had a register,
    typically R4, pointing at a list of routine addresses and dispatched
    with JMP @(R4)+

    If you were feeling clever you could do this coroutine switch JSR PC,@(SP)+

    That popped the top word off the stack, then pushed the current PC, then jumped
    to the address it had popped.

    I used this in a real-time OS I developed at CMU to deal with laser power control.

    Processes (no MMU or protection) would receive control via JSR PC,@(SP)+ and
    return control with JSR PC,@(SP)+, at which time the OS would find the next thing to do and JSR PC,@(SP)+ all over again. Really lightweight context switching.
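    Python has no JSR PC,@(SP)+, but generators give the same kind of cooperative hand-off, so a rough analogy of the scheduler loop described above might look like this. It is a sketch under stated assumptions (simple round-robin, no priorities, hypothetical names), not a model of the CMU system.

```python
from collections import deque


def process(name, steps, log):
    """A toy process that does one unit of work, then hands control back."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # return control to the "OS", keeping our place


def run(processes):
    """Round-robin: resume each process where it left off until all finish."""
    ready = deque(processes)
    while ready:
        p = ready.popleft()
        try:
            next(p)        # switch into the process
        except StopIteration:
            continue       # finished: drop it
        ready.append(p)    # it yielded: queue it again


log = []
run([process("a", 2, log), process("b", 1, log)])
print(log)  # ['a0', 'b0', 'a1']
```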

    DG Nova had infinite indirection: if the Indirect bit was set in the instruction, the operand address pointed at an address word. If the msb of that address word was zero, it was the address of the 16-bit data; if the msb was 1, it was the address of another address word, looping until msb = 0.
    I don't know how DG used it, but, just guessing, because the Nova only had
    4 registers it might have been used to create a kind of virtual register set in memory.

    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Wed Jan 17 05:28:46 2024
    On Fri, 29 Dec 2023 19:04:56 GMT, Scott Lurndal wrote:

    The [Burroughs] system ran mostly COBOL code (with some BPL;
    assemblers weren't generally provided to customers).

    For an interesting reason: privilege protection was enforced in software,
    not hardware.

  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 08:14:34 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

    No, that seems correct: It needed 1.5 times as many instructions, so the
    6X frequency must be divided by 1.5 for a final speedup of 4X?


    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Wed Jan 17 06:36:41 2024
    On Sun, 31 Dec 2023 17:54:44 -0000 (UTC), Thomas Koenig wrote:

    ... but I've never even come close to one of these

    You could have one, or a software emulation of one, right in front of you,
    just a SIMH install away.

  • From Lawrence D'Oliveiro@21:1/5 to MitchAlsup on Wed Jan 17 06:34:55 2024
    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 16:02:32 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Fri, 29 Dec 2023 19:04:56 GMT, Scott Lurndal wrote:

    [no assembler shipped to customers]

    The [Burroughs] system ran mostly COBOL code (with some BPL;
    assemblers weren't generally provided to customers).

    For an interesting reason: privilege protection was enforced in software,
    not hardware.

    Actually, that is not the case.

    Burroughs had multiple lines of mainframes: small, medium and large.

    Small systems (B1700/B1800/B1900) had a writeable control store and the instruction set
    would be dynamically loaded when the application was scheduled.

    Medium systems were BCD systems (B[234][5789]xx) (descended from the original line
    of Electrodata Datatron systems when Burroughs bought Electrodata
    in the mid 1950s). Designed to efficiently run COBOL code. These
    are the systems I was referring to above. They had hardware
    enforced privilege protection.

    Large systems (starting with the B5000/B5500) were stack systems
    running ALGOL and ALGOL derivatives (DCALGOL, NEWP, etc)
    (they also supported COBOL, Fortran, Basic, etc).

    The systems you are thinking about were the Large systems. And
    there were issues with that (a famous paper in the mid 1970s
    showed how to set the 'compiler' flag on any application allowing
    it to bypass security protections - put the application on a
    tape, load it on an IBM system, patch the executable header,
    and restore it on the Burroughs system).

  • From MitchAlsup1@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 17:38:51 2024
    Lawrence D'Oliveiro wrote:

    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

    Performance is in millions of instructions per second.

    If the instruction count was 1.0× a 6× frequency would yield 6× gain.

    So, since there were 1.5× as many instructions and 6× as many instructions per
    second, 6 / 1.5 = 4×
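    The same arithmetic as a one-line Python function. It assumes, as Mitch's explanation does, that the "6×" refers to the instruction rate, so run time is instruction count divided by instructions per second; the function name is hypothetical.

```python
def speedup(instruction_ratio, rate_ratio):
    """Run-time speedup when the new machine executes instruction_ratio times
    as many instructions at rate_ratio times the instructions per second."""
    # time = instructions / (instructions per second)
    return rate_ratio / instruction_ratio


print(speedup(1.5, 6))            # 4.0   (the Horowitz MIPS vs VAX figures)
print(round(speedup(1.1, 6), 2))  # 5.45  (Mitch's hypothetical 1.1x RISC ISA)
```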

  • From EricP@21:1/5 to Lawrence D'Oliveiro on Wed Jan 17 13:09:52 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 4 Jan 2024 01:36:35 +0000, MitchAlsup wrote:

    Mark Horowitz stated (~1983) MIPS executes 1.5× as many instructions as
    VAX and at 6× the frequency for a 4× improvement in performance.

    Mmm, maybe you got the last two multipliers the wrong way round?

    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such. But no one had actually measured it.
    When they finally did and found it was 0.5 MIPS they just changed to
    calling that "1 VUP" or "VAX-780 Units of Processing".
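    EricP's figures, checked in Python: the clock period follows from the clock rate, and the native instruction rate is the clock rate divided by the average clocks per instruction. The function name is hypothetical.

```python
def mips(clock_hz, clocks_per_instruction):
    """Native instruction rate in millions of instructions per second."""
    return clock_hz / clocks_per_instruction / 1e6


clock_hz = 5e6
print(1e9 / clock_hz)      # 200.0  (ns clock period)
print(mips(clock_hz, 10))  # 0.5    (native MIPS at 10 clocks per instruction)
```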

    This also showed up in the Dhrystone benchmarks:


    "Another common representation of the Dhrystone benchmark is the
    DMIPS (Dhrystone MIPS) obtained when the Dhrystone score is divided
    by 1757 (the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine)."

    I suppose they should have changed that to DVUPS.
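    The DMIPS normalization quoted above, as a Python sketch; only the 1757 reference score comes from the quote, and the names are hypothetical.

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # the reference score quoted above


def dmips(dhrystones_per_sec):
    """Dhrystone MIPS: a score expressed as multiples of the VAX 11/780's 1757."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


print(dmips(1757))   # 1.0
print(dmips(17570))  # 10.0
```

    Following EricP's point, 1 DMIPS is really one 11/780's worth of Dhrystone work, which on that machine was about 0.5 million native instructions per second.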

    Stanford MIPS (16 registers) in 1984 ran at 4 MHz with a 5 stage pipeline.
    The paper I'm looking at compares it to an 8 MHz 68000 and has
    Stanford MIPS averaging 5 times faster on their Pascal benchmark.

    The MIPS R2000 with 32 registers launched in 1986 at 8.3, 12.5 and 15 MHz.
    It supposedly could sustain 1 reg-reg ALU operation per clock.

  • From John Levine@21:1/5 to ThatWouldBeTelling@thevillage.com on Wed Jan 17 19:14:00 2024
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.

    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS. A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.

    When they finally did and found it was 0.5 MIPS they just changed to
    calling that "1 VUP" or "VAX-780 Units of Processing".

    Yeah, they got grief for the MIPS stuff.


  • From EricP@21:1/5 to John Levine on Wed Jan 17 14:32:45 2024
    John Levine wrote:
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.

    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS.

    So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.

    And they define 1 VUP = 1 TOUP

    When they finally did and found it was 0.5 MIPS they just changed to
    calling that "1 VUP" or "VAX-780 Units of Processing".

    Yeah, they got grief for the MIPS stuff.

    One just has to be careful comparing clock MIPS and VUPS.

  • From John Levine@21:1/5 to All on Wed Jan 17 19:55:11 2024
    According to EricP <ThatWouldBeTelling@thevillage.com>:
    John Levine wrote:
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.

    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS.

    So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.

    If you want. IBM mainframe MIPS was a well understood performance
    measure at the time. In the mid 1970s, there were a few IBM clones
    like Amdahl, but the other mainframe makers were already sinking into obscurity. I can't think of anyone else making a 32 bit byte
    addressable mainframe at the time that wasn't an IBM clone. I suppose
    there were the Interdata machines but they were minis and sold mostly
    for embedded realtime.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.

    And they define 1 VUP = 1 TOUP

    Yes, but a TOUP really was an IBM MIPS.


  • From EricP@21:1/5 to John Levine on Wed Jan 17 15:19:05 2024
    John Levine wrote:
    According to EricP <ThatWouldBeTelling@thevillage.com>:
    John Levine wrote:
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    VAX-780 was 5 MHz, 200 ns clock and averaged 10 clocks per instruction
    giving 0.5 MIPS. When it first came out they thought it was a 1 MIPS
    machine and advertised it as such.
    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS.
    So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.

    If you want. IBM mainframe MIPS was a well understood performance
    measure at the time. In the mid 1970s, there were a few IBM clones
    like Amdahl, but the other mainframe makers were already sinking into obscurity. I can't think of anyone else making a 32 bit byte
    addressable mainframe at the time that wasn't an IBM clone. I suppose
    there were the Interdata machines but they were minis and sold mostly
    for embedded realtime.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.
    And they define 1 VUP = 1 TOUP

    Yes, but a TOUP really was an IBM MIPS.

    Ok, but VAX-780 really was measured by DEC at 0.5 MIPS.
    So either the assumption that a VUP = TOUP was wrong
    or the assumption that a TOUP = MIPS was.

    See section 5 and table 8.

    Characterization of Processor Performance in the VAX-11/780, 1984 http://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf

  • From John Levine@21:1/5 to All on Wed Jan 17 20:27:45 2024
    According to EricP <ThatWouldBeTelling@thevillage.com>:
    No, they knew how fast it was. It was about as fast as an IBM 370/158
    which IBM rated at 1 MIPS.
    So those were TOUPS or Three-seventy One-fifty-eight Units of Performance.
    If you want. IBM mainframe MIPS was a well understood performance
    measure at the time. In the mid 1970s, there were a few IBM clones
    like Amdahl, but the other mainframe makers were already sinking into
    obscurity. I can't think of anyone else making a 32 bit byte
    addressable mainframe at the time that wasn't an IBM clone. I suppose
    there were the Interdata machines but they were minis and sold mostly
    for embedded realtime.

    A Vax instruction could do a lot more than
    a 370 instruction so it wasn't implausible that the performance was
    similar even though the instruction rate was about half.
    And they define 1 VUP = 1 TOUP

    Yes, but a TOUP really was an IBM MIPS.

    Ok, but VAX-780 really was measured by DEC at 0.5 MIPS.
    So either the assumption that a VUP = TOUP was wrong
    or the assumption that a TOUP = MIPS was.

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions. In that era, MIPS meant
    either a million IBM instructions, or as some wag put it, Meaningless Indication of Processor Speed.


  • From EricP@21:1/5 to Thomas Koenig on Wed Jan 17 18:14:06 2024
    Thomas Koenig wrote:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I was thinking the same thing. VAX address modes like auto-increment
    would be equivalent to 2 instructions for each operand and likely used
    in benchmarks.

    VAX having 32-bit immediates and offsets and 64-bit float immediates
    per operand vs 370 having to build constants or load them.

    And POLY for transcendentals is one instruction.

    All of those would add clocks to the VAX instruction execute time
    but not its instruction count and MIPS.

  • From Thomas Koenig@21:1/5 to John Levine on Wed Jan 17 22:37:00 2024
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

  • From Scott Lurndal@21:1/5 to EricP on Wed Jan 17 23:56:24 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    Thomas Koenig wrote:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I was thinking the same thing. VAX address modes like auto-increment
    would be equivalent to 2 instructions for each operand and likely used
    in benchmarks.

    MOVC3 and MOVC5, perhaps?

  • From MitchAlsup1@21:1/5 to Thomas Koenig on Wed Jan 17 23:55:45 2024
    Thomas Koenig wrote:

    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    Fancy addressing modes (indirection, pre-decrement, post-increment,
    constants, displacements, index), ADD-CMP-Branch, CRC, bit manipulation, ...
    You could say these contribute most of the gain.

  • From Lawrence D'Oliveiro@21:1/5 to Scott Lurndal on Thu Jan 18 00:24:38 2024
    On Wed, 17 Jan 2024 23:56:24 GMT, Scott Lurndal wrote:

    MOVC3 and MOVC5, perhaps?

    Interruptible instructions ... wot fun ...

  • From John Levine@21:1/5 to tkoenig@netcologne.de on Thu Jan 18 02:05:30 2024
    It appears that Thomas Koenig <tkoenig@netcologne.de> said:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I don't think anyone used the fancy addressing modes or complex
    instructions much. But here's an example. Let's say A, B, and C are
    floats in addressable memory and you want to do A = B + C

    370 code

    LE R0,B
    AE R0,C
    STE R0,A

    VAX code

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    Or that old Fortran favorite I = I + 1

    370 code

    L R1,I
    LA R2,1
    AR R1,R2
    ST R1,I

    VAX code

    INCL I

    or if you have a lousy optimizer

    ADDL2 #1,I

    or if you have a really lousy optimizer

    ADDL3 #1,I,I

    It's still one instruction rather than four.

    In 370 code you often also needed extra instructions to make data
    addressable since it had no direct addressing and address offsets in instructions were only 12 bits.


  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Thu Jan 18 04:51:09 2024
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    If those were register operands, that instruction would be 4 bytes.

    I think worst case, each operand could have an index register and a 4-byte offset (in addition to the operand specifier byte), for a maximum
    instruction length of 19 bytes.

    So, saying “just one instruction” may not sound as good as you think.

    Here’s an old example, from the VMS kernel itself. This instruction

    PUSHR #^M<R0,R1,R2,R3,R4,R5>

    pushes the first 6 registers onto the stack, and occupies just 2 bytes.
    Whereas this sequence

    PUSHL R5
    PUSHL R4
    PUSHL R3
    PUSHL R2
    PUSHL R1
    PUSHL R0

    does the equivalent thing, but takes up 2 × 6 = 12 bytes.

    Guess which is faster?
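    The byte counts can be reproduced in Python. This sketch assumes the VAX short-literal operand (values 0 to 63 encoded in the specifier byte itself) and, for larger masks, a word immediate (specifier byte plus 2 bytes); the helper names are hypothetical.

```python
def register_mask(*regs):
    """^M<...>: set bit n for each register Rn."""
    mask = 0
    for r in regs:
        mask |= 1 << r
    return mask


def pushr_length(mask):
    """Bytes for PUSHR #mask: one opcode byte plus one operand specifier.
    A short literal holds 0..63 in the specifier byte itself; larger masks
    are assumed to need a word immediate (specifier byte + 2 bytes)."""
    return 1 + (1 if mask <= 63 else 3)


PUSHL_REG_LENGTH = 2  # opcode byte + register-mode specifier byte

mask = register_mask(0, 1, 2, 3, 4, 5)
print(hex(mask), pushr_length(mask), 6 * PUSHL_REG_LENGTH)  # 0x3f 2 12
```

    R0 through R5 give a mask of 63, the largest value a short literal can hold, which is why this particular PUSHR fits in 2 bytes.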

  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Thu Jan 18 10:08:47 2024
    Lawrence D'Oliveiro wrote:
    On Wed, 17 Jan 2024 23:56:24 GMT, Scott Lurndal wrote:

    MOVC3 and MOVC5, perhaps?

    Interruptible instructions ... wot fun ...

    REP MOVS is the classic x86 example: since all register usage is fixed ((r)si, (r)di, (r)cx), the CPU can always accept an interrupt at any point,
    it just needs to update those three registers and take the interrupt.

    When the instruction resumes, any remaining moves are performed.
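    A Python sketch of why a fixed-register block move is restartable: the three registers are the entire state, so stopping early and re-entering with the saved values finishes the copy. It assumes byte moves with the direction flag clear and ignores segment registers; the `budget` parameter is a hypothetical stand-in for an interrupt arriving.

```python
def rep_movsb(mem, si, di, cx, budget=None):
    """Forward byte copy mem[si:si+cx] -> mem[di:di+cx], one element per step.

    If budget is given, stop after that many elements, as if an interrupt
    arrived, and return the updated (si, di, cx). Because that is the whole
    state, calling again with the returned values finishes the job.
    """
    steps = 0
    while cx:
        if budget is not None and steps == budget:
            break  # "interrupt": the registers hold the progress so far
        mem[di] = mem[si]
        si, di, cx = si + 1, di + 1, cx - 1
        steps += 1
    return si, di, cx


mem = bytearray(b"hello_____")
state = rep_movsb(mem, 0, 5, 5, budget=2)  # interrupted after 2 bytes
print(state, mem)                          # (2, 7, 3) bytearray(b'hellohe___')
state = rep_movsb(mem, *state)             # resume
print(state, mem)                          # (5, 10, 0) bytearray(b'hellohello')
```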

    This was actually an early 8086/8088 bug: If you had multiple prefix
    bytes, like you would need if you were moving data from the Stack segment
    instead of the Data segment, and the encoding was REP SEGSS MOVS, then only the
    last prefix byte was remembered in the saved IP/PC value.

    I used to check for this bug by moving a block which was large enough
    that it took over 55ms, so that a timer interrupt was guaranteed:

    If the CX value wasn't zero after the instruction, then the bug had occurred.


    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

  • From EricP@21:1/5 to Lawrence D'Oliveiro on Thu Jan 18 09:39:55 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    If those were register operands, that instruction would be 4 bytes.

    I think worst case, each operand could have an index register and a 4-byte offset (in addition to the operand specifier byte), for a maximum
    instruction length of 19 bytes.

    The longest instruction I think might be an ADDH3 with two H format
    16-byte float immediates with an indexed destination with 4 byte offset.

    That should be something like 2 opcode, 1 opspec, 16 imm,
    1 opspec, 16 imm, 1 opspec, 4 imm, 1 index = 42 bytes.

    (Yes, it's a silly instruction, but legal.)
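    The byte counts in this exchange, reproduced in Python under the same counting rules used above: one byte per operand specifier, an extra index-prefix byte when an operand is indexed, plus any displacement or immediate bytes, and a two-byte opcode for the H_floating instruction. The function and constant names are hypothetical.

```python
def vax_length(opcode_bytes, operands):
    """Total bytes: opcode plus, per operand, an index prefix byte (if indexed),
    the specifier byte, and any displacement or immediate bytes."""
    total = opcode_bytes
    for extra_bytes, indexed in operands:
        total += (1 if indexed else 0) + 1 + extra_bytes
    return total


REGISTER = (0, False)           # Rn
INDEXED_LONG_DISP = (4, True)   # disp(Rn)[Rx] with a 4-byte displacement
H_IMMEDIATE = (16, False)       # a 16-byte H_floating immediate

print(vax_length(1, [REGISTER] * 3))           # 4   ADDF3 with register operands
print(vax_length(1, [INDEXED_LONG_DISP] * 3))  # 19  the worst case Lawrence describes
print(vax_length(2, [H_IMMEDIATE, H_IMMEDIATE, INDEXED_LONG_DISP]))  # 42  the ADDH3 above
```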

    So, saying “just one instruction” may not sound as good as you think.

    Here’s an old example, from the VMS kernel itself. This instruction

    PUSHR #^M<R0,R1,R2,R3,R4,R5>

    pushes the first 6 registers onto the stack, and occupies just 2 bytes. Whereas this sequence

    PUSHL R5
    PUSHL R4
    PUSHL R3
    PUSHL R2
    PUSHL R1
    PUSHL R0

    does the equivalent thing, but takes up 2 × 6 = 12 bytes.

    Guess which is faster?

  • From EricP@21:1/5 to John Levine on Thu Jan 18 09:19:44 2024
    John Levine wrote:
    It appears that Thomas Koenig <tkoenig@netcologne.de> said:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.
    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    I don't think anyone used the fancy addressing modes or complex
    instructions much. But here's an example. Let's say A, B, and C are
    floats in addressable memory and you want to do A = B + C

    370 code

    LE R0,B
    AE R0,C
    STE R0,A

    VAX code

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    Or that old Fortran favorite I = I + 1

    370 code

    L R1,I
    LA R2,1
    AR R1,R2
    ST R1,I

    VAX code

    INCL I

    or if you have a lousy optimizer

    ADDL2 #1,I

    or if you have a really lousy optimizer

    ADDL3 #1,I,I

    It's still one instruction rather than four.

    In 370 code you often also needed extra instructions to make data
    addressable since it had no direct addressing and address offsets in instructions were only 12 bits.

    VAX Fortran77 could optimize a DO loop array index to an autoincrement,
    I think they called it strength reduction of loop induction variables.

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD2 (rB)+, (rA)+
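    A Python sketch of the strength-reduced loop: instead of recomputing base + (i - 1) * 8 on each trip, two byte addresses simply advance by 8 after each use, as (rA)+ and (rB)+ do. IEEE doubles from the `struct` module stand in for VAX D_floating, which has a different format, and the function name and memory layout are hypothetical.

```python
import struct

DOUBLE = struct.Struct("<d")  # IEEE stand-in; real VAX D_floating differs


def add_arrays_autoincrement(mem, ra, rb, n):
    """A(i) = A(i) + B(i) for n doubles in a flat byte array.

    ra and rb are byte addresses that advance by 8 after each use,
    like the (rA)+ and (rB)+ operands of ADDD2.
    """
    for _ in range(n):
        (a,) = DOUBLE.unpack_from(mem, ra)
        (b,) = DOUBLE.unpack_from(mem, rb)
        DOUBLE.pack_into(mem, ra, a + b)
        ra += 8
        rb += 8
    return ra, rb


mem = bytearray(32)
for offset, value in zip(range(0, 32, 8), [1.0, 2.0, 10.0, 20.0]):
    DOUBLE.pack_into(mem, offset, value)  # A at bytes 0..15, B at 16..31
add_arrays_autoincrement(mem, 0, 16, 2)
print([DOUBLE.unpack_from(mem, off)[0] for off in (0, 8)])  # [11.0, 22.0]
```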

    VAX usage stats for compilers Basic, Bliss, Cobol, Fortran, Pascal, PL1,
    show usage frequency per operand specifier of autoincrement ~4%, index ~7% except Basic has 17% for autoincrement.
    There is almost no usage of deferred addressing (address of address of data).

  • From Anton Ertl@21:1/5 to EricP on Thu Jan 18 16:24:13 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    The MIPS R2000 with 32 registers launched in 1986 at 8.3, 12.5 and 15 MHz.
    It supposedly could sustain 1 reg-reg ALU operation per clock.

    It could do at most one instruction per clock, and it certainly needed
    to branch at some point, so no sustained 1/clock ALU instructions.
    Also, a useful program would want to load or store at some point, so
    even fewer ALU instructions. And with cache misses, also fewer than 1
    instruction per clock.

    - anton
  • From John Levine@21:1/5 to All on Thu Jan 18 16:31:29 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    If those were register operands, that instruction would be 4 bytes.

    I think worst case, each operand could have an index register and a 4-byte offset (in addition to the operand specifier byte), for a maximum
    instruction length of 19 bytes.
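    The two byte counts quoted here can be reproduced with a little arithmetic. This sketch models only the cases mentioned (plain register operands, and indexed operands with a 4-byte displacement) and is not a full VAX instruction-length decoder; the names are illustrative.

```python
OPCODE_BYTES = 1  # ADDF3 has a single-byte opcode


def operand_bytes(indexed, displacement_bytes):
    """Size of one VAX operand: an optional index-mode prefix byte,
    the mode/register specifier byte, then any displacement bytes."""
    return (1 if indexed else 0) + 1 + displacement_bytes


def instruction_length(operands):
    """operands: a list of (indexed, displacement_bytes) pairs."""
    return OPCODE_BYTES + sum(operand_bytes(ix, d) for ix, d in operands)


registers_only = [(False, 0)] * 3   # e.g. ADDF3 R1,R2,R3
worst_case = [(True, 4)] * 3        # every operand indexed, longword displacement
```

    With three register operands this gives 1 + 3 × 1 = 4 bytes; with three indexed longword-displacement operands it gives 1 + 3 × 6 = 19 bytes.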

    So, saying “just one instruction” may not sound as good as you think.

    I wasn't saying they were always better, just pointing out that there
    were straightforward reasons that 500K VAX instructions could do the
    same work as 1M 370 instructions.

    Considering that the 370 is still alive and the VAX died decades ago,
    it should be evident that instruction count isn't a very useful
    metric across architectures.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Anton Ertl@21:1/5 to Thomas Koenig on Thu Jan 18 16:35:45 2024
    Thomas Koenig <tkoenig@netcologne.de> writes:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.

    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    It's interesting that these are the features you are thinking of,
    especially because the IBM 801 research and the RISC research showed
    that fancy addressing modes are rarely used. Table 4 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    for a total of 6.6% of the operand specifiers; there are about 1.5
    operand specifiers per instruction (Table 3), so that's ~0.1 operand
    specifier with a fancy addressing mode per instruction.
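    The "~0.1 per instruction" figure is just the quoted percentages summed and scaled by the specifiers-per-instruction average; this short check restates the numbers from Tables 3 and 4 as quoted above and adds no new data.

```python
# Operand-specifier percentages as quoted from Table 4 above.
fancy_modes = {
    "auto-increment (R)+": 2.1,
    "displacement deferred @D(R)": 2.7,
    "absolute @(PC)": 0.6,
    "auto-increment deferred @(R)+": 0.3,
    "auto-decrement -(R)": 0.9,
}
SPECIFIERS_PER_INSN = 1.5  # approximate figure quoted from Table 3

share_pct = sum(fancy_modes.values())             # percent of all specifiers
per_insn = share_pct / 100 * SPECIFIERS_PER_INSN  # fancy specifiers per instruction
print(f"{share_pct:.1f}% of specifiers, {per_insn:.3f} per instruction")
```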

    Back to why S/360 has more instructions than VAX, John Levine gave a
    good answer.

    One aspect (partially addressed by John Levine, but not discussed
    explicitly) is that the VAX is a three-address machine, while S/360 is
    a two-address machine, so the S/360 occasionally needs reg-reg moves
    where VAX does not. Plus, S/360 usually requires one of its two
    operands to be a register, so in some cases an additional load is
    necessary on the S/360 that is not needed on the VAX.

    Among the complex VAX instructions CALL/RET and multi-register push
    and pop constitute 3.22% of the instructions according to Table 1 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    I expect that these correspond to multiple instructions on the S/360.

    - anton
  • From EricP@21:1/5 to Anton Ertl on Thu Jan 18 13:07:52 2024
    Anton Ertl wrote:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    John Levine <johnl@taugh.com> schrieb:

    As I think I said above, a million IBM instructions did about as much
    work as half a million VAX instructions.
    Why the big difference? Were fancy addressing modes really used so
    much? Or did the code for the VAX mostly run POLY instructions? :-)

    It's interesting that these are the features you are thinking of,
    especially because the IBM 801 research and the RISC research showed
    that fancy addressing modes are rarely used. Table 4 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    for a total of 6.6% of the operand specifiers; there are about 1.5
    operand specifiers per instruction (Table 3), so that's ~0.1 operand specifier with a fancy addressing mode per instruction.

    Back to why S/360 has more instructions than VAX, John Levine gave a
    good answer.

    One aspect (partially addressed by John Levine, but not discussed
    explicitly) is that the VAX is a three-address machine, while S/360 is
    a two-address machine, so the S/360 occasionally needs reg-reg moves
    where VAX does not. Plus, S/360 usually requires one of its two
    operands to be a register, so in some cases an additional load is
    necessary on the S/360 that is not needed on the VAX.

    Among the complex VAX instructions CALL/RET and multi-register push
    and pop constitute 3.22% of the instructions according to Table 1 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    I expect that these correspond to multiple instructions on the S/360.

    - anton

    There is also a different paper with slightly different stats that,
    amongst other things, shows address mode usage by compiled language.

    A Case Study of VAX-11 Instruction Set Usage For Compiler Execution
    Wiecek, 1982

  • From John Levine@21:1/5 to All on Thu Jan 18 19:08:47 2024
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    That's not entirely fair. The VAX has an immediate address mode that
    could encode constant values from 0 to 63. Both papers said it was
    about 15% so it was definitely a success. The 370 had sort of a split personality, a shotgun marriage of a register scientific machine
    and a memory-to-memory commercial machine. There were a bunch of
    instructions with immediate operands but they all were a one byte
    immediate and a memory location. Hence the extra LA instructions to
    get immediates into registers.
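    A small sketch of how short literal mode fits the constant into the operand specifier byte itself: in the VAX encoding, a specifier whose top two bits are 00 is a short literal, and the low six bits hold the value. For floating-point operands those six bits are interpreted differently; this sketch handles only the integer case, and the function names are illustrative.

```python
def encode_short_literal(value):
    """One-byte VAX operand specifier for short literal mode: bits 7:6
    are 00 and bits 5:0 hold the constant (integer operands only)."""
    if not 0 <= value <= 63:
        raise ValueError("short literal must be in 0..63")
    return value & 0x3F  # mode bits 7:6 are left as 00


def is_short_literal(specifier):
    """True if the specifier byte uses short literal mode."""
    return (specifier & 0xC0) == 0
```

    So the constant 1 in ADDL2 #1,I costs a single specifier byte, whereas the S/360 sequence above spends a whole LA instruction getting it into a register.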

    Both papers said the index mode, which added a scaled register to an
    address computed any other way, was about 6% which was higher than I
    would have expected. The 370 has a similar base+displacement+index
    which I hear is almost never used.
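    A sketch contrasting the two index computations, assuming register contents are passed in directly; it ignores the S/370 convention that register number 0 means "no register" and ignores address wraparound. The key difference is that VAX scales the index by the operand size, while the S/370 index is an unscaled byte offset.

```python
def vax_index_address(base_operand_address, rx, operand_size):
    """VAX index mode: the address produced by the base specifier,
    plus the index register scaled by the operand size in bytes."""
    return base_operand_address + rx * operand_size


def s370_rx_address(base_reg, index_reg, displacement):
    """S/370 RX form: base + index + 12-bit displacement, with no
    scaling, so the index register must already hold a byte offset."""
    if not 0 <= displacement <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits")
    return base_reg + index_reg + displacement
```

    For element 3 of an array of doubles at 0x1000, the VAX index register can hold 3, while the S/370 index register must hold 24, so a compiler either keeps the loop variable pre-scaled or scales it with an extra instruction.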

    Among the complex VAX instructions CALL/RET and multi-register push
    and pop constitute 3.22% of the instructions according to Table 1 of
    <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    I expect that these correspond to multiple instructions on the S/360.

    The VAX had an all-singing, all-dancing CALLS/RET that saved registers
    and set up a stack frame, and a simple JSB/RSB that just pushed the
    return address and jumped. CALLS was extremely slow and did far
    more than was usually needed so for the most part it was only used
    for inter-module calls that had to use the official calling sequence,
    and JSB for everything else.

    The VAX instruction set was overoptimized for code size and a
    simplistic idea of easy programming which meant among other things
    that a fancy instruction was often slower than the equivalent sequence
    of simple instructions, and a lot of the fancy instructions weren't
    used very much.

  • From Lawrence D'Oliveiro@21:1/5 to Anton Ertl on Thu Jan 18 20:56:08 2024
    On Thu, 18 Jan 2024 16:24:13 GMT, Anton Ertl wrote:

    [MIPS] could do at most one instruction per clock, and it certainly
    needed to branch at some point, so no sustained 1/clock ALU
    instructions.
    But it also had delayed branches, so perhaps it could sustain that rate
    across a taken branch?

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Thu Jan 18 20:55:15 2024
    On Thu, 18 Jan 2024 19:08:47 -0000 (UTC), John Levine wrote:

    The 370 had sort of a split personality, a shotgun marriage of a
    register scientific machine and a memory-to-memory commercial machine.

    That pins things down quite narrowly as to when it came into being,
    doesn’t it? Up to about that point, “scientific” and “business” computing
    were considered to be separate worlds, needing their own hardware and
    software, and never the twain shall meet.

  • From Lawrence D'Oliveiro@21:1/5 to EricP on Thu Jan 18 21:01:19 2024
    On Thu, 18 Jan 2024 09:19:44 -0500, EricP wrote:

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD (rB)+, (rA)+

    ... set up rA, rB, rI ...
    BRB $9000
    ADDD (rB)+, (rA)+
    SOBGEQ rI, $1000

    Why use SOBGEQ with the branch instead of SOBGTR? So that this way, if N =
    0, the loop body never executes at all.
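    The control flow of that loop can be mimicked in Python to check the zero-trip claim. This is a behavioural sketch only: it assumes the setup code loads N into rI, and that $9000 labels the SOBGEQ test and $1000 the loop body (the labels are not shown in the fragment). The second function shows the bottom-tested SOBGTR form without the entry branch, which runs the body once even when N = 0, the one-trip behaviour many Fortran 66 compilers produced.

```python
def sobgeq_loop(n, body):
    """Emulates:      BRB test
                loop: <body>
                test: SOBGEQ rI, loop
    with rI assumed to be loaded with n by the setup code."""
    r_i = n
    # BRB: jump straight to the test, so nothing runs when n == 0
    while True:
        r_i -= 1          # SOBGEQ subtracts one ...
        if r_i < 0:       # ... and branches back only while the result is >= 0
            return
        body()


def sobgtr_loop(n, body):
    """Emulates the same loop without the entry branch, closed by
    SOBGTR rI, loop: the body always runs at least once."""
    r_i = n
    while True:
        body()
        r_i -= 1          # SOBGTR subtracts one ...
        if r_i <= 0:      # ... and branches back only while the result is > 0
            return
```

    Both forms run the body N times for N >= 1; they differ only at N = 0.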

  • From John Levine@21:1/5 to All on Thu Jan 18 22:16:11 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 19:08:47 -0000 (UTC), John Levine wrote:

    The 370 had sort of a split personality, a shotgun marriage of a
    register scientific machine and a memory-to-memory commercial machine.

    That pins things down quite narrowly as to when it came into being,
    doesn’t it? Up to about that point, “scientific” and “business” computing
    were considered to be separate worlds, needing their own hardware and
    software, and never the twain shall meet.

    Yes, the whole point of S/360 was to produce a unified architecture that IBM could sell to all of their customers.

    It may have been a shotgun marriage, but it's been a very long lasting one.

    You can still run most S/360 application code unmodified on the latest zSeries.

  • From John Levine@21:1/5 to All on Thu Jan 18 22:19:04 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 09:19:44 -0500, EricP wrote:

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD (rB)+, (rA)+

    ... set up rA, rB, rI ...
    BRB $9000
    ADDD (rB)+, (rA)+
    SOBGEQ rI, $1000

    Why use SOBGEQ with the branch instead of SOBGTR? So that this way, if N =
    0, the loop body never executes at all.

    Ah, that must have been a Fortran 77 or later DO loop. In Fortran 66 the
    loop usually ran once regardless.

  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Fri Jan 19 07:28:57 2024
    Lawrence D'Oliveiro wrote:
    On Thu, 18 Jan 2024 09:19:44 -0500, EricP wrote:

    do i = 1, N
    A(i) = A(i) + B(i)
    end do

    ADDD (rB)+, (rA)+

    ... set up rA, rB, rI ...
    BRB $9000
    ADDD (rB)+, (rA)+
    SOBGEQ rI, $1000

    Why use SOBGEQ with the branch instead of SOBGTR? So that this way, if N =
    0, the loop body never executes at all.

    This is the kind of tiny loop body where I would have considered
    replacing the initial BRB $9000 with a dummy instruction (like a compare
    reg with immediate) where the immediate value contained the ADDD loop body.

    This assumes of course that such a dummy opcode would (on average) be
    faster than a taken forward branch!


    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

  • From Anton Ertl@21:1/5 to John Levine on Fri Jan 19 08:39:16 2024
    John Levine <johnl@taugh.com> writes:
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    shows that addressing modes that the S/360 or even MIPS does not
    support are quite rare:

    Auto-inc. (R)+ 2.1
    Disp. Deferred @D(R) 2.7
    Absolute @(PC) 0.6
    Auto-inc.def. @(R)+ 0.3
    Auto-dec. -(R) 0.9

    That's not entirely fair. The VAX has an immediate address mode that
    could encode constant values from 0 to 63. Both papers said it was
    about 15% so it was definitely a success. The 370 had sort of a split personality, a shotgun marriage of a register scientific machine
    and a memory-to-memory commercial machine. There were a bunch of instructions with immediate operands but they all were a one byte
    immediate and a memory location. Hence the extra LA instructions to
    get immediates into registers.

    So this advantage of the VAX over S/360 was not a "fancy" addressing
    mode, but the immediate addressing mode that S/360 does not have, but
    that all RISCs have, even MIPS, Alpha and RISC-V (except that these architectures define addi/addiu as separate instructions). VAX has
    "short literal", as you explain (15.8% of the operands) as well as
    "immediate" (2.4% of the operands). With 1.5 operands per
    instruction, that alone is a factor 1.27 more instructions for S/360
    than for VAX.
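    The factor of 1.27 follows from the quoted percentages under the simplification stated above: every short-literal or immediate operand is assumed to cost S/360 exactly one extra instruction. This sketch only makes that arithmetic explicit.

```python
SHORT_LITERAL_PCT = 15.8   # share of operands, as quoted
IMMEDIATE_PCT = 2.4        # share of operands, as quoted
OPERANDS_PER_INSN = 1.5    # approximate average, as quoted

# Assumption from the text: each such operand costs S/360 one extra
# instruction (e.g. an LA to get the constant into a register).
extra_per_vax_insn = (SHORT_LITERAL_PCT + IMMEDIATE_PCT) / 100 * OPERANDS_PER_INSN
ratio = 1 + extra_per_vax_insn
print(f"S/360 needs about {ratio:.2f}x as many instructions")
```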

    Among the complex VAX instructions CALL/RET and multi-register push
    and pop constitute 3.22% of the instructions according to Table 1 of
    <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    I expect that these correspond to multiple instructions on the S/360.

    The VAX had an all-singing, all-dancing CALLS/RET that saved registers
    and set up a stack frame, and a simple JSB/RSB that just pushed the
    return address and jumped. CALLS was extremely slow and did far
    more than was usually needed so for the most part it was only used
    for inter-module calls that had to use the official calling sequence,
    and JSB for everything else.

    That probably depends on the compiler. Table 2 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf> lists 4.5% "subroutine call and return", and 2.4% "procedure call and
    return"; I assume the latter is the all-singing all-dancing CALL and
    RET instruction; the missing 0.82% to the 3.22% mentioned in Table 1
    is probably the multi-register push and pop instructions.

    - anton
  • From Michael S@21:1/5 to John Levine on Fri Jan 19 16:23:09 2024
    On Thu, 18 Jan 2024 16:31:29 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 02:05:30 -0000 (UTC), John Levine wrote:

    ADDF3 B,C,A

    The VAX may not be faster, but that's one instruction rather than 3.

    If those were register operands, that instruction would be 4 bytes.

    I think worst case, each operand could have an index register and a
    4-byte offset (in addition to the operand specifier byte), for a
    maximum instruction length of 19 bytes.

    So, saying “just one instruction” may not sound as good as you think.

    I wasn't saying they were always better, just pointing out that there
    were straightforward reasons that 500K VAX instructions could do the
    same work as 1M 370 instructions.

    Considering that the 370 is still alive and the VAX died decades ago,
    it should be evident that instruction count isn't a very useful
    metric across architectures.

    That's not totally fair.
    S/360 permanently reinvents itself. VAX could have done the same, but voluntarily refused.

  • From Michael S@21:1/5 to John Levine on Fri Jan 19 16:40:19 2024
    On Thu, 18 Jan 2024 22:16:11 -0000 (UTC)
    John Levine <johnl@taugh.com> wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Thu, 18 Jan 2024 19:08:47 -0000 (UTC), John Levine wrote:

    The 370 had sort of a split personality, a shotgun marriage of a
    register scientific machine and a memory-to-memory commercial
    machine.

    That pins things down quite narrowly as to when it came into being, doesn’t it? Up to about that point, “scientific” and “business” computing were considered to be separate worlds, needing their own hardware and software, and never the twain shall
    meet.

    Yes, the whole point of S/360 was to produce a unified architecture
    that IBM could sell to all of their customers.

    It may have been a shotgun marriage, but it's been a very long
    lasting one.

    Was it?
    Being a younger observer from the outside, my impression is that in the
    1st World people stopped using S/360 descendants for "heavy" scientific calculations around 1980. In other parts of the World it lasted a few
    years longer, but still no longer than 1990. Use of IBM mainframes for
    CAD continued well into the 90s and maybe even into this century, but CAD
    is not what people called "scientific computing" back when S/360 was
    conceived.
    You can still run most S/360 application code unmodified on the
    latest zSeries.

  • From John Levine@21:1/5 to All on Fri Jan 19 16:59:33 2024
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    So this advantage of the VAX over S/360 was not a "fancy" addressing
    mode, but the immediate addressing mode that S/360 does not have, but
    that all RISCs have, even MIPS, Alpha and RISC-V (except that these architectures define addi/addiu as separate instructions). VAX has
    "short literal", as you explain (15.8% of the operands) as well as "immediate" (2.4% of the operands). With 1.5 operands per
    instruction, that alone is a factor 1.27 more instructions for S/360
    than for VAX.

    Looks that way. IBM apparently noticed it too since S/390 added 16 bit immediate load, compare, add, subtract, and multiply, and zSeries
    added immediate everything, such as add immediate to memory.

    That probably depends on the compiler. Table 2 of <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    lists 4.5% "subroutine call and return", and 2.4% "procedure call and
    return"; I assume the latter is the all-singing all-dancing CALL and
    RET instruction; the missing 0.82% to the 3.22% mentioned in Table 1
    is probably the multi-register push and pop instructions.

    Sounds right. I'm surprised the procedure call numbers were so high.

  • From Anton Ertl@21:1/5 to Michael S on Fri Jan 19 16:22:22 2024
    Michael S <already5chosen@yahoo.com> writes:
    Being a younger observer from the outside, my impression is that in the
    1st World people stopped using S/360 descendants for "heavy" scientific calculations around 1980. In other parts of the World it lasted a few
    years longer, but still no longer than 1990.

    Meanwhile, in my part of the third world (Austria) politicians congratulated themselves on buying a supercomputer from IBM. Searching for it, I
    find <https://services.phaidra.univie.ac.at/api/object/o:573/get>, and
    on page 2 it tells me that the inauguration of the supercomputer IBM
    3090-400E VF (with two vector processors) happened on March 7, 1989.
    That project was originally limited to two years, but a contract
    signed on 1992-03-19 extended the run-time and upgraded the hardware to
    a 6-processor ES/9000 720VF; that extension also included 20
    RS/6000-550s, and they found that the combined computing power of the
    RS/6000s far exceeded that of the vector computer. The vector computer
    was uninstalled in January 1995.

    After the RS/6000 cluster they used an Alpha cluster from 1995 to
    2001, and this was replaced in 2001 with a PC-based Linux cluster
    (inaugurated on January 28, 2002) consisting of 160 nodes with an
    Athlon XP 1700+ and 1GB RAM each.

    - anton
  • From Anton Ertl@21:1/5 to Michael S on Fri Jan 19 16:43:30 2024
    Michael S <already5chosen@yahoo.com> writes:
    S/360 permanently reinvents itself. VAX could have done the same, but voluntarily refused.

    Voluntarily? The VAX 9000 project cost DEC billions. Ok, one can
    imagine an alternative history where DEC had decided to avoid
    switching to MIPS and Alpha, and where they would have followed up the
    NVAX (which seems to be pipelined, but not superscalar, i.e., like the
    486) with eventually an OoO implementation, and from then on might
    have had an easier time competing with RISCs.

    The question is how many customers would have defected to RISC-based
    systems in the meantime, and if DEC could have survived competition
    from ever more capable PCs that eliminated the RISC workstation
    market and the RISC server market.

    IBM z and i survive because of a legacy of system-specific software
    (written in assembly or using other system-specific features), and
    the additional hardware cost is an acceptable price for being able to
    continue to use this software.

    Many VAX customers were flexible enough to switch to something else
    when VAX was no longer competitive (that's why DEC did the MIPS-based DECstations), so I doubt that DEC would have survived in the
    alternative history I outlined, at least as a significant manufacturer
    rather than a niche manufacturer like Unisys.

    One interesting aspect is that NVAX was only released in 1991, while
    the 486 was released in 1989, and the MIPS R2000 in 1986, so the VAX instruction set did have a cost.

    - anton
  • From John Levine@21:1/5 to All on Fri Jan 19 18:28:18 2024
    According to Michael S <already5chosen@yahoo.com>:
    It may have been a shotgun marriage, but it's been a very long
    lasting one.

    Was it?
    Being a younger observer from the outside, my impression is that in the
    1st World people stopped using S/360 descendants for "heavy" scientific calculations around 1980. ...

    It was earlier than that. The 370/195 was IBM's last attempt to build
    a supercomputer, introduced in 1970 and never sold very well. They
    added vector options on later machines which someone must use, since
    they're still on zSeries, but they've never been competitive for
    pure computing.

    The point of a mainframe is that it has a balance between CPU and I/O.
    A PDP-8 had a much faster CPU than a 360/30, but the 360 had an I/O
    channel that connected card readers and printers and tapes and disks
    so it could do data processing work that nobody did on a PDP-8. A
    PDP-8 could also connect to those but each needed an expensive I/O
    interface to attach to the 8's simple I/O bus, so hardly anyone did.

    Mainframes are also designed to be very reliable and maintainable. A
    modern mainframe has dozens of CPUs some of which are only doing
    maintenance oversight and others of which are hot spares that can
    substitute for a failed processor in the middle of an instruction
    stream. They're also designed so the vendor can do maintenance and
    replace subsystems while the system is running. People expect them to
    remain up and running constantly for years at a time.

    Apropos another comment that the 360 has evolved but the Vax didn't,
    that is certainly true, since zSeries is about 70% new stuff and
    30% 360 stuff, but the 360 was a much better place to build from.
    It is much easier to build a fast 360 than a fast Vax because
    the instruction set, even with all the zSeries additions, is
    more regular and amenable to pipelining.

    The worst mistake they made from a performance point of view is
    that the architecture says an instruction can modify the next
    instruction and it is supposed to work. (Back in the 1960s
    on machines with 8K of RAM that was not totally silly.) But
    even that hardly matters since the vast majority of code
    runs out of read-only pages where you can't do that.

  • From John Dallman@21:1/5 to Michael S on Fri Jan 19 19:58:00 2024
    In article <20240119164019.0000374e@yahoo.com>, already5chosen@yahoo.com (Michael S) wrote:

    Being a younger observer from the outside, my impression is that in
    the 1st world people stopped using S/360 descendants for "heavy"
    scientific calculations around 1980.

    Yup. VAXes and other superminis got you a lot more CPU per dollar.

    Use of IBM mainframes for CAD continued well into the 90s and maybe
    even into this century, but CAD is not what people called
    "scientific computing" back when S/360 was conceived.

    Some aspects of it are, but many are not. CAD has very uneven processor
    usage: vast demands for brief periods when regenerating views or models,
    then very little while the designer thinks and adds to the model. Running
    this on a time-shared machine is frustrating, because when a few
    designers need a lot of CPU at the same time, it gets very slow.
    Individual machines keep the designers happier.


  • From John Dallman@21:1/5 to Anton Ertl on Fri Jan 19 19:58:00 2024
    In article <2024Jan19.174330@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Voluntarily? The VAX 9000 project cost DEC billions. Ok, one can
    imagine an alternative history where DEC had decided to avoid
    switching to MIPS and Alpha, and where they would have followed up
    the NVAX (which seems to be pipelined, but not superscalar, i.e., like
    the 486) with eventually an OoO implementation, and from then on might
    have had an easier time competing with RISCs.

    The timeline doesn't work. DEC decided to adopt MIPS in 1989, because
    they were losing market share worryingly quickly. NVAX was released in
    1991, and they'd have had real trouble developing it without the cash
    from MIPS-based systems.


    They opted for Alpha because they felt VAX had enough overheads that it
    would always be at a disadvantage compared to RISC chips. That is less
    obvious now, but that's because of the huge amounts of money that have
    gone into x86 development over the last thirty years. DEC's market for
    VAX systems was much smaller than the market for x86 in 1995-2010.



  • From Scott Lurndal@21:1/5 to John Dallman on Fri Jan 19 20:22:56 2024
    jgd@cix.co.uk (John Dallman) writes:
    In article <20240119164019.0000374e@yahoo.com>, already5chosen@yahoo.com (Michael S) wrote:

    Being a younger observer from the outside, my impression is that in
    the 1st world people stopped using S/360 descendants for "heavy"
    scientific calculations around 1980.

    Yup. VAXes and other superminis got you a lot more CPU per dollar.

    Use of IBM mainframes for CAD continued well into the 90s and maybe
    even into this century, but CAD is not what people called
    "scientific computing" back when S/360 was conceived.

    Some aspects of it are, but many are not. CAD has very uneven processor usage: vast demands for brief periods when regenerating views or models,
    then very little while the designer thinks and adds to the model. Running this on a time-shared machine is frustrating, because when a few
    designers need a lot of CPU at the same time, it gets very slow.
    Individual machines keep the designers happier.

    Modern chip development (RTL/Verilog) environments offload the
    compute-bound and I/O-bound jobs to a compute grid with thousands of nodes;
    even the visualization jobs run there, using X11 tunnelling to get back to
    the workstation display when examining waves, for example.

    When you're dealing with billions of gates on a single chip.....

  • From EricP@21:1/5 to John Levine on Fri Jan 19 15:29:50 2024
    John Levine wrote:
    According to Anton Ertl <anton@mips.complang.tuwien.ac.at>:
    So this advantage of the VAX over S/360 was not a "fancy" addressing
    mode, but the immediate addressing mode that S/360 does not have, but
    that all RISCs have, even MIPS, Alpha and RISC-V (except that these
    architectures define addi/addiu as separate instructions). VAX has
    "short literal", as you explain (15.8% of the operands) as well as
    "immediate" (2.4% of the operands). With 1.5 operands per
    instruction, that alone is a factor 1.27 more instructions for S/360
    than for VAX.

    Looks that way. IBM apparently noticed it too since S/390 added 16 bit immediate load, compare, add, subtract, and multiply, and zSeries
    added immediate everything, such as add immediate to memory.

    That probably depends on the compiler. Table 2 of
    <https://www.eecg.utoronto.ca/~moshovos/ACA06/readings/emer-clark-VAX.pdf>
    lists 4.5% "subroutine call and return", and 2.4% "procedure call and
    return"; I assume the latter is the all-singing all-dancing CALL and
    RET instruction; the missing 0.82% to the 3.22% mentioned in Table 1
    is probably the multi-register push and pop instructions.

    Sounds right. I'm surprised the procedure call numbers were so high.

    I found a set of LINPACK performance results for many different cpus
    from 1983 by Argonne National Laboratory, including 370/158 (they don't
    say which model) and 780. The results show both the execution time and
    MFLOPS, so that removes the variability due to the definition of "instruction".

    Dongarra has many versions of this paper over the years.
    This is just the one from 1983.

    Performance of Various Computers Using Standard Linear Equations Software
    in a Fortran Environment, Dongarra, 1983 https://dl.acm.org/doi/pdf/10.1145/859551.859555

    For double precision, the 158 running compiled code is about 50%
    faster than a 780 running "coded BLAS" (hand-coded assembler)
    and about 2 times faster than a 780 running compiled code.

    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Fri Jan 19 20:51:31 2024
    On Thu, 18 Jan 2024 16:31:29 -0000 (UTC), John Levine wrote:

    Considering that the 370 is still alive and the VAX died decades ago,
    it should be evident that instruction count isn't a very useful metric
    across architectures.

    The 360/370/xx/3090/yy/zSeries line only survives because of business “legacy” deployments. It was never a performance-oriented architecture (witness the trouncing by CDC). It is long obsolete, and those deployments
    are dwindling, if not circling the plughole.

    VAX was the next step forward in the “supermini” and later “workstation”
    categories, and these were definitely about price-performance. So when
    other better technologies came along, they rendered it obsolete, fairly quickly.

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Fri Jan 19 20:58:16 2024
    On Fri, 19 Jan 2024 18:28:18 -0000 (UTC), John Levine wrote:

    The point of a mainframe is that it has a balance between CPU and I/O.

    The point of a mainframe was that the CPU was expensive. So a lot of
    effort went into complex I/O controllers that could perform chains of
    multiple transfers before having to come back to the CPU to ask for more
    work.

    Such an architecture tends to prioritize high throughput over low latency. Which made it unsuitable for this newfangled “interactive timesharing”
    that began to be popular with the new hardware and software coming from companies like DEC, DG etc.

    Mainframes are also designed to be very reliable and maintainable.

    They did it in a very expensive way, though. Think how Google manages reliability and maintainability today: by having a cluster of half a
    million servers (maybe more by now), each built from the cheapest parts in
    all ways but one--the power supply.

  • From Lynn Wheeler@21:1/5 to EricP on Fri Jan 19 16:17:32 2024
    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run the cdc6600 rain benchmark on an
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for a
    national lab that was looking at getting 70 of them for a compute farm (sort of
    the leading edge of the coming cluster supercomputing tsunami). I also
    ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode and the integrated channel microcode; a 3031 was two 158 engines, one with just
    the 370 microcode and a 2nd with just the integrated channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.
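The rain benchmark times above can be turned into speeds relative to the CDC 6600. Below is a minimal Python sketch; the channel-overhead figure rests on an assumption not stated in the post, namely that the 158 and the 3031's CPU engine are otherwise identical, so the whole 158/3031 gap is charged to the integrated channel microcode.

```python
# Relative speeds from the rain benchmark times quoted above (seconds).
times = {
    "cdc6600": 35.77,
    "158": 45.64,
    "3031": 37.03,
    "4341": 36.21,
}

baseline = times["cdc6600"]
for machine, t in times.items():
    # Speed relative to the CDC 6600: 1.00 means equal, below 1.00 is slower.
    print(f"{machine:8s} {t:6.2f}s  relative speed {baseline / t:.2f}")

# Crude estimate, assuming the 158 and the 3031 CPU engine are otherwise
# identical: the fraction of the 158's run time lost to channel microcode.
channel_share = (times["158"] - times["3031"]) / times["158"]
print(f"158 channel overhead estimate: {channel_share:.1%}")
```

Under that assumption, just under a fifth of the 158's run time would go to the integrated channel microcode, which fits the remark that it used lots of cycles even with no I/O in progress.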

    virtualization experience starting Jan1968, online at home since Mar1970

  • From John Levine@21:1/5 to All on Sat Jan 20 02:38:36 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Fri, 19 Jan 2024 18:28:18 -0000 (UTC), John Levine wrote:

    The point of a mainframe is that it has a balance between CPU and I/O.

    The point of a mainframe was that the CPU was expensive. So a lot of
    effort went into complex I/O controllers that could perform chains of multiple transfers before having to come back to the CPU to ask for more work.

    On high-end machines, not so much on small ones. On the 360/30, the same
    microcode engine ran the CPU and the channel. When the channel was
    working hard, the CPU pretty much stopped.

    Such an architecture tends to prioritize high throughput over low latency.
    Which made it unsuitable for this newfangled “interactive timesharing” that began to be popular with the new hardware and software coming from companies like DEC, DG etc.

    Depended on what model of interaction you wanted. If you wanted the computer to respond to each character, DEC machines were good at that since they were designed to do realtime stuff. If you wanted line-at-a-time or screen-at-a-time
    interaction, mainframes did that just fine. In 1964 SABRE ran on
    two IBM 7090s and provided snappy responses to 1500 terminals across the U.S.

    I used CP/67 in the early 1970s and it also worked quite well, with fast response
    in line-at-a-time mode.
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Anton Ertl@21:1/5 to John Dallman on Sat Jan 20 09:10:00 2024
    jgd@cix.co.uk (John Dallman) writes:
    In article <2024Jan19.174330@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Voluntarily? The VAX 9000 project cost DEC billions. Ok, one can
    imagine an alternative history where DEC has decided to avoid
    switching to MIPS and Alpha, and where they would have followed up
    the NVAX (which seems to be pipelined, but not superscalar, i.e., like
    the 486) with eventually an OoO implementation, and from then on might
    have had an easier time competing with RISCs.

    The timeline doesn't work. DEC decided to adopt MIPS in 1989, because
    they were losing market share worryingly quickly. NVAX was released in
    1991, and they'd have had real trouble developing it without the cash
    from MIPS-based systems.

    I forgot that in this alternative reality DEC would have killed the
    VAX 9000 project early, leaving them lots of cash for developing
    NVAX. Still, it could easily have been that they would have lost
    customers to the RISC competition until they finally managed to do the
    OoO-VAX.
    Would they have gotten those customers back, or would they have lost
    to IA-32/AMD64 anyway? Probably the latter, unless they found a
    business model that allowed them to milk the customer base that was
    tied to VAX while at the same time being cheap enough to compete with
    Intel. They tried to go for that on the Alpha: they used firmware for
    market segmentation between VMS/Digital OSF/1 on the one hand and
    Linux/Windows on the other; and they also offered some relatively
    cheap boards, e.g. with the 21164PC, but those were probably too
    limited to be successful.

    That is less
    obvious now, but that's because of the huge amounts of money that have
    gone into x86 development over the last thirty years. DEC's market for
    VAX systems was much smaller than the market for x86 in 1995-2010.

    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed in
    the market, the relevant timespan was 1995-2005. Intel dropped the
    64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

    VAX would have been extended to 64 bits sometime in the early 1990s
    in the alternative timeline, and DEC would have been tempted to use
    the 64-bit extension for market segmentation, which again could have
    resulted in DEC painting itself into a niche.

    - anton
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From John Dallman@21:1/5 to Anton Ertl on Sat Jan 20 16:25:00 2024
    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    jgd@cix.co.uk (John Dallman) writes:
    The timeline doesn't work. DEC decided to adopt MIPS in 1989,
    because they were losing market share worryingly quickly.
    NVAX was released in 1991, and they'd have had real trouble
    developing it without the cash from MIPS-based systems.

    I forgot that in this alternative reality DEC would have killed the
    VAX 9000 project early, leaving them lots of cash for developing
    NVAX. Still, it could easily have been that they would have lost
    customers to the RISC competition until they finally managed to do
    the OoO-VAX.

    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed
    in the market, the relevant timespan was 1995-2005. Intel dropped
    the 64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

    This requires DEC to take notice of those papers and start developing OoO
    quite quickly. They did not do that historically, and they seem to have
    been confident that their way of working would carry on being effective,
    until RISC demonstrated otherwise. This is the timeframe where IBM gave
    up on building mainframes with competitive compute power, and settled for
    them being capable data-movers.

    If DEC go OoO and build an OoO MicroVAX CPU by about 1988, they can get somewhere. The MicroVAX 78032 of 1985 was 125K transistors; the 80386 was
    275K transistors the same year, the 80486 was 1.2M transistors in 1989,
    so the transistor budget could be there.

    Would they have gotten those customers back, or would they have lost
    to IA-32/AMD64 anyway? Probably the latter, unless they found a
    business model that allowed them to milk the customer base that was
    tied to VAX while at the same time being cheap enough to compete
    with Intel.

    I had experience from two different market segments of dealing with DEC.
    In the early 1990s, I was working for a company based around MS-DOS
    software. That was running pretty fast on 486 and Pentium machines. We
    had contact with DEC because one of our large customers had DEC as their primary IT supplier, and one of our managers had bought a DEC PC, from a company who realised he was ignorant and unloaded obsolete hardware on
    him at high prices.

    If you weren't a major DEC customer, they were hell to deal with. They
    just didn't do things, even after agreeing to do so. They charged
    ludicrous prices for minor things. We needed a replacement key for the anti-tamper lock on the DEC PC, because the chap had lost it. They were free,
    but the delivery charge was about $60, by cab. Getting them to just post
    it took a lengthy argument.

    Getting a replacement Pentium for one that had the FDIV bug required
    compiling a log of weeks of broken promises from the parts centre and
    faxing it to DEC's personnel department, asking for it to be placed on
    the relevant manager's file and considered at his next performance review.
    We couldn't just get one from Intel: the necessary heat sink was
    permanently bonded to the old chip, so we needed a new one with DEC's
    specific heatsink.

    At the customer who had DEC as an IT supplier, DEC staff didn't know
    anything about PCs or MS-DOS. They only knew VMS, which seemed weird and
    arcane to us, but the DEC staff were sure it was infinitely superior, and
    could not explain why. They really did not make DEC seem attractive as a supplier.

    Then I changed jobs in 1995 to a company that supplied software for VAX
    VMS, Alpha VMS, OSF/1 on Alpha and Windows on Alpha. Dealing with DEC
    from there was much better. They were capable, helpful and efficient. But
    they still didn't understand PCs, and Windows NT was effective at running complex software and was far cheaper and more attractive to PC users than

    The OoO VAX alternate history changes a lot of things. It means PRISM
    doesn't start, and the multiple-personality OS concept that became MICA
    may or may not happen. The lack of a PRISM+MICA cancellation means Dave
    Cutler probably doesn't move to Microsoft, and then Windows NT doesn't
    happen, at least not in the same way.

    The Mac still causes a shift to GUIs. If DEC can come up with, or buy in,
    a good one then they may do very well, and Microsoft may not become
    nearly so important. That would reduce the importance of Intel, which
    might mean IA-64 never happens.

    They tried to go for that on the Alpha: they used firmware for
    market segmentation between VMS/Digital OSF/1 on the one hand and Linux/Windows on the other; and they also offered some relatively
    cheap boards, e.g. with the 21164PC, but those were probably too
    limited to be successful.

    Producing software for Alpha Windows was reasonably straightforward, if
    you had well-behaved software written in a HLL that there were compilers
    for. This meant that people who were coming down from the Unix world
    didn't have much trouble. Going upwards from the MS-DOS/Windows world was harder: you couldn't hit the hardware, you had to rewrite any assembler
    code, and FX!32 wasn't quite as good as it was cracked up to be. Alpha
    Windows software was worth producing until about 1998, when its
    performance advantage evaporated.

    VAX would have been extended to 64 bits sometime in the early
    1990s in the alternative timeline, and DEC would have been tempted
    to use the 64-bit extension for market segmentation, which again
    could have resulted in DEC painting itself into a niche.

    Yup. Really, you have to get the traditional DEC management to all retire before 1990, and the new management need to be brave /and/ lucky.


  • From Anton Ertl@21:1/5 to John Dallman on Sat Jan 20 18:15:43 2024
    jgd@cix.co.uk (John Dallman) writes:
    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed
    in the market, the relevant timespan was 1995-2005. Intel dropped
    the 64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

    This requires DEC to take notice of those papers and start developing OoO quite quickly.
    If DEC go OoO and build an OoO MicroVAX CPU by about 1988, they can get somewhere. The MicroVAX 78032 of 1985 was 125K transistors; the 80386 was 275K transistors the same year, the 80486 was 1.2M transistors in 1989,
    so the transistor budget could be there.

    Not in a single chip. The CPU die of the Pentium Pro has 5.5M
    transistors and was available in 1995. Nobody else was much earlier
    on OoO, even with the RISC advantage. If DEC had picked up the HPS
    ideas and invented what's missing from there, they might have had the
    OoO VAX as a multi-chip thing in the early 1990s, and maybe gotten it
    on a single chip by 1995. But its performance in the early 1990s
    would have been great, so it could have won back customers.

    Yup. Really, you have to get the traditional DEC management to all retire before 1990, and the new management need to be brave /and/ lucky.

    Yes, you would basically need to have a whole bunch of managers and
    tech team leaders take a time machine from, say, today, so they know
    where to go, and they still would need to make and enforce good
    decisions to make the company succeed in the long term rather than
    painting itself into a corner by maximizing short-term revenue.

    Your stories about your experiences with DEC remind me of a statement I
    once read: DEC buys X, and the result is DEC. Compaq buys DEC, and the
    result is DEC (as in, the DEC attitude won over the Compaq attitude).

    - anton
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From John Levine@21:1/5 to All on Sat Jan 20 19:50:30 2024
    According to John Dallman <jgd@cix.co.uk>:
    Yup. Really, you have to get the traditional DEC management to all retire before 1990, and the new management need to be brave /and/ lucky.

    DEC never really understood what business they were in. They had a pretty good run selling hardware that was cheap and reliable, with software that was adequate.
    But more often than not it was used with other software: CompuServe's system and TENEX on the -10, and Unix on the -11 and VAX.

    That worked fine while minicomputers were the cheapest way to do small scale computing. Once micros came in, they weren't able to produce chips that competed on their own (as opposed to being slightly cheaper versions of
    their minis) and they deluded themselves that they could lock people in
    with VMS the way IBM did with DOS and OS and AS/400.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Michael S@21:1/5 to Anton Ertl on Sat Jan 20 22:19:51 2024
    On Sat, 20 Jan 2024 18:15:43 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    Your stories about your experiences with DEC remind me of a statement I
    once read: DEC buys X, and the result is DEC. Compaq buys DEC, and the
    result is DEC (as in, the DEC attitude won over the Compaq attitude).

    - anton

    But later on HP bought Compaq, and eventually the computing side of the
    business became indistinguishable from Compaq. Both the DEC and HP parts
    have already dissolved. The ex-SGI side is still hanging on, but likely not for long.

  • From Lawrence D'Oliveiro@21:1/5 to John Dallman on Sat Jan 20 21:33:11 2024
    On Sat, 20 Jan 2024 16:25 +0000 (GMT Standard Time), John Dallman wrote:

    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    ... Dave Cutler probably doesn't move to Microsoft, and then Windows NT doesn't happen, at least not in the same way.

    Imagine if it hadn’t been created by a Unix-hater. But then, Microsoft had already divested themselves of Xenix by then, hadn’t they? So they
    probably didn’t have anyone left who understood the value of Unix.

  • From Thomas Koenig@21:1/5 to Michael S on Sat Jan 20 21:50:59 2024
    Michael S <already5chosen@yahoo.com> schrieb:

    As a younger observer from the outside, my impression is that in the
    First World people stopped using S/360 descendants for "heavy" scientific calculations around 1980.

    I certainly used /360 descendants (Siemens 7881, then IBM
    3090) for scientific work, but the latter was also often used
    as the front end for the (also S/360 compatible) Fujitsu VP.
    Hmm... looking around a bit, the IBM 3090 I worked on had 150
    MFlops with its vector facility. That was not too bad when it
    was purchased in 1989, but the workstations purchased soon after
    eclipsed it in computing power for the individual user, and the
    vector computers (Fujitsu VP in Karlsruhe) also did so. The IBM
    3090 was used mainly as a front end to the VP.

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Sat Jan 20 21:36:37 2024
    On Sat, 20 Jan 2024 19:50:30 -0000 (UTC), John Levine wrote:

    DEC never really understood what business they were in.

    They were a company run by engineers, selling to engineers and others
    who understood technical stuff. That was a great business model from the introduction of the PDP-1 in 1959 up to the coming of RISC and the IBM PC, mid-1980s. That was a pretty good run, until you have to start to think
    about remaking yourself. Which they had trouble doing.

  • From Michael S@21:1/5 to Lynn Wheeler on Sun Jan 21 15:06:43 2024
    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run cdc6600 rain benchmark on
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for
    national lab that was looking at getting 70 for a compute farm (sort
    of the leading edge of the coming cluster supercomputing tsunami). I
    also ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode
    and the integrated channel microcode; a 3031 was two 158 engines, one
    with just the 370 microcode and a 2nd with just the integrated
    channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.

    Did I read it right? Brand new mid-range IBM mainframe barely matched
    15 y.o. CDC machine that was 10 years out of production ?
    That sounds quite embarrassing.

  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Sun Jan 21 15:19:22 2024
    On Sat, 20 Jan 2024 21:33:11 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Sat, 20 Jan 2024 16:25 +0000 (GMT Standard Time), John Dallman wrote:

    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    ... Dave Cutler probably doesn't move to Microsoft, and then
    Windows NT doesn't happen, at least not in the same way.

    Imagine if it hadn’t been created by a Unix-hater. But then,
    Microsoft had already divested themselves of Xenix by then, hadn’t
    they? So they probably didn’t have anyone left who understood the
    value of Unix.

    I see nothing wrong with DC being a Unix hater.
    Much, much worse was that he didn't understand that it was no longer the 1970s
    and that in the 1990s plug&play support was a necessity, including "hot"
    Because of that blind spot, the Win9x line, created by people who did
    understand the value of plug&play (Brad Silverberg? I can't find much
    info about the lead 9x architects on the Net), but very problematic
    otherwise, lasted much longer than it should have.

  • From EricP@21:1/5 to John Dallman on Sun Jan 21 08:43:30 2024
    John Dallman wrote:
    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    jgd@cix.co.uk (John Dallman) writes:
    The timeline doesn't work. DEC decided to adopt MIPS in 1989,
    because they were losing market share worryingly quickly.
    NVAX was released in 1991, and they'd have had real trouble
    developing it without the cash from MIPS-based systems.
    I forgot that in this alternative reality DEC would have killed the
    VAX 9000 project early, leaving them lots of cash for developing
    NVAX. Still, it could easily have been that they would have lost
    customers to the RISC competition until they finally managed to do
    the OoO-VAX.

    For developing an OoO-VAX the relevant time is 1985-1995 (HPS wrote
    their papers on OoO (with VAX as example) starting in 1985, the
    Pentium Pro appeared in 1995). Of course, for OoO-VAXes to succeed
    in the market, the relevant timespan was 1995-2005. Intel dropped
    the 64-bit IA-32 successor ball and AMD picked it up with the 2003
    releases of Opteron and Athlon64.

    This requires DEC to take notice of those papers and start developing OoO quite quickly. They did not do that historically, and they seem to have
    been confident that their way of working would carry on being effective, until RISC demonstrated otherwise. This is the timeframe where IBM gave
    up on building mainframes with competitive compute power, and settled for them being capable data-movers.

    If DEC go OoO and build an OoO MicroVAX CPU by about 1988, they can get somewhere. The MicroVAX 78032 of 1985 was 125K transistors; the 80386 was 275K transistors the same year, the 80486 was 1.2M transistors in 1989,
    so the transistor budget could be there.

    There was also the CVAX in 1986: 134,000 transistors (out of 180,000 sites), 2 µm CMOS, 3 layers of interconnect, a 90 ns clock cycle, and an internal 1 kB 2-way set-associative cache, plus a separate 65,000-transistor FPU coprocessor chip.

    But these were only available in systems like the 6240, with quad SMP processors,
    256 kB L2 cache, up to 256 MB of main memory, and up to 6 high-speed I/O buses,
    in multiple cabinets.

  • From Anton Ertl@21:1/5 to Michael S on Sun Jan 21 16:30:36 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:
    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs
    Did I read it right? Brand new mid-range IBM mainframe barely matched
    15 y.o. CDC machine that was 10 years out of production ?
    That sounds quite embarrassing.

    That depends on the price, and there are also properties like size,
    power consumption and cooling requirements. IBM mainframes were not
    designed for HPC (with a few exceptions); if you wanted that, you
    would have bought a Cray-1 in 1979 when the 4341 appeared.

    There is also the thing about IBM's market: Amdahl said about the (high-performance) ACS-360 <https://people.computing.clemson.edu/~mark/acs_end.html>:

    |Yes, but the company decided not to build it because it would have
    |destroyed the pricing structures. In the first place, it would have
    |forced them to make higher-end machines. But with IBM's pricing
    |structure, the market disappeared by the time performance got to a
    |certain level. Any machine above that in performance or price could
    |only lose money.

    The ACS-360 was cancelled for that reason.

    Also, remember that these were not the 1990s with their extreme
    advances every year; instead, the performance advances were quite a
    bit slower, just like we have seen in the last two decades. And if
    you compare a 2023-vintage Rock 5B (with Cortex-A76 like the Raspi5)
    with a 2008-vintage Core 2 Duo E8400 PC, the Rock 5B is slightly
    slower when running LaTeX, but it's also much cheaper, smaller,
    consumes much less power and actually works without a cooler (but we
    provided one nonetheless; the Raspi 5 SoC is made in a less advanced
    process and needs more cooling).

    - anton
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Sun Jan 21 21:27:23 2024
    On Sun, 21 Jan 2024 15:06:43 +0200, Michael S wrote:

    Did I read it right? Brand new mid-range IBM mainframe barely matched 15
    y.o. CDC machine that was 10 years out of production ?
    That sounds quite embarrassing.

    A minute’s silence for the hardware legend that was Seymour Cray.

    And a minute’s jeering at IBM’s FUD campaign to try to put CDC out of business.

  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Sun Jan 21 21:28:13 2024
    On Sun, 21 Jan 2024 15:19:22 +0200, Michael S wrote:

    On Sat, 20 Jan 2024 21:33:11 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Sat, 20 Jan 2024 16:25 +0000 (GMT Standard Time), John Dallman wrote:

    In article <2024Jan20.101000@mips.complang.tuwien.ac.at>,
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    ... Dave Cutler probably doesn't move to Microsoft, and then Windows
    NT doesn't happen, at least not in the same way.

    Imagine if it hadn’t been created by a Unix-hater. But then, Microsoft
    had already divested themselves of Xenix by then, hadn’t they? So they
    probably didn’t have anyone left who understood the value of Unix.

    I see nothing wrong in DC being Unix hater.

    WSL might not have been necessary. Microsoft would not now be struggling
    to offer some semblance of Linux compatibility.

  • From MitchAlsup1@21:1/5 to Michael S on Sun Jan 21 21:26:37 2024
    Michael S wrote:

    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run cdc6600 rain benchmark on
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for
    national lab that was looking at getting 70 for a compute farm (sort
    of the leading edge of the coming cluster supercomputing tsunami). I
    also ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode
    and the integrated channel microcode; a 3031 was two 158 engines, one
    with just the 370 microcode and a 2nd with just the integrated
    channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.

    Did I read it right? Brand new mid-range IBM mainframe barely matched
    15 y.o. CDC machine that was 10 years out of production ?
    That sounds quite embarrassing.

    Target market for 4341 was not scientific computing, either.

  • From Thomas Koenig@21:1/5 to mitchalsup@aol.com on Sun Jan 21 21:51:56 2024
    MitchAlsup1 <mitchalsup@aol.com> schrieb:
    Michael S wrote:

    On Fri, 19 Jan 2024 16:17:32 -1000
    Lynn Wheeler <lynn@garlic.com> wrote:

    EricP <ThatWouldBeTelling@thevillage.com> writes:
    For single precision the 780 is slightly faster for "coded BLAS"
    and the 158 is about 50% faster for compiled code.

    trivia: jan1979, I was asked to run cdc6600 rain benchmark on
    (engineering) 4341 (before shipping to customers, the engineering 4341
    was clocked about 10% slower than what shipped to customers) for
    national lab that was looking at getting 70 for a compute farm (sort
    of the leading edge of the coming cluster supercomputing tsunami). I
    also ran it on 158-3 and 3031. A 370/158 ran both the 370 microcode
    and the integrated channel microcode; a 3031 was two 158 engines, one
    with just the 370 microcode and a 2nd with just the integrated
    channel microcode.

    cdc6600: 35.77secs
    158: 45.64secs
    3031: 37.03secs
    4341: 36.21secs

    ... 158 integrated channel microcode was using lots of processing
    cycles, even when no i/o was going on.

    Did I read it right? Brand new mid-range IBM mainframe barely matched
    15 y.o. CDC machine that was 10 years out of production?
    That sounds quite embarrassing.

    Target market for 4341 was not scientific computing, either.

    And yet, people used IBM mainframes for scientific computing...

    For example, the IBM 4361 had, as an optional feature, the maximum
    precision scalar product developed by the University of Karlsruhe.

    Not sure why they went to IBM with it, maybe DEC would have been
    a better choice. Then again, the people at the computer center
    in Karlsruhe were very mainframe-oriented...
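    For readers who haven't met it: the point of a maximum-precision
    (exact) scalar product is that the products are summed with no
    intermediate rounding, and the result is rounded only once. As I
    understand it, the hardware did this with a very wide fixed-point
    accumulator. The sketch below uses Python's fractions.Fraction as a
    slow software stand-in, so it shows the effect, not how the 4361
    feature was implemented. The function names are illustrative only.

```python
from fractions import Fraction

def naive_dot(xs, ys):
    """Ordinary floating-point dot product: rounds after every addition."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

def exact_dot(xs, ys):
    """Dot product accumulated exactly, rounded once at the end.

    Every finite float is an exact binary fraction, so Fraction(x) * Fraction(y)
    is the exact product and summing Fractions loses nothing.  Only the
    final float() conversion rounds.
    """
    total = Fraction(0)
    for x, y in zip(xs, ys):
        total += Fraction(x) * Fraction(y)
    return float(total)

xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
print(naive_dot(xs, ys))  # 0.0: the 1.0 is absorbed by 1e16 and lost
print(exact_dot(xs, ys))  # 1.0
```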

  • From Scott Lurndal@21:1/5 to Lawrence D'Oliveiro on Sun Jan 21 22:01:34 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sun, 21 Jan 2024 15:06:43 +0200, Michael S wrote:

    Did I read it right? Brand new mid-range IBM mainframe barely matched 15
    y.o. CDC machine that was 10 years out of production?
    That sounds quite embarrassing.

    A minute’s silence for the hardware legend that was Seymour Cray.

    He was a friend of my Godfather (who lived in Chippewa Falls), right around
    the time I first had access to a computer (1974, B5500). I didn't
    realize who he was until much later, however, and never had a chance to
    discuss computers.

  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Sun Jan 21 23:54:01 2024
    On Sun, 21 Jan 2024 21:51:56 -0000 (UTC), Thomas Koenig wrote:

    Not sure why they went to IBM with it, maybe DEC would have been a
    better choice. Then again, the people at the computer center in
    Karlsruhe were very mainframe-oriented...

    There seemed to be a lot of people like that, who only knew IBM and saw
    the whole world through IBM lenses. To the rest of us, IBM’s way of doing things just seemed overcomplicated, unwieldy, inflexible ... and

  • From Lynn Wheeler@21:1/5 to Michael S on Sun Jan 21 16:37:40 2024
    Michael S <already5chosen@yahoo.com> writes:
    Did I read it right? Brand new mid-range IBM mainframe barely matched
    15 y.o. CDC machine that was 10 years out of production?
    That sounds quite embarrassing.

    national lab was looking at getting 70 because of price/performance
    ... sort of the leading edge of the coming cluster scale-up
    supercomputing tsunami.

    decade later had project originally HA/6000 for NYTimes to move their
    newspaper system (ATEX) off (DEC) VaxCluster to RS/6000. I rename it
    HA/CMP when I start doing technical/scientific cluster scale-up with
    national labs and commercial cluster scale-up with RDBMS vendors
    (Oracle, Sybase, Informix, Ingres). Early Jan1992, meeting with Oracle
    CEO, who is told 16-way cluster mid-92 and 128-way cluster
    ye-92. However, end of Jan1992, cluster scaleup is transferred for
    announce as IBM supercomputer (for technical/scientific *ONLY*, possibly because of commercial cluster scaleup "threat") and we are told we
    couldn't work on anything with more than four processors (we leave IBM a
    few months later). A couple weeks later, IBM (cluster) supercomputer
    group in the press (pg8) https://archive.org/details/sim_computerworld_1992-02-17_26_7

    First half 80s, IBM 4300s sold into the same mid-range market as VAX and
    in about the same numbers for single and small unit orders ... big
    difference was large companies ordering hundreds of 4300s at a time for
    placing out in departmental areas (sort of the leading edge of the
    coming distributed computing tsunami).

    old archived post with vax sales, sliced and diced by model, year,

    2nd half of 80s, mid-range market was moving to workstation and large PC servers ... affecting both VAX and 4300s

    virtualization experience starting Jan1968, online at home since Mar1970

  • From John Levine@21:1/5 to All on Mon Jan 22 02:46:05 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Sun, 21 Jan 2024 21:51:56 -0000 (UTC), Thomas Koenig wrote:

    Not sure why they went to IBM with it, maybe DEC would have been a
    better choice. Then again, the people at the computer center in
    Karlsruhe were very mainframe-oriented...

    There seemed to be a lot of people like that, who only knew IBM and saw
    the whole world through IBM lenses. ...

    IBM has a big development lab in Boeblingen which is about an hour from Karlsruhe.

    At that time DEC had no labs outside the United States.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Mon Jan 22 03:21:21 2024
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from Karlsruhe.

    At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

  • From Terje Mathisen@21:1/5 to Lawrence D'Oliveiro on Mon Jan 22 09:59:24 2024
    Lawrence D'Oliveiro wrote:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from

    At one time, IBM were the world’s biggest holder of patents. Their researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    My uni lecturer had a favorite:

    IBM's patent for a zero-time sorting chip.

    It was basically a DMA-style memory device that was set up as a big
    ladder of comparators so that it could do a parallel bubble sort:

    As each new item arrived it would be compared with the current top, and
    the loser would be pushed down to the next ladder level, replacing the
    item which had at the same time lost the comparison at that level.

    By the time all items had been loaded, the top would be the overall
    winner, right?

    You would then reverse the direction, while keeping the comparators
    active, so now you would stream out perfectly sorted items.
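    A software model of the comparator ladder described above, as a
    minimal sketch: the class name and capacity are hypothetical, and the
    smaller value is taken as the "winner". Each rung does one
    compare-and-swap. In the chip all rungs would work in the same cycle,
    so items ripple down like a pipeline; here each insertion simply
    ripples to the bottom before the next arrives, which gives the same
    final order. Functionally it is an insertion sort with the work spread
    across the rungs.

```python
from typing import List, Optional

class ComparatorLadder:
    """Hypothetical software model of a ladder of compare-and-swap rungs.

    Each rung holds at most one item.  An arriving item is compared with the
    rung's occupant; the smaller one stays and the larger moves down a rung.
    """

    def __init__(self, capacity: int) -> None:
        self.rungs: List[Optional[int]] = [None] * capacity

    def push(self, item: int) -> None:
        carry = item
        for i, held in enumerate(self.rungs):
            if held is None:             # first empty rung: the item settles here
                self.rungs[i] = carry
                return
            if carry < held:             # newcomer wins; old occupant is pushed down
                self.rungs[i], carry = carry, held
        # The largest item fell off the last rung: the chip is full.  A real
        # system would have to sort chip-sized runs and merge them.
        raise OverflowError("ladder full")

    def drain(self) -> List[int]:
        """Read the rungs top to bottom; they are already in sorted order."""
        out = [x for x in self.rungs if x is not None]
        self.rungs = [None] * len(self.rungs)
        return out

ladder = ComparatorLadder(capacity=8)
for x in [5, 3, 8, 1, 9, 2]:
    ladder.push(x)
print(ladder.drain())  # [1, 2, 3, 5, 8, 9]
```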

    The real problem is of course that this is effectively very expensive
    memory, and as soon as you ran out of space in the chip you would have
    to fall back on multi-way merge between separate runs of chip-size chunks.

    In pretty much every conceivable real-world situation you would much
    rather have 10x more real memory and apply indexing to any data you
    might want to retrieve quickly in some sorted order and/or sort it on


    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

  • From Vir Campestris@21:1/5 to John Levine on Mon Jan 22 11:58:28 2024
    On 05/01/2024 18:05, John Levine wrote:
    According to EricP <ThatWouldBeTelling@thevillage.com>:


    DG Nova had infinite indirection - if the Indirect bit was set in the
    instruction then in the address register if the msb of the address was zero
    then it was the address of the 16-bit data, if the msb of the address was 1
    then it was the address of another address, looping until msb = 0.
    I don't know how DG used it but, just guessing, because Nova only had
    4 registers might be to create a kind of virtual register set in memory.
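    The chain-following described above can be sketched like this,
    assuming 16-bit words with the msb as the indirect flag and a 15-bit
    word address, and assuming it is invoked only when the instruction's
    own indirect bit is set. The names are hypothetical. The hop limit is
    an addition of this sketch so that a cyclic chain raises an error
    instead of spinning; it is not a claim about what the Nova did.
    Nova-specific details such as its auto-increment locations are not
    modeled.

```python
INDIRECT = 0x8000   # msb of a 16-bit word: "this word is another pointer"
ADDR_MASK = 0x7FFF  # low 15 bits: a word address (32K words)

def resolve_indirect(mem, addr, max_hops=1000):
    """Follow a Nova-style indirect chain starting at word `addr`.

    mem[addr] holds a pointer.  While its msb is set it names yet another
    pointer word; the first pointer with the msb clear is the address of
    the operand.
    """
    ptr = mem[addr]
    for _ in range(max_hops):
        if not ptr & INDIRECT:
            return ptr               # msb clear: effective address of the data
        ptr = mem[ptr & ADDR_MASK]   # msb set: one more level of indirection
    raise RuntimeError("indirection loop (hop limit reached)")

# Usage: a two-level chain 0o100 -> 0o200 -> operand at 0o300.
mem = [0] * 0x8000
mem[0o100] = INDIRECT | 0o200
mem[0o200] = 0o300
mem[0o300] = 42
ea = resolve_indirect(mem, 0o100)
print(oct(ea), mem[ea])  # 0o300 42
```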

    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.

    As has been mentioned elsewhere recently DEC did exactly this on the PDP-10.


  • From John Levine@21:1/5 to All on Mon Jan 22 16:42:50 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from

    At one time, IBM were the world’s biggest holder of patents. Their
    researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From John Levine@21:1/5 to All on Mon Jan 22 16:46:45 2024
    According to Vir Campestris <vir.campestris@invalid.invalid>:
    My guess is that it was cheap to implement and let them say look, here
    is a cool thing that we do and DEC doesn't. I would be surprised if
    there were many long indirect chains.

    As has been mentioned elsewhere recently DEC did exactly this on the PDP-10.

    It was more complicated than that on the PDP-6/10. At each stage it not
    only did indirection, it could also add in an index register. I can sort
    of imagine how one might use all that for dynamically allocated array
    rows but I never saw more than two levels in practice and never saw
    indexing in indirect words.

    In their defense, the addressing was very consistent, start with the instruction word and keep indexing and indirecting until you come up
    with the address.
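    The "keep indexing and indirecting" loop can be sketched as follows.
    This assumes the KA10-style 18-bit address with I in bit 13, X in bits
    14-17 and Y in bits 18-35. DEC numbers bits from the left, so I sits at
    bit 22 counting from the right. The function names and addresses are
    hypothetical, and there are two simplifications: the accumulators are
    kept separate from memory locations 0-17, and a hop limit is added so
    that a loop raises instead of hanging. The usage builds the kind of
    array-of-rows access mentioned above.

```python
HALF = 0o777777  # 18-bit address / half-word mask

def make_word(i, x, y):
    """Pack I, X and Y into the low 23 bits of a 36-bit word (opcode/AC left 0)."""
    return (i << 22) | (x << 18) | (y & HALF)

def effective_address(word, acs, mem, max_hops=1000):
    """PDP-6/10-style effective-address loop (simplified sketch).

    Index by X if it is nonzero; then, if I is set, fetch the word at that
    address and repeat with its I, X and Y.
    """
    for _ in range(max_hops):
        i = (word >> 22) & 1           # indirect bit
        x = (word >> 18) & 0o17        # index register number, 0 = no indexing
        e = word & HALF                # Y, the address field
        if x:
            e = (e + (acs[x] & HALF)) & HALF  # add right half of index register
        if not i:
            return e
        word = mem[e]                  # indirect: fetch next word and repeat
    raise RuntimeError("indirection loop (hop limit reached)")

# Usage: element [row][col] of an array stored as separately allocated rows.
acs = [0] * 16
acs[1] = 2                              # row index
acs[2] = 3                              # column index
row_bases = [0o2000, 0o3000, 0o4000]    # hypothetical row addresses
mem = {0o1000 + r: make_word(0, 2, base) for r, base in enumerate(row_bases)}
insn = make_word(1, 1, 0o1000)          # indirect through row table, indexed by AC1
print(oct(effective_address(insn, acs, mem)))  # 0o4003
```

    The instruction indexes the row table by AC1, and each row-table entry
    indexes its row by AC2, so a single memory operand reaches
    element [row][col].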

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Thomas Koenig@21:1/5 to John Levine on Mon Jan 22 17:42:19 2024
    John Levine <johnl@taugh.com> schrieb:
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from

    At one time, IBM were the world’s biggest holder of patents. Their
    researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    Or you want to be able to use them at a later date, so nobody else
    can patent that particular invention.

    This has led to some patents being filed in Luxemburg only, for example.

    Another method, which is getting harder in the age of search
    engines, is the "secret" publication by publishing it somewhere
    where it is unlikely to be found, such as the (non-existent)
    "Acta Physical Mongolica".

  • From Anton Ertl@21:1/5 to John Levine on Mon Jan 22 17:48:38 2024
    John Levine <johnl@taugh.com> writes:
    [unbounded indirection:]
    It was more complicated than that on the PDP-6/10. At each stage it not
    only did indirection, it could also add in an index register. I can sort
    of imagine how one might use all that for dynamically allocated array
    rows but I never saw more than two levels in practice and never saw
    indexing in indirect words.

    The implementation of a logic variable is a parent-pointer tree where
    you follow the parent pointers until you are at the root
    (which is a free variable or instantiated to a value). The automatic
    unbounded indirection of the PDP-6/10 and Nova appears to be ideal for
    that. And actually the most influential Prolog for quite a number of
    years was DEC-10 Prolog; I don't know if it used that feature, but I
    would be surprised if it did not. Still, Prolog could be implemented
    on architectures without that feature.
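    Here is a minimal sketch of the dereference loop described above. The
    names (Var, deref, unify_simple) are hypothetical. It handles only
    variables and atomic values: there are no compound terms and no trail
    for backtracking. The while loop in deref is the software version of
    what automatic multi-level indirection does within a single memory
    access.

```python
class Var:
    """A logic variable: unbound while ref is None, else bound to another term."""
    __slots__ = ("name", "ref")

    def __init__(self, name):
        self.name = name
        self.ref = None

def deref(term):
    """Follow binding links until an unbound Var or a non-variable value."""
    while isinstance(term, Var) and term.ref is not None:
        term = term.ref
    return term

def unify_simple(a, b):
    """Unify two terms that are variables or atomic values (no structures, no trail)."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b          # bind the root of a's chain
        return True
    if isinstance(b, Var):
        b.ref = a
        return True
    return a == b          # two atomic values

# Usage: X = Y, Y = Z, Z = 42 builds the chain X -> Y -> Z -> 42.
X, Y, Z = Var("X"), Var("Y"), Var("Z")
unify_simple(X, Y)
unify_simple(Y, Z)
unify_simple(Z, 42)
print(deref(X))  # 42
```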

    - anton
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

  • From Thomas Koenig@21:1/5 to Lawrence D'Oliveiro on Mon Jan 22 17:37:23 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Sun, 21 Jan 2024 21:51:56 -0000 (UTC), Thomas Koenig wrote:

    Not sure why they went to IBM with it, maybe DEC would have been a
    better choice. Then again, the people at the computer center in
    Karlsruhe were very mainframe-oriented...

    There seemed to be a lot of people like that, who only knew IBM and saw
    the whole world through IBM lenses. To the rest of us, IBM’s way of doing things just seemed overcomplicated, unwieldy, inflexible ... and

    That wasn't the case here.

    The mainframe they had at the computer center before was a UNIVAC
    (don't know which model, it was decommissioned before I started
    on the Siemens/Fujitsu mainframe there), and they had a Cyber 205.

    So, maybe more mainframe-oriented, but not necessarily IBM.
    But then again, the 4361 was not really a mainframe.

    But the proximity of Karlsruhe to Böblingen (which John
    L. mentioned) might well have been a factor. It is entirely
    plausible that contacts existed, for example from students who
    started to work there.

    And, googling around for a bit, I find that the 4361 was indeed
    developed at Böblingen. This probably settles it.

  • From MitchAlsup1@21:1/5 to John Levine on Mon Jan 22 19:03:33 2024
    John Levine wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from

    At one time, IBM were the world’s biggest holder of patents. Their
    researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    See, you cannot sue me for patent infringement, I am only doing what MY
    patent on that matter allows.

  • From John Levine@21:1/5 to All on Mon Jan 22 19:22:16 2024
    According to MitchAlsup1 <mitchalsup@aol.com>:
    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    See, you cannot sue me for patent infringement, I am only doing what MY
    patent on that matter allows.

    It's more than that. If someone threatens IBM with a patent suit, IBM's
    usual response is that they have 100,000 patents in their portfolio, so
    they're pretty sure that if they look, they will find something that
    the other party is doing that looks like one of those patents and
    will countersue. Patent suits are very expensive and IBM has
    deep pockets.

    Big companies often avoid this by cross-licensing, I won't sue you for
    anything in our pile of patents if you won't sue me for anything in
    your pile.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Thomas Koenig@21:1/5 to mitchalsup@aol.com on Mon Jan 22 20:32:10 2024
    MitchAlsup1 <mitchalsup@aol.com> schrieb:
    John Levine wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    On Mon, 22 Jan 2024 02:46:05 -0000 (UTC), John Levine wrote:

    IBM has a big development lab in Boeblingen which is about an hour from
    Karlsruhe.

    At one time, IBM were the world’s biggest holder of patents. Their
    researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    See, you cannot sue me for patent infringement, I am only doing what MY
    patent on that matter allows.

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner
    any rights to use the invention that he would not have otherwise.

    It is perfectly possible, if undesirable for the patent holder,
    to be dependent on some other patent. It is also possible, if
    rarer, for two patents to block each other, so nobody can use
    the invention. This can then be a reason to negotiate, or
    (in extreme cases) to litigate.

  • From MitchAlsup1@21:1/5 to John Levine on Mon Jan 22 20:32:30 2024
    John Levine wrote:

    According to MitchAlsup1 <mitchalsup@aol.com>:
    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    See, you cannot sue me for patent infringement, I am only doing what MY
    patent on that matter allows.

    It's more than that. If someone threatens IBM with a patent suit, IBM's usual response is that they have 100,000 patents in their portfolio, so they're pretty sure that if they look, they will find something that
    the other party is doing that looks like one of those patents and
    will countersue. Patent suits are very expensive and IBM has
    deep pockets.

    Big companies often avoid this by cross-licensing, I won't sue you for anything in our pile of patents if you won't sue me for anything in
    your pile.

    Yes, I used the small (startup) patent model.

  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Mon Jan 22 20:37:48 2024
    On Mon, 22 Jan 2024 16:42:50 -0000 (UTC), John Levine wrote:

    According to Lawrence D'Oliveiro <ldo@nz.invalid>:

    At one time, IBM were the world’s biggest holder of patents. Their
    researchers came up with many clever ideas. But my impression was, very
    few of those ideas actually made it into their products.

    A lot of patents are defensive, you don't necessarily plan to use them
    but you don't want anyone else to own them.

    Here’s one notorious one: they had a patent on the use of bit-flipping to produce a flashing cursor on a text terminal. And they sued other terminal vendors for copying this idea.
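    Whoever actually held that patent, the technique itself is tiny. You
    XOR the character cell with a mask to invert it, then XOR it again to
    restore it, so no copy of the original character needs to be saved. A
    minimal sketch, assuming a hypothetical screen buffer with one byte per
    character cell; a real terminal may have inverted an attribute or
    video bit instead.

```python
def toggle_cursor(screen, row, col, mask=0xFF):
    """Invert one character cell in place by XOR-ing it with `mask`.

    XOR is its own inverse, so a second call restores the original byte;
    calling this on a timer makes the cell blink without saving what was there.
    """
    screen[row][col] ^= mask

# Usage with a hypothetical one-line screen buffer, one byte per cell.
screen = [bytearray(b"HELLO")]
toggle_cursor(screen, 0, 0)   # cursor "on": 'H' (0x48) becomes 0xB7
toggle_cursor(screen, 0, 0)   # cursor "off": original byte is back
print(bytes(screen[0]))       # b'HELLO'
```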

  • From Lawrence D'Oliveiro@21:1/5 to Thomas Koenig on Mon Jan 22 20:41:18 2024
    On Mon, 22 Jan 2024 20:32:10 -0000 (UTC), Thomas Koenig wrote:

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner any rights to use the invention that he would not have otherwise.

    And more than that, you don’t actually need to prove your idea works
    before you can get a patent on it. I think the legal term is “reduce to practice”, which basically means “write up a plausible-sounding
    description of how it *might* work”.

    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

  • From John Levine@21:1/5 to All on Mon Jan 22 23:12:26 2024
    According to Lawrence D'Oliveiro <ldo@nz.invalid>:
    And more than that, you don’t actually need to prove your idea works
    before you can get a patent on it. I think the legal term is “reduce to
    practice”, which basically means “write up a plausible-sounding
    description of how it *might* work”.

    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

    In the US at least, you are supposed to have reduced your invention to practice although it is obvious that many patentees haven't.

    The patent office is allowed to ask for a working model of any
    invention. Back in the 1800s they got models for everything (a
    fabulous collection that was sadly destroyed first by fires and later
    by auction.) Now they don't except for perpetual motion machines.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Thomas Koenig@21:1/5 to Lawrence D'Oliveiro on Tue Jan 23 06:50:05 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Mon, 22 Jan 2024 20:32:10 -0000 (UTC), Thomas Koenig wrote:

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner any
    rights to use the invention that he would not have otherwise.

    And more than that, you don’t actually need to prove your idea works
    before you can get a patent on it. I think the legal term is “reduce to practice”, which basically means “write up a plausible-sounding description of how it *might* work”.

    § 21 of the German Patent Law states (deepl-assisted translation,

    (1) The patent shall be revoked (§ 61) if it is found that


    2. the patent does not disclose the invention so clearly and completely
    that a person skilled in the art can carry it out,

    To avoid insufficient disclosure, people (including myself, I have to
    admit) now put a _lot_ of details into patents, which makes the patents
    much longer than previously, and more painful to write and to read.

    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

    A patent has to be industrially applicable (§1), and (§5)

    An invention is considered to be industrially applicable if
    its subject matter can be made or used in any industrial field,
    including agriculture.

    Something that does not work can clearly not be used in an
    industrial field.

  • From MitchAlsup1@21:1/5 to Thomas Koenig on Tue Jan 23 21:13:29 2024
    Thomas Koenig wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Mon, 22 Jan 2024 20:32:10 -0000 (UTC), Thomas Koenig wrote:

    A patent gives its owner the right to keep others from using the
    invention as described in the claims. It does _not_ give the owner any
    rights to use the invention that he would not have otherwise.

    And more than that, you don’t actually need to prove your idea works
    before you can get a patent on it. I think the legal term is “reduce to
    practice”, which basically means “write up a plausible-sounding
    description of how it *might* work”.

    § 21 of the German Patent Law states (deepl-assisted translation,

    (1) The patent shall be revoked (§ 61) if it is found that


    2. the patent does not disclose the invention so clearly and completely
    that a person skilled in the art can carry it out,

    To avoid insufficient disclosure, people (including myself, I have to
    admit) now put a _lot_ of details into patents, which makes the patents
    much longer than previously, and more painful to write and to read.

    This is why there was nothing to stop people patenting an endless variety
    of ideas for perpetual-motion machines; it needed explicit rules brought
    in specifically to prohibit them.

    A patent has to be industrially applicable (§1), and (§5)

    An invention is considered to be industrially applicable if
    its subject matter can be made or used in any industrial field,
    including agriculture.

    Something that does not work can clearly not be used in an
    industrial field.

    When multiple patents arrive at the patent office contemporaneously,
    and they all describe essentially the same mechanism or algorithm::
    they should ALL be denied as something "obvious to one skilled in the

    Yet, the opposite happens.

  • From Lawrence D'Oliveiro@21:1/5 to All on Tue Jan 23 22:03:24 2024
    On Tue, 23 Jan 2024 21:13:29 +0000, MitchAlsup1 wrote:

    When multiple patents arrive at the patent office contemporaneously, and
    they all describe essentially the same mechanism or algorithm:: they
    should ALL be denied as something "obvious to one skilled in the art".

    Yet, the opposite happens.

    Worse than that, if evidence comes to light of “prior art”, that is, use/disclosure of the patented techniques prior to the patent registration,
    that should invalidate the patent. Yet, in the US at least, this turns out
    to be very hard.

    Case in point: the NewEgg patent, which was just an application of Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to testify
    that he had come up with the idea decades before. Yet the jury were unconvinced, and let the patent stand.
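    For reference, the key-exchange idea works roughly as follows. Each
    side publishes g^secret mod p, then combines the other side's public
    value with its own secret, and both arrive at the same key. This is a
    toy sketch with textbook-sized numbers (p = 23, g = 5), far too small
    to be secure, and the function names are hypothetical. It says nothing
    about what the patent in that case actually claimed.

```python
import secrets

def dh_public(g: int, p: int, secret: int) -> int:
    """The value each side sends in the clear: g**secret mod p."""
    return pow(g, secret, p)

def dh_shared(their_public: int, p: int, my_secret: int) -> int:
    """Combine the other side's public value with my secret: (g**b)**a == (g**a)**b mod p."""
    return pow(their_public, my_secret, p)

# Textbook-sized parameters, far too small to be secure.
p, g = 23, 5
a, b = 4, 3                     # private exponents (fixed for a repeatable example)
A, B = dh_public(g, p, a), dh_public(g, p, b)
print(A, B, dh_shared(B, p, a), dh_shared(A, p, b))  # 4 10 18 18

# With randomly chosen private exponents the two sides still agree.
a2 = secrets.randbelow(p - 2) + 1
b2 = secrets.randbelow(p - 2) + 1
assert dh_shared(dh_public(g, p, b2), p, a2) == dh_shared(dh_public(g, p, a2), p, b2)
```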

  • From Thomas Koenig@21:1/5 to Lawrence D'Oliveiro on Wed Jan 24 06:52:25 2024
    Lawrence D'Oliveiro <ldo@nz.invalid> schrieb:
    On Tue, 23 Jan 2024 21:13:29 +0000, MitchAlsup1 wrote:

    When multiple patents arrive at the patent office contemporaneously, and
    they all describe essentially the same mechanism or algorithm:: they
    should ALL be denied as something "obvious to one skilled in the art".

    Yet, the opposite happens.

    Worse than that, if evidence comes to light of “prior art”, that is, use/disclosure of the patented techniques prior to the patent registration,
    that should invalidate the patent. Yet, in the US at least, this turns out
    to be very hard.

    It is then a matter for the opposition division to decide, then the
    board of appeal, then the patent courts (at least that is the EPO

    Case in point: the NewEgg patent, which was just an application of Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to testify
    that he had come up with the idea decades before. Yet the jury were unconvinced, and let the patent stand.

    Was this before or after the US followed the rest of the world by
    allowing opposition proceedings? Having such a case go straight
    to a jury is somewhat problematic...

    But "came up with the idea" is not prior art if it isn't disclosed.
    If he didn't have a publication, or slides from a presentation,
    that does not count.

    Had he said "It was an obvious application that anybody working
    in the field would have thought of with half a brain", that would
    have been a strong argument for lack of inventive step.

    But lack of inventive step is tricky...

  • From John Levine@21:1/5 to All on Wed Jan 24 14:54:54 2024
    According to Thomas Koenig <tkoenig@netcologne.de>:
    Case in point: the NewEgg patent, which was just an application of
    Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to testify
    that he had come up with the idea decades before. Yet the jury were
    unconvinced, and let the patent stand.

    Was this before or after the US followed the rest of the world by
    allowing opposition proceedings? Having such a case go straight
    to a jury is somewhat problematic...

    It was in 2013. Since 2012 the US has had inter partes review, where
    you can have the Patent Trial and Appeal Board review a patent to see
    if it's not novel. That case was filed under the old rules, and was in
    Marshall TX, a rural corner of Texas with a judge notoriously friendly
    to patent trolls.

    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From George Neuner@21:1/5 to johnl@taugh.com on Wed Jan 24 11:33:47 2024
    On Wed, 24 Jan 2024 14:54:54 -0000 (UTC), John Levine
    <johnl@taugh.com> wrote:

    According to Thomas Koenig <tkoenig@netcologne.de>:
    Case in point: the NewEgg patent, which was just an application of
    Diffie-Hellman key exchange. Whitfield Diffie himself took the stand to
    testify that he had come up with the idea decades before. Yet the jury were
    unconvinced, and let the patent stand.

    Was this before or after the US followed the rest of the world by
    allowing opposition proceedings? Having such a case go straight
    to a jury is somewhat problematic...

    It was in 2013. Since 2012 the US has had inter partes review, where
    you can have the Patent Trial and Appeal Board review a patent to see
    if it's not novel. That case was filed under the old rules, and was in
    Marshall TX, a rural corner of Texas with a judge notoriously friendly
    to patent trolls.

    If it went to trial in 2013, the case was brought LONG before that,
    and would have been governed by the rules in force when it started.
    Patents are litigated in federal courts where the wait for a trial
    typically is 3..4 years.
