• GA Misc chips. Rate you can address and execute off an external Srams?

    From Wayne morellini@21:1/5 to All on Wed Aug 3 07:42:33 2022
    On the GA chips there is no full external memory bus, so a multi-node scheme has to be set up for a node to execute code from an SRAM chip. I presumed this would take a lot of cycles per fetch, but then realised you could use nodes to
    synchronise sequential and random addressing and fetch.

    How many cycles does it take to send an address (say, for the next word), fetch it, and start executing it?

    Thanks.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to Wayne morellini on Wed Aug 3 09:47:12 2022
    What type of memory are you referring to?

    --

    Rick C.

    - Get 1,000 miles of free Supercharging
    - Tesla referral code - https://ts.la/richard11209

  • From Wayne morellini@21:1/5 to Wayne morellini on Wed Aug 3 10:03:14 2022
    On Thursday, August 4, 2022 at 12:42:34 AM UTC+10, Wayne morellini wrote:
    ..use an Sram chip
    ..

    Static ram.

  • From Rick C@21:1/5 to Wayne morellini on Wed Aug 3 13:48:32 2022
    Static RAMs are getting harder to come by, but they all (the engineering qualified "all") have one thing in common: they use very simple timing to control operation, with no clocks. In a generic SRAM, reads are purely asynchronous, requiring only
    that WE remain deasserted and an address be set up. Then X ns after the last signal has changed, the output data will be stable and can be read.
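
    To tie that "X ns" back to the original question about cycles, here is a minimal Python sketch. The 10 ns access time and the 1.5 ns node instruction time are assumed figures for illustration only, not data sheet values, and `read_wait_cycles` is a hypothetical helper name.

```python
import math


def read_wait_cycles(access_ns: float, cycle_ns: float) -> int:
    """Whole processor cycles needed to cover an SRAM read access time.

    access_ns: address-valid-to-data-valid time from the SRAM data sheet.
    cycle_ns:  time of one processor instruction (assumed, not measured).
    """
    return math.ceil(access_ns / cycle_ns)


# Assumed example values: a 10 ns SRAM and a 1.5 ns instruction time.
example_cycles = read_wait_cycles(10.0, 1.5)
print(example_cycles)
```

    This counts only the wait for the data to settle. Sending the address between nodes and handing the fetched word over for execution would add to it.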

    A write is a bit more work. You have to set up the data and address, wait some minimum setup time, which may be zero and may be different for the address and data (I'm not sure), and then assert WE. Wait some minimum time (usually similar to the read
    access time) and deassert WE.
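
    The read and write sequences described above can be sketched as a toy model in Python. This is not any particular part: the class `SramModel`, its method names, and the choice to latch data when WE is deasserted are all assumptions for illustration, and no real timing is modelled.

```python
class SramModel:
    """Toy asynchronous SRAM: no clock, WE modelled as a boolean flag."""

    def __init__(self, words: int) -> None:
        self.mem = [0] * words
        self.addr = 0
        self.data_in = 0
        self.we_asserted = False

    def read(self) -> int:
        # Reads are combinational: valid only while WE stays deasserted.
        if self.we_asserted:
            raise RuntimeError("cannot read while WE is asserted")
        return self.mem[self.addr]

    def assert_we(self) -> None:
        self.we_asserted = True

    def deassert_we(self) -> None:
        # Assumption: the cell is written at the end of the WE pulse.
        if self.we_asserted:
            self.mem[self.addr] = self.data_in
        self.we_asserted = False


def write_word(sram: SramModel, addr: int, value: int) -> None:
    """Set up address and data, then pulse WE (setup/hold waits omitted)."""
    sram.addr = addr
    sram.data_in = value
    sram.assert_we()
    sram.deassert_we()


sram = SramModel(16)
write_word(sram, 3, 0x2AAAA)
sram.addr = 3
value_read_back = sram.read()
```

    In a real design each step in `write_word` would be separated by the minimum times from the data sheet.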

    Sometimes there is a read enable that simply tristates the data out; other times it is also involved in the write cycle timing. Keep it deasserted when not reading and it should not be an issue.

    Bottom line is, you need to check the data sheets.

    One big problem is the dependence of the F18A timing on temperature, voltage and process. So you have to leave wide margins in your timing calculations.
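
    A rough Python sketch of what such a margin costs: the delay count has to be sized for the fastest the node might run, and the resulting wait is then longer than needed whenever the node runs slower. The 30% spread, 1.5 ns nominal instruction time and 10 ns access time are assumed numbers, and `delay_loop_budget` is a hypothetical helper.

```python
import math


def delay_loop_budget(access_ns: float, nominal_cycle_ns: float, spread: float):
    """Return (delay count, worst-case wait in ns) for a software delay.

    spread is the assumed fractional variation of the instruction time
    over temperature, voltage and process (0.3 means +/-30%).
    """
    fastest_cycle = nominal_cycle_ns * (1.0 - spread)
    slowest_cycle = nominal_cycle_ns * (1.0 + spread)
    # Size the count so the access time is met even on the fastest node.
    count = math.ceil(access_ns / fastest_cycle)
    # The same count takes this long on the slowest node.
    worst_case_ns = count * slowest_cycle
    return count, worst_case_ns


count, worst = delay_loop_budget(10.0, 1.5, 0.3)
print(count, worst)
```

    With these assumed figures the slowest case waits nearly twice the SRAM access time, which is the throughput price of the margin.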

    I'll let you map this to GA144 chip timings. They never specified the external interface timing in a way that lets you design a memory interface around it. It was actually intended to be used with SDRAM, which is much lower power and lower cost. It's hard to
    even find much SRAM these days and what's out there is very expensive. Check Digikey.

  • From Rick C@21:1/5 to Rick C on Wed Aug 3 16:34:50 2022
    I almost forgot. The fact that the memory interface was intended for SDRAM means the address bus is only 18 bits wide, I believe. So unless you cobble more nodes into the equation, you will be limited to a quarter of a megaword (256K words) of memory. So it gets even more
    hairy.
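
    The arithmetic behind that limit, as a short Python sketch. The 18-bit address width is the figure believed above, not a confirmed specification, and `words_with_bank_bits` is a hypothetical helper showing what extra address lines from additional nodes would buy.

```python
ADDRESS_BITS = 18  # address width as believed above, not confirmed
WORD_BITS = 18     # F18A word size

# Directly addressable words with an 18-bit address.
words = 1 << ADDRESS_BITS

# The same capacity expressed in bits and in KiB, for comparing parts.
total_bits = words * WORD_BITS
kib_equivalent = total_bits / 8 / 1024


def words_with_bank_bits(extra_bits: int) -> int:
    """Addressable words if other nodes drive extra high address lines."""
    return 1 << (ADDRESS_BITS + extra_bits)


print(words, kib_equivalent, words_with_bank_bits(2))
```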

    Does Green Arrays not have an app note on this?

  • From Wayne morellini@21:1/5 to gnuarm.del...@gmail.com on Thu Aug 4 06:13:07 2022
    Well, thanks for this. I presumed it was SRAM, as I stopped following it years back.

    I was trying to ask what the real-world maximum is, as I figured
    somebody might have such experience.

    Once I have such measurements, I can work out the ceiling and estimate how much
    lower it will be for various chips and for what I want. That ballparks the possibility, or
    impossibility, of different scenarios, and shows whether it's worth going further.

    I chose SRAM because it should be the simplest to drive to higher speeds. Admittedly,
    a prefetch of the next word would help performance a little. But these are only peak
    values, and have to be trimmed to use many PSRAM memories (pseudo-SRAM, which is
    DRAM with its own internal timing and refresh, presented through an external SRAM
    interface). SRAM is only an ideal starting point. I don't know if there is any PSRAM that
    can keep up, but it's out there and common. I'll just start looking for something which
    can saturate the soft bus and work downwards. 700 MHz would be ideal, but in reality,
    if it's not on chip, I doubt I'm going to find that in the marketplace cheap, or at all.

    The real situation is that the GA memories only allow basic functionality for that project,
    but the alternatives also have compromises, like using miniature MCU boards bigger
    than a GA system with external memory. So a GA with high-speed memory
    makes sense and offers a lot of functionality. The 18-bit restriction is fine, and I would
    look at bank switching anyway. A full-speed execute bus and more memory per node
    would be great, but enough speed on the soft bus would do a lot.

    I was sizing up an ATtiny85 to fit in an RCA plug today, but need a way to make a little
    board, or bendable board, to fit in there. I'm wondering about the possibility of using
    connectors instead of a board.

    I must say that the industry is missing out by not making a single, dual, or more
    channel serial bus memory format, as mobile serial is hitting over 20 Gb/s. This
    would allow a serial I/O interface with as many of these memories around the chip circumference
    as there are processors there.


  • From Wayne morellini@21:1/5 to All on Sun Aug 7 07:57:06 2022
    There is supposed to be a B version of the GA architecture coming, but I hope for
    something that has hardware versions of the execution bus (and a serial-to-parallel bus
    memory version with timing at the highest speed possible) and a DMA I/O/distributor, with
    new internode communications.

    If they could have an option to package large PSRAM memory and storage dies in the same
    package, for the developer chips, that would be great. The chip runs at such low heat that they
    could vertically stack them. We will see what's next.

    I'm going to come out with it: I was looking at the possibility of mounting such chips on top of or
    below the GA144 package, to reduce overall system size, and having the circuit board
    mounted likewise, to keep the package close to 1 cm by 1 cm. I just had another idea,
    but deleted it, and another one suitable for the microcontroller board
    market. But it might be possible to affix the chips and circuits within a millimetre or so
    on the chip package face. What is the best low-cost way to do it?

    I'm actually looking for the smallest prototyping board possible for this package.

    I still can't understand why interpreted execution through a software bus has to be slow
    (I imagined it had a lot to do with the maximum data rate. Anybody know?). You are
    basically sending an 18-bit address value, retrieving an 18-bit data value, and then, for
    each instruction, duplicating the word, shifting to get the instruction on its own, shifting to
    align it with a lookup table structure, and branching to the operation's routine, then repeating
    for the next instruction until you finish the 18-bit sequence. So, that is slow.
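
    The per-instruction overhead described above can be sketched in Python. The slot layout used here (three 5-bit slots followed by a 3-bit slot in an 18-bit word) is an assumption about the F18A word format and ignores any other encoding details of the real chip. The step counts and the names `split_slots` and `interpret_word` are hypothetical, chosen only to show why a software decode loop costs several operations per executed opcode.

```python
SLOT_SHIFTS = (13, 8, 3)  # assumed bit positions of the three 5-bit slots


def split_slots(word: int) -> list:
    """Split an 18-bit word into four opcode slots (assumed 5/5/5/3 layout)."""
    word &= 0x3FFFF
    slots = [(word >> shift) & 0x1F for shift in SLOT_SHIFTS]
    # Assumption: the last slot holds only the top 3 bits of a 5-bit opcode.
    slots.append((word & 0x7) << 2)
    return slots


def interpret_word(word: int, handlers: list) -> int:
    """Run each slot's handler and return a rough count of overhead steps."""
    steps = 0
    for opcode in split_slots(word):
        steps += 2          # shift and mask to isolate the opcode
        handlers[opcode]()  # dispatch through the lookup table
        steps += 1          # table lookup and branch
    return steps


# Hypothetical handler table: 32 no-op routines.
no_op_handlers = [lambda: None] * 32
example_word = (1 << 13) | (2 << 8) | (3 << 3) | 1
example_slots = split_slots(example_word)
example_steps = interpret_word(example_word, no_op_handlers)
```

    Even with no-op handlers, this model spends about a dozen steps of overhead per fetched word before any useful work is done.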

    If only you could load the 18 bits in like serial instructions and execute them. But then the word has
    no context for any branch in the scheme. You would have to capture those branches
    and redirect them to the right place, slowing things down, if that is possible at all (I don't imagine
    serial execution supports branching, but I haven't looked at it for years). So your routine
    then has to load the VM version of the instruction word, execute the instructions through local
    branches, and if one is a branch, interpret that branch according to the large address space.

    So, a thought last night was to code everything as particles in large memory, execute within the
    small address space of the particle, then address the larger address space like you would
    in bank switching. Hard to program, but highest throughput. Now, can you use
    a second node to handle addressing to speed things up (well, you have to, as you
    don't have a second 18 lines on the first node, to my understanding)? Or would it
    be better to use the second node in parallel to run a second parallel memory
    access, which means dual chip or dual port memory? So, here we might get 325+ MHz, and a
    lot of mucking around. I don't know this chip well enough to tell if such port execution is
    possible at speed, and if the particle-to-particle transfer can be fine at speed.

    If only the 6-bit port on the chip were mapped to the external address bus pins,
    then it would all automatically resolve addressing within the particle, and a software
    write to the address pins would produce a particle bank swap. Then you just have to
    store the current particle address to start to produce a call/return mechanism.
    So that may be several memory cycles to do a call to a word routine in any particle in
    the large memory. Programming to stay within a particle as much as possible
    reduces the average time used by that mechanism. But if you treat each 64-word
    particle as a call, without the ability to externally address internal word routines, you
    reduce this amount even further.
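
    The particle idea above can be sketched in Python as a plain address split: the low 6 bits select a word inside a 64-word particle, and the remaining high bits select the particle (the bank). The function names and the sample address trace are hypothetical, and the sketch only counts bank swaps. It says nothing about how many cycles a swap would cost on the real chip.

```python
OFFSET_BITS = 6                    # 6-bit offset gives 64-word particles
PARTICLE_WORDS = 1 << OFFSET_BITS


def split_address(addr: int):
    """Return (particle number, offset within particle) for an address."""
    return addr >> OFFSET_BITS, addr & (PARTICLE_WORDS - 1)


def count_bank_swaps(trace) -> int:
    """Count how often consecutive accesses land in a different particle."""
    swaps = 0
    current = None
    for addr in trace:
        particle, _offset = split_address(addr)
        if current is not None and particle != current:
            swaps += 1
        current = particle
    return swaps


# Hypothetical access trace: mostly local, with two jumps between particles.
trace = [0, 1, 63, 64, 65, 200, 201]
swaps_in_trace = count_bank_swaps(trace)
```

    Code laid out so that most accesses stay inside one particle keeps the swap count, and so the average overhead, low.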

    Please: an external PSRAM interface (or whatever goes full speed cheaply) and a serial-to-
    parallel version (I forget which embedded standard has this, but a few serial ports
    that support this would be great to distribute data stores around the chip
    circumference). Still, 128 words of internal SRAM would also help (even 512).

  • From Wayne morellini@21:1/5 to gnuarm.del...@gmail.com on Sun Aug 7 07:52:00 2022
    So, I saw in your post ages ago that you could only get up to 50 MHz accessing
    external memory on the GA144. I had suspected it might be like that, until I read
    into the SDRAM app note. With PSRAM availability, it's a mystery why a native bus
    wasn't implemented, as it's basically like the internal bus. These MISC designs originally
    came about to use cache SRAM. I'm very grateful the market made the pseudo-
    SRAM concept, as you can stick all sorts of commodity RAM chips into the package
    and, with a cheap sliver of a memory controller circuit, translate that to an SRAM
    interface. But, thinking about it, the same thing goes for any memory interface. My two-
    wire parallel-to-serial auto-execute interface could just be a super simple controller
    to commodity mobile memories (though these have major issues with value
    shift). But there is bound to be a runner-up happy to use up their manufacturing
    capacity in the PSRAM market, which includes a lot of microcontrollers.

    So, the external memory for the 144 is not going to be useful for me, and there is not
    enough internal memory. I had started to think of some quick techniques to do reads
    and writes more quickly, even whether I could use port execution to somehow execute words
    from a serial memory in a controllable order, or redesign an interpreter to get
    closer to half of 650 MHz (say 213 MHz or less). Looks like they took my advice from many years
    back and put a software DMA in, where I had wanted a very simple digital circuit instead,
    allowing ports to communicate through it. No real figures are given. I was hunting
    down a very small prototyping board for it, and looking at ways of powering a core of the
    GA144 off the RCA video socket.

    But, it would be too complex for my users. Some will get it, some others would think
    those people were legends and most wouldn't. Colorforth would be enough of an
    abstraction.

    A colorforth-like virtual machine could run on something like an ATtiny85
    (or whatever chip has enough memory and can ingress video, manipulate it enough,
    and output a video stream, in a single chip package, likely connected together without
    a mainboard), and maintain code compatibility with versions based on other
    chips, with the same VM abstracted instruction set shared with the other project
    and between products. The other project's instruction set architecture is going to be
    different from the VM but still compatible. You can then just use this before you order a
    custom chip. I actually had figured out a way to do an object-oriented processing system
    which could enable, say, a primitive system like an Atari 2600 to run complex 2D
    games similar to Sonic the Hedgehog. So, the retro processor project is going to be
    interesting (the community processor is just a simple embedded one with good I/O; the
    retro one was actually going to be a very sophisticated processor, which I hadn't
    remembered before, but I had identified that with the game features dropped it looked good
    for embedded). These are just some of several different architecture proposals over
    the decades. The retro one was an old 8-bit design proposal I had, that I am
    applying new ideas to in 8- or 16-bit form. As you can see in the other thread, I have
    located a minute processor FPGA card due for release, which this could be tested on. The
    other design proposals likely don't need instruction set validation, as instruction for
    instruction they are equivalent to, and can map to, an existing validated
    instruction set. Your main issue is if you want to change little things to improve coding
    and workflow, like testing jumps, where you have to find out if that is a good idea. So,
    the community processor is Just another MISC processor in another format (JaMISC)
    and will compile code unaltered (if not self-changing, and on the same address and I/O
    space, of course).

    You will also likely be able to find JavaScript APIs for these chips to optionally use, and
    roll the VM into that, to make development more comfortable (before native
    APIs are ready). But people don't need it; it would have a very small abstracted native
    substrate to interface to.

    So, there is still significant research to go, to calculate which cheap part has enough processor to
    properly handle the workload. I'm also identifying the project for a commercial
    version with high volume sales, as it substantially overlaps with a few other
    planned projects. So, it's still not a waste of time as a low volume DIY. A shame, there is
    a lot not said, but the GA144 version would have been pretty interesting.

  • From Wayne morellini@21:1/5 to Wayne morellini on Sun Aug 7 07:47:10 2022
    I tried to post this a few nights back, but lost the post.

    I'm feeling significantly better, after a bad patch (I used lots of vitamin C and maybe pine
    needle tea, had accidental bad reactions to two treatments, and found I had bought the
    wrong version of something else I really needed, which was more harmful than good.
    Then some flu, and heaps of unexpected problems and people causing me problems.
    And I'm on cream for one cancer, and getting another cut out). It is now time to
    rebuild and start to catch up on doing some things again. But I have discovered all the
    deterioration has made things marginal, such that even if I get really well enough to do stuff,
    slight sickness will make me less functional. Which is an issue when doing complex technical
    stuff. So that's where it stands with these projects.


    I'll split the post.

  • From Wayne morellini@21:1/5 to Wayne morellini on Sat Sep 3 23:28:07 2022
    Syncing video filter project threads.

  • From Wayne morellini@21:1/5 to Wayne morellini on Sat Sep 3 23:27:14 2022
    Syncing video filter project threads.

  • From Wayne morellini@21:1/5 to Wayne morellini on Sun Sep 4 08:41:42 2022
    Video filter project

    https://groups.google.com/g/comp.lang.forth/c/S-_qe2Kh6gk/m/6DH2cIxEBwAJ
