Forum: >>> Magnum BBS <<<

Hi all, building a Verilog FORTH… but? Why?

From SpainHackForth@21:1/5 to All on Tue Dec 13 10:31:51 2022

Ok, so I’m learning Forth and Verilog… I’m not proficient in either, but I like to learn and hack stuff. I built my first module in Verilog as a starting point for a Forth FPGA… https://gist.githubusercontent.com/jemo07/5d5ba7d31bb12410888f46ca6060a1f2/raw/1c597704a9a8f4f99b3e73a3906905ef0f38c358/ProgramCounter.v

This is my cheesy Program counter, I know, don’t laugh… :D.

Ok, while I started this several weeks ago, I have a simple question… what do HW Forth implementations really do? I mean, I understand picoJava, as it’s executing “bytecode,” and you remove the interpreter because the bytecodes map 1:1 to the operating instructions (opcodes). The JIT in essence becomes the compiler and the code is run native.

Now, for Forth, how does that work out? If the outer interpreter is a JIT, it then compiles the words into core words (which are in turn just opcodes). In essence, is all we are saying that when you write native Forth, you are really writing compiled code?

I ask this because I was trying to wrap my head around how to implement the opcodes for the Forth CPU, and then I was daunted by the fact that you need to somehow compile the words with the outer interpreter to get to the code words. Is that all that is really happening here?

What are the fundamental benefits vs. just leveraging a low-cost ($0.50) MCU? Is it the word length? Especially since we are now seeing cores running at 200 MHz the size of the ATtiny, and they are 32 bits, and also 400/600 MHz RISC-V systems the size of the STM32F4 or smaller, with more than twice the RAM/flash…

I’m asking for my understanding, I’m enjoying learning something like Verilog, maybe even System Verilog… but I just don’t get the logic.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to SpainHackForth on Tue Dec 13 10:51:23 2022

SpainHackForth wrote on Tuesday, 13 December 2022 at 19:31:52 UTC+1:

Ok, so I’m learning Forth and Verilog… I’m not proficient in either, but I like to learn

With all due respect, you are trying to take too many steps, and too big ones, at once.
IOW in your shoes I would first learn to set up and run a Forth system on a small MCU
such as an Arduino to get a feel for it.

In parallel, dive deeper into Verilog, e.g. by studying this project: http://mindworks.shoutwiki.com/wiki/Forth_Computing_on_FPGA


From SpainHackForth@21:1/5 to All on Tue Dec 13 11:19:14 2022

With all due respect, you are trying to take too many steps, and too big ones, at once.
IOW in your shoes I would first learn to set up and run a Forth system on a small MCU
such as an Arduino to get a feel for it.

Thanks for the link!

In parallel, dive deeper into Verilog, e.g. by studying this project: http://mindworks.shoutwiki.com/wiki/Forth_Computing_on_FPGA


From Lorem Ipsum@21:1/5 to SpainHackForth on Tue Dec 13 13:44:55 2022

I think you don't understand Forth. Forth has explicit definitions of words that are part of any program you write. Every word is "compiled", or better to say, "defined" before it can be executed. At that point, the Forth "interpreter" can execute any
word that has been defined.

It's not clear to me how you might apply the concept of JIT, other than just writing normal Forth code that does compile the program when loading.

As to your Verilog, what are you trying to do with a line like,

temp[7:0] = 8'bzzzz_zzzz;

I'm curious as to what you think this assignment will do in hardware that results from this code.

--

Rick C.


From SpainHackForth@21:1/5 to All on Tue Dec 13 14:21:24 2022

As to your Verilog, what are you trying to do with a line like,

temp[7:0] = 8'bzzzz_zzzz;

I'm curious as to what you think this assignment will do in hardware that results from this code.

Not sure how to take your request, it’s just assigning a temp space for register temp… <Size>’<base><number>


From SpainHackForth@21:1/5 to All on Tue Dec 13 15:13:22 2022

I'm curious as to what you think this assignment will do in hardware that results from this code.

Not sure how to take your request, it’s just assigning a temp space for register temp… <Size>’<base><number>

I'm asking an honest question. Verilog is an HDL, Hardware Description Language. It is used to "describe" hardware in terms of its actions. What exactly do you think that assignment does in terms of an action? "Assigning a temp space" has no meaning in hardware.

I'm trying to understand how you are visualizing Verilog. I'm guessing that you don't actually have much understanding of the nature of the hardware produced.

"Assigning a space" is something done in Verilog by declaring the signal, or whatever they call them in Verilog. I'm more conversant in VHDL, which has signals and variables. Both can be thought of as "wires" with defined logic states. But the default types include some states that are not actually realizable in hardware. They are mostly used to show the results of poor logic design.

Not trying to be rude or anything. I'm just trying to figure out if I can help you in any way.

Again, not sure how to take your request, I’m very conscious of overly apologetic conversations when I don’t see a reason for it, it raises a high level of suspicion on my end, so not really sure why you are apologizing?

As clearly stated… I’m learning Verilog, so I don’t have a deep understanding of the language nor do I claim to be a subject matter expert.

Please feel free to “show” me what the line does?
I’m always open to an opportunity to learn, hence my original questions.

In the meantime, I can share with you my understanding… "zzzz_zzzz" is a placeholder value that indicates an unknown or undefined state. It is often used in Verilog code as a default value for registers or other variables when their actual value is not known or not relevant. In the code you provided, "zzzz_zzzz" is assigned to the "temp" register in several cases where the value of the "temp" register is not used or is not important.

Here, let me explain what I’m trying to do, and by all means, show me a sample in VHDL of how to accomplish the following:

I’m just building a simple program counter… a 3-bit mode, a clock, a pc_value and a temp value. If mode is 010 I assign the pc_value to the data bus; if the mode is 000 I set value to 0s and set temp to z’s; else if mode is 001 I set value from the bus and temp to z’s; if 010 again, it does nothing (it has been set in the first case); else if mode changes to 011, temp is set to z’s; and if mode is 100, the value reg is incremented by 1. The block waits for the rising edge to execute each instruction…

I’ve implemented this on a read board about 10 times playing with my kids… all it takes is a 555 and a 4027 (from memory).


From Lorem Ipsum@21:1/5 to SpainHackForth on Tue Dec 13 14:38:32 2022

On Tuesday, December 13, 2022 at 6:21:26 PM UTC-4, SpainHackForth wrote:

As to your Verilog, what are you trying to do with a line like,

temp[7:0] = 8'bzzzz_zzzz;

I'm curious as to what you think this assignment will do in hardware that results from this code.

Not sure how to take your request, it’s just assigning a temp space for register temp… <Size>’<base><number>

I'm asking an honest question. Verilog is an HDL, Hardware Description Language. It is used to "describe" hardware in terms of its actions. What exactly do you think that assignment does in terms of an action? "Assigning a temp space" has no meaning in hardware.

I'm trying to understand how you are visualizing Verilog. I'm guessing that you don't actually have much understanding of the nature of the hardware produced.

"Assigning a space" is something done in Verilog by declaring the signal, or whatever they call them in Verilog. I'm more conversant in VHDL, which has signals and variables. Both can be thought of as "wires" with defined logic states. But the default types include some states that are not actually realizable in hardware. They are mostly used to show the results of poor logic design.

Not trying to be rude or anything. I'm just trying to figure out if I can help you in any way.

--

Rick C.


From Lorem Ipsum@21:1/5 to SpainHackForth on Tue Dec 13 17:58:33 2022

On Tuesday, December 13, 2022 at 7:13:24 PM UTC-4, SpainHackForth wrote:

Again, not sure how to take your request, I’m very conscious of overly apologetic conversations when I don’t see a reason for it, it raises a high level of suspicion on my end, so not really sure why you are apologizing?

There are some real whackos in this group, so I'm trying to not be condescending by trying to teach you something you don't need or want to learn. Some people here are set off very easily, and would find that highly offensive.

As clearly stated… I’m learning Verilog, so I don’t have a deep understanding of the language nor do I claim to be a subject matter expert.

Please feel free to “show” me what the line does?
I’m always open to an opportunity to learn, hence my original questions.

In terms of hardware synthesis, it does nothing useful, unless you want temp to be a tri-state bus. If implementing in a modern FPGA, there are no tri-state buses, so not a good idea. The only other use is to flag that the contents of that register
are invalid, as you say, but it makes no sense for the designer to set that value. In fact, I think the more appropriate value would be 'x', but, as I said, I'm much more conversant in VHDL, so I'm not sure of all the values available, or what is best.
Verilog may not have all the same choices as VHDL.

In the meantime, I can share with you my understanding… "zzzz_zzzz" is a placeholder value that indicates an unknown or undefined state. It is often used in Verilog code as a default value for registers or other variables when their actual value is not known or not relevant. In the code you provided, "zzzz_zzzz" is assigned to the "temp" register in several cases where the value of the "temp" register is not used or is not important.

I guess my question would be, why is temp undefined or unknown at the times you are assigning z's?

The code you've written in the always block should specify an assignment in every part of the code, unless you want that signal to hold its previous value. This would be implemented by using a clock enable.

Just to make sure we are on the same page: assignments in a clocked always block define a register. If the register output is not assigned in some path through the always block, it will get a clock enable that is disabled in that condition.

Here, let me explain what I’m trying to do, and by all means, show me a sample in VHDL of how to accomplish the following:

I’m just building a simple program counter… a 3-bit mode, a clock, a pc_value and a temp value. If mode is 010 I assign the pc_value to the data bus; if the mode is 000 I set value to 0s and set temp to z’s; else if mode is 001 I set value from the bus and temp to z’s; if 010 again, it does nothing (it has been set in the first case); else if mode changes to 011, temp is set to z’s; and if mode is 100, the value reg is incremented by 1. The block waits for the rising edge to execute each instruction…

I never see temp set to a value other than z's. It would appear to have no valid assignment, so no valid value, ever. In addition, temp is never used by any other logic. So even if it were assigned a value, it would be optimized away by the tools,
unless you turn off that feature (discarding useless logic, i.e. no outputs).

So we can ignore temp in understanding what this code does.

When pc_mode is zero, initialize pc_value to 0
When pc_mode is one, set pc_value from data_bus
When pc_mode is four, set pc_value to pc_value + 1

That's it. Seems like a reasonable set of operations for a program counter if you are limiting it to simple jumps, or calculating the address elsewhere for more complex jumps.
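
The three operations listed above can be tried out in software before committing them to an HDL. What follows is only a behavioural sketch in Python, not the Verilog from the gist: the class name `ProgramCounter`, the `clock` method and the 8-bit width are assumptions made for illustration (the width is borrowed from the `temp[7:0]` line quoted earlier), and any mode other than 0, 1 or 4 is assumed to hold the current value.

```python
class ProgramCounter:
    """Behavioural model of the three pc_mode operations (hypothetical names)."""

    WIDTH = 8                     # assumed width, borrowed from temp[7:0]
    MASK = (1 << WIDTH) - 1

    def __init__(self):
        self.pc_value = 0

    def clock(self, pc_mode, data_bus=0):
        """Apply one rising clock edge and return the new pc_value."""
        if pc_mode == 0b000:      # initialise
            self.pc_value = 0
        elif pc_mode == 0b001:    # load (jump): take the value from data_bus
            self.pc_value = data_bus & self.MASK
        elif pc_mode == 0b100:    # increment, wrapping at the register width
            self.pc_value = (self.pc_value + 1) & self.MASK
        # any other mode: hold the previous value (a clock enable in hardware)
        return self.pc_value


pc = ProgramCounter()
pc.clock(0b000)                   # reset to 0
pc.clock(0b001, data_bus=0xFE)    # jump to 0xFE
pc.clock(0b100)                   # 0xFF
pc.clock(0b100)                   # wraps to 0x00
pc.clock(0b010)                   # hold
print(pc.pc_value)                # prints 0
```

The "hold" branch is the software counterpart of the clock enable mentioned above: when no assignment applies, the register simply keeps its value.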

I’ve implemented this on a read board about 10 times playing with my kids… all it takes is a 555 and a 4027 (from memory).

I don't know what a "read board" is. I'm not sure how a 555 timer and a pair of FFs could implement this design. You would need four 4027 chips for the pc_value register. You would also need an adder for the increment operation, or some gates to
implement half adders. Oh, and mux chips to switch the load between data_bus and pc_value + 1.

Is any of this useful? Any questions? Or am I missing the point entirely? Is there some use for temp which has not been coded yet?

--

Rick C.


From Jurgen Pitaske@21:1/5 to SpainHackForth on Wed Dec 14 01:03:49 2022

On Tuesday, 13 December 2022 at 18:31:52 UTC, SpainHackForth wrote:

Ok, so I’m learning Forth and Verilog…
I’m not proficient in either, but I like to learn and hack stuff,
I built my first module in Verilog as a starting point for a Forth FPGA… https://gist.githubusercontent.com/jemo07/5d5ba7d31bb12410888f46ca6060a1f2/raw/1c597704a9a8f4f99b3e73a3906905ef0f38c358/ProgramCounter.v

I wonder if somebody has done the other option in the past: VHDL
So it could be compared with your approach.
Testra has done something here
http://www.testra.com/Forth/VHDL.htm


From Zbig@21:1/5 to All on Wed Dec 14 03:44:21 2022

Now, for Forth, how does that work out? If the outer interpreter is a JIT, it then compiles the words into core words (which are in turn just opcodes). In essence, is all we are saying that when you write native Forth, you are really writing compiled code?

I ask this because I was trying to wrap my head around how to implement the opcodes for the Forth CPU, and then I was daunted by the fact that you need to somehow compile the words with the outer interpreter to get to the code words. Is that all that is really happening here?

You may want to study Brad Rodriguez' paper "Moving Forth" (5 parts):
http://www.bradrodriguez.com/papers/moving1.htm


From Jan Coombs@21:1/5 to SpainHackForth on Wed Dec 14 14:34:16 2022

On Tue, 13 Dec 2022 10:31:51 -0800 (PST)
SpainHackForth <jemo07@gmail.com> wrote:
[…]

Ok, while I started this several weeks ago, I have a simple question… what do HW Forth implementations really do? I mean, I understand picoJava, as it’s executing “bytecode,” and you remove the interpreter because the bytecodes
map 1:1 to the operating instructions (opcodes). The JIT in essence becomes the compiler and the code is run native.

[…]
The Python virtual stack engine also uses byte-size tokens for its instructions. Many of these provide much higher-level functions than
can be easily implemented in a simple hardware stack machine. Bernd's
b16[1] has 32 instructions (plus a few), and this is sufficient to host
a Forth system. Each 16b instruction fetch contains either a 15b address, three 5b instructions, or a mixture.

Memory addresses in the b16 can be derived from its IP, the A register,
the TOR register, a combination of register and inline data, or from the incremented previous memory address.

As with many processors, performance is limited by the memory access time.
It is therefore good to latch the next address on a clock edge, and avoid
any asynchronous logic between the latches and the memory.

The state machine controlling the processor and other signals must therefore predict the address source for the next memory cycle, and select it in the cycle before. In linear code the memory address will mostly be incremented. Where a source other than the IP is to be used, the incremented address
value is saved back into the IP reg. Similarly, during a call this is pushed to the R stack.

So, the IP reg is used to save the IP when it is not being used, which makes its logic fairly simple. Selecting what will be needed as the next address source is likely where the complexity will be.
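
As a rough software illustration of the sequencing described above (and explicitly not the b16's actual logic), the sketch below models a single address latch fed by a selectable source. The names `Sequencer` and `step`, and the source labels `"inc"`, `"data"`, `"ip"`, `"call"` and `"ret"`, are all hypothetical, and a 16-bit address space is assumed.

```python
class Sequencer:
    """Toy model of next-address selection (all names are hypothetical)."""

    MASK = 0xFFFF                      # assumed 16-bit address space

    def __init__(self):
        self.addr = 0                  # address latched for the current cycle
        self.ip = 0                    # saved instruction pointer
        self.rstack = []               # return stack

    def step(self, source, target=0):
        """Choose the address for the next memory cycle and latch it."""
        incremented = (self.addr + 1) & self.MASK
        if source == "inc":            # linear code: just keep incrementing
            self.addr = incremented
        elif source == "data":         # data access: park the code address in IP
            self.ip = incremented
            self.addr = target & self.MASK
        elif source == "ip":           # resume fetching code from the saved IP
            self.addr = self.ip
        elif source == "call":         # call: return address goes to the R stack
            self.rstack.append(incremented)
            self.addr = target & self.MASK
        elif source == "ret":          # return: pop the R stack
            self.addr = self.rstack.pop()
        else:
            raise ValueError(f"unknown address source: {source}")
        return self.addr


seq = Sequencer()
trace = [seq.step("inc"),              # 1
         seq.step("data", 0x200),      # 0x200, IP keeps 2
         seq.step("ip"),               # back to 2
         seq.step("call", 0x100),      # 0x100, pushes 3
         seq.step("inc"),              # 0x101
         seq.step("ret")]              # 3
print(trace)
```

In this model the `source` argument plays the part of the prediction made one cycle ahead; deciding its value for every instruction is the part that would carry the real complexity.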

The b16 Verilog source code is powerfully minimalist, perhaps it is time for this neat processor to get out and become more appreciated.

Jan Coombs
--


From SpainHackForth@21:1/5 to Zbig on Wed Dec 14 06:45:22 2022

On Wednesday, December 14, 2022 at 12:44:23 PM UTC+1, Zbig wrote:

You may want to study Brad Rodriguez' paper „Moving Forth” (5 parts): http://www.bradrodriguez.com/papers/moving1.htm

Thanks for the source, yes, I agree that Brad’s book is great! It’s a fantastic way to learn how to write a Forth.

Now, implementing Forth is one thing; how a Forth VM is implemented in HW is another.
So, stepping back and restating my question: if you consider most VMs (NGA, JVM, BEAM (Erlang), Python, the WASM VM, etc.), they all have, if I may oversimplify, a common architecture in that they emulate a virtual CPU; that is, the machine code (bytecode) is further translated to host machine code.

In other words,
Code --> JIT *compiler VCPU Target --> Bytecode --> interpreter *compiler --> Hosted Machine code.
Java --> ( SW / JVM ) --> Hosted Machine Code.
In HW:
Java --> ( JIT * compiler VCPU Target) --> [ picoJava Native CPU *HW ]

Now, in Forth, it’s a bit different.

Forth Code --> JIT Interpreter * New Word --> Compiler --> Hosted Machine Code. (The whole Forth system is built on top of the hosted code… it’s hard to determine when the inner and outer compiler is acting at any point in the code.) Words are
really little programs that pass messages through the stack. If you think about it… what could be more elegant than that, really…

From what I see here and in Chuck’s own words, you don’t necessarily build a CPU, you are building a computer in HW; that is the realization I came to.

So, is that correct? Otherwise, you are essentially building a stack CPU inspired by the Forth VM, but the output would be the same: you build a Forth with the newly created opcodes (yes, they would match 1:1 to the 28 or so core words), but other than that,
what are we gaining?

I think it’s a great experience, and it has given me the opportunity to understand many low-level computational factors, and I will end up building my own little Forth VM in HW, but I’m asking as a purely practical matter: what do you really gain by doing
this that you can’t already do on any other HW?

I’m not sure; I might have missed something, and while I’m happy to enjoy the experience, I was hoping others would chime in with their own ideas, as I seem to have run into a wall of… really?


From SpainHackForth@21:1/5 to All on Wed Dec 14 06:47:03 2022

So, the IP reg is used to save the IP when it is not being used, which makes its logic fairly simple. Selecting what will be needed as the next address source is likely where the complexity will be.

The b16 Verilog source code is powerfully minimalist, perhaps it is time for this neat processor to get out and become more appreciated.

Jan Coombs
--

Jan, that is fantastic feedback.


From Lorem Ipsum@21:1/5 to SpainHackForth on Wed Dec 14 07:22:30 2022

On Wednesday, December 14, 2022 at 10:47:04 AM UTC-4, SpainHackForth wrote:

Jan, that is fantastic feedback.

The Forth VM is unique to Forth. I don't recall any processor design that was literally a Forth VM, including the b16. They are stack machines, but often deviate from the Forth VM by adding various registers, and other details.

What exactly are you trying to do? What is your actual goal?

--

Rick C.


From SpainHackForth@21:1/5 to All on Wed Dec 14 09:32:13 2022

Jan, that is fantastic feedback.

The Forth VM is unique to Forth. I don't recall any processor design that was literally a Forth VM, including the b16. They are stack machines, but often deviate from the Forth VM by adding various registers, and other details.

What exactly are you trying to do? What is your actual goal?

--

Rick C.

Hello Rick, yes, I have to agree, the Forth VM (not sure if VM is an appropriate term) behaves more like an intermediate state machine; there are some principles of a CPU there, but the level of abstraction that the stack machine provides is quite
simple. If you think about it, it is a perfect target for massive parallelization, provided you can keep track of the order of the machine to keep track of next… (there is the idea of my temp ;^D) but I have to admit, I did read Bob’s Functional
Designs for digital computer long ago! :D

I don’t have goals, specifically, I’m just exploring the possibilities… I’m trying to learn a cool tech and fill the bunch of repurposed FPGAs I got for $50

I wish I had explored Forth rather than Java, and that we had had FPGAs 30 years ago when I was at the university… but why miss out on the fun of innovation…

Cheers,

Jose


From Brian Fox@21:1/5 to SpainHackForth on Wed Dec 14 12:44:55 2022

On Wednesday, December 14, 2022 at 12:32:15 PM UTC-5, SpainHackForth wrote:

I wish I had explored Forth rather than Java, and that we had had FPGAs 30 years ago when I was at the university… but why miss out on the fun of innovation…

Cheers,

Jose

I am not sure anybody has answered your question directly.
Here is one from a hobby Forth guy.

"Now, for Forth, how does that work out? If the outer interpreter is a JIT, it then compiles the words into core words (which are in turn just opcodes). In essence, is all we are saying that when you write native Forth, you are really writing compiled code?"

So a traditional indirect-threaded Forth system compiles pointers to addresses, and
although there is typically no "JIT", most implementers make some form of optimizer
in the course of their work (peephole is popular). Gforth is unique, I think, in that
it tries to create optimized "super-instructions" to replace slower combinations of
Forth primitives.

In Forth on hardware you will be compiling "native" opcodes for your CPU as primitives. Subroutine calls will be used for high-level words where you want to save space.
This most closely resembles "subroutine-threaded" Forth systems on conventional
machines. An optimizer could simply copy a subroutine inline for a quick and easy speed-up.
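
The idea of primitives, high-level words built from calls, and an inlining optimizer can be sketched with a toy model. This is only an illustration under stated assumptions, not how any particular Forth or Forth CPU is implemented: the class `TinyForth` and its methods are hypothetical, Python functions stand in for native opcodes, and a list of word names stands in for the compiled calls.

```python
class TinyForth:
    """Toy threaded-code model; every name here is hypothetical."""

    def __init__(self):
        self.stack = []
        # Primitives stand in for native opcodes.
        self.prims = {"dup": self._dup, "swap": self._swap,
                      "+": self._add, "*": self._mul}
        # Colon definitions: name -> list of already-defined words.
        self.colon = {}

    def _dup(self):
        self.stack.append(self.stack[-1])

    def _swap(self):
        self.stack[-1], self.stack[-2] = self.stack[-2], self.stack[-1]

    def _add(self):
        self.stack.append(self.stack.pop() + self.stack.pop())

    def _mul(self):
        self.stack.append(self.stack.pop() * self.stack.pop())

    def define(self, name, source):
        """Like `: name ... ;` -- each word must be defined before use."""
        body = source.split()
        for word in body:
            if word not in self.prims and word not in self.colon:
                int(word)          # ValueError if neither a word nor a number
        self.colon[name] = body

    def execute(self, word):
        if word in self.prims:
            self.prims[word]()             # run a "native opcode"
        elif word in self.colon:
            for inner in self.colon[word]:
                self.execute(inner)        # nested subroutine call
        else:
            self.stack.append(int(word))   # numeric literal

    def run(self, source):
        for word in source.split():
            self.execute(word)

    def inlined(self, name):
        """Copy called definitions inline until only primitives remain."""
        flat = []
        for word in self.colon[name]:
            flat.extend(self.inlined(word) if word in self.colon else [word])
        return flat


forth = TinyForth()
forth.define("square", "dup *")
forth.define("sum-of-squares", "square swap square +")
forth.run("3 4 sum-of-squares")
print(forth.stack)                         # prints [25]
print(forth.inlined("sum-of-squares"))
```

The inlined form trades space for speed: `sum-of-squares` grows from four entries to six primitives, but no nested calls remain.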

" ...you build a Forth with the newly created opcode *yes they would match 1:1 to
the 28 or so core words, but other than that, what are we gaining?"

Some Forth CPUs have taken advantage of the fact that <32 instructions can
be the entire set for a Forth CPU, encoded in 5 bits. So in a 16-bit word you can
place 3 instructions in one word and execute them in parallel if possible, or
at least take advantage of this to reduce memory fetches to read the program. Some of Chuck Moore's machines used this approach, as I recall.
Chuck reserved the last bit to make a subroutine call and an implicit return. Clever.
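
To make the arithmetic concrete: three 5-bit slots use 15 bits, leaving one bit spare in a 16-bit word. The sketch below shows one possible way to pack and decode such a word. The layout is an assumption chosen for illustration (top bit set means the low 15 bits are an address, top bit clear means three opcode slots); it is not the encoding of any specific chip, and the function names are hypothetical.

```python
SLOT_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1       # 0x1F, so opcodes 0..31
ADDR_FLAG = 1 << 15                    # assumed: top bit marks a 15-bit address
ADDR_MASK = ADDR_FLAG - 1              # 0x7FFF


def pack_ops(op0, op1, op2):
    """Pack three 5-bit opcodes into the low 15 bits of a 16-bit word."""
    for op in (op0, op1, op2):
        if not 0 <= op <= SLOT_MASK:
            raise ValueError("opcode does not fit in 5 bits")
    return (op0 << 10) | (op1 << 5) | op2


def pack_address(addr):
    """Encode a 15-bit address with the flag bit set."""
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("address does not fit in 15 bits")
    return ADDR_FLAG | addr


def decode(word):
    """Return ("address", n) or ("ops", [op0, op1, op2])."""
    if word & ADDR_FLAG:
        return ("address", word & ADDR_MASK)
    return ("ops", [(word >> shift) & SLOT_MASK for shift in (10, 5, 0)])


word = pack_ops(1, 2, 3)
print(hex(word))                       # prints 0x443
print(decode(word))
print(decode(pack_address(0x1234)))
```

One fetch of such a word delivers up to three instructions, which is where the saving in memory traffic comes from.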

I wish you success with your project and hope you publish your results.


From SpainHackForth@21:1/5 to All on Wed Dec 14 15:36:26 2022

I wish you success with your project and hope you publish your results.

Thank you Brian! I appreciate the encouragement, and most of all, you provided some good feedback as to possible benefits.

I just read some references to Gary Bergstrom’s AFT, but I can’t seem to find a source for the reference documentation.
It mentions another set of stack registers and more compact code… is anyone able to point me towards this reference?


From Myron Plichota@21:1/5 to Myron Plichota on Wed Dec 14 20:50:25 2022

PS great analysis, Brian. Sorry this missed my first reply.


From Myron Plichota@21:1/5 to Brian Fox on Wed Dec 14 20:36:17 2022

On Wednesday, December 14, 2022 at 3:44:57 PM UTC-5, Brian Fox wrote:

In Forth on hardware you will be compiling "native" opcodes for your CPU as primitives. Subroutine calls will be used for high-level words where you want to save space.
This most closely resembles "subroutine-threaded" Forth systems on conventional
machines. An optimizer could simply copy a subroutine inline for a quick and easy speed-up.

" ...you build a Forth with the newly created opcode *yes they would match 1:1 to
the 28 or so core words, but other than that, what are we gaining?"
Some Forth CPUs have taken advantage of the fact that <32 instructions can
be the entire set for a Forth CPU, encoded in 5 bits. So in a 16-bit word you can
place 3 instructions in one word and execute them in parallel if possible, or
at least take advantage of this to reduce memory fetches to read the program. Some of Chuck Moore's machines used this approach, as I recall.

[

Chuck reserved the last bit to make a sub-routine call and an implicit return.
Clever.

]
I disagree on this point. Chuck's chips always had an explicit return aka ; instruction.
Only the called procedure can know when it is time to return.

I wish you success with your project and hope you publish your results.

I as well.


From Myron Plichota@21:1/5 to SpainHackForth on Wed Dec 14 20:22:15 2022

On Wednesday, December 14, 2022 at 12:32:15 PM UTC-5, SpainHackForth wrote:

I don’t have goals, specifically, I’m just exploring the possibilities… I’m trying to learn a cool tech and fill the bunch of repurposed FPGAs I got for $50

I wish I had explored Forth rather than Java, and that we had had FPGAs 30 years ago when I was at the university… but why miss out on the fun of innovation…

It's kept me off the streets for 20+ years :)
I'm curious, what is your "repurposed" FPGA target?


From Lorem Ipsum@21:1/5 to SpainHackForth on Thu Dec 15 00:09:46 2022

On Wednesday, December 14, 2022 at 1:32:15 PM UTC-4, SpainHackForth wrote:

Jan, that is fantastic feedback.

The Forth VM is unique to Forth. I don't recall any processor design that was literally a Forth VM, including the b16. They are stack machines, but often deviate from the Forth VM by adding various registers, and other details.

What exactly are you trying to do? What is your actual goal?

--

Rick C.

Hello Rick, yes, I have to agree, the Forth VM (not sure if *VM is an appropriate term) behaves more like an intermediate state machine. There are some principles of a CPU there, but the level of abstraction that the stack machine provides is quite
simple. If you think about it, it is a perfect target for massive parallelization, provided you can keep track of the order of the machine to keep track of next… *there is the idea of my temp ;^D but I have to admit, I did read Bob’s Functional
Designs for digital computer long ago! :D

I don’t have goals, specifically, I’m just exploring the possibilities… I’m trying to learn a cool tech and fill the bunch of repurposed FPGA I got for $50

I wish I had explored Forth rather than Java, and that we could have had FPGAs 30 years ago when I was at the university… but why miss out on the fun of innovation…

Ok, if you don't have any goals in mind, I'm not sure how I can help. If you have questions, ask, otherwise I'll tune out.

--

Rick C.

+- Get 1,000 miles of free Supercharging
+- Tesla referral code - https://ts.la/richard11209


From Lorem Ipsum@21:1/5 to Brian Fox on Thu Dec 15 00:34:30 2022

On Wednesday, December 14, 2022 at 4:44:57 PM UTC-4, Brian Fox wrote:

On Wednesday, December 14, 2022 at 12:32:15 PM UTC-5, SpainHackForth wrote:

I wish I had explored Forth rather than Java, and that we could have had FPGAs 30 years ago when I was at the university… but why miss out on the fun of innovation…

Cheers,

Jose

I am not sure anybody has answered your question directly.
Here is one from a hobby Forth guy.
"Now, for Forth, how does that work out? If the outer interpreter is a JIT, it then compiles the words into core words *who are in turn just Opcodes. In essence, is all we are saying that when you write native Forth, you are really writing compiled code?"

So a traditional indirect-threaded Forth system compiles pointers to addresses and
although there is typically no "JIT" most implementers make some form of optimizer
in the course of their work (peephole is popular). GForth is unique, I think, in that
it tries to create optimized "super-instructions" to replace slower combinations of
Forth primitives.

In Forth on Hardware you will be compiling "native" opcodes for your CPU as primitives. Sub-routine calls will be used for hi-level words where you want to save space.
This most closely resembles "sub-routine threaded" Forth systems on conventional
machines. An optimizer could simply copy a sub-routine inline for a quick and easy speed up.

" ...you build a Forth with the newly created opcode *yes they would match 1:1 to
the 28 or so core words, but other than that, what are we gaining?"
Some Forth CPUs have taken advantage of the fact that <32 instructions can be the entire set for a Forth CPU, encoded in 5 bits. So in a 16 bit word you can
place 3 instructions in one word and execute them in parallel if possible, or
at least take advantage of this to reduce memory fetches to read the program.
Some of Chuck Moore's machines used this approach as I recall.
Chuck reserved the last bit to make a sub-routine call and an implicit return.
Clever.

I wish you success with your project and hope you publish your results.

One of the things about working in FPGAs is that you have memory that will keep up with most logic designs. So the idea of an internal program store with wide words containing multiple instructions is not of much value. Packing multiple instructions into
words is only useful for external memory that cannot keep up with the CPU speed.

There are many options for instruction encoding. I designed an ISA that used a variable width instruction, allowing the remainder of the word to be immediate data. This was designed around the frequency of instruction use, so as to minimize the size of
the actual op code and maximize the size of immediate data. In fact, the design was agnostic as to the size of the data words. I had two versions, 8 bit instructions and 9 bit instructions. Literal is the first instruction with a one bit op code, '0'
in the msb, leaving the remaining n-1 bits as immediate data loaded onto the return stack with sign extension. Multiple literal instructions would shift left the previous top of return stack and shift in another n-1 bits. This could be repeated ad
nauseam to fill any size data word desired.

The return stack is used, since most immediate data are addresses. For a data immediate, the data then is transferred to the data stack ( R> ) and you have LITERAL.
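
A minimal Python sketch of the literal scheme as described above, for n-bit instructions with a '0' in the msb and n-1 payload bits. It models only the value that builds up on top of the return stack, not the stack push or the R> transfer. How the hardware tells a first literal from a continuation is not stated in the post; the sketch assumes any literal that directly follows another literal is a continuation. The names encode_literal and decode_literal are made up for the example.

```python
def encode_literal(value, n=8):
    """Split a signed value into literal instructions, most significant chunk first."""
    k = n - 1                      # payload bits per instruction
    mask = (1 << k) - 1
    chunks = []
    while True:
        chunk = value & mask
        chunks.append(chunk)
        value >>= k                # arithmetic shift keeps the sign
        sign_bit = chunk >> (k - 1)
        # Stop once what is left is only the sign extension of this chunk.
        if (value == 0 and sign_bit == 0) or (value == -1 and sign_bit == 1):
            break
    return chunks[::-1]            # every chunk is < 2**k, so the msb is 0


def decode_literal(instructions, n=8):
    """Rebuild the value the way the post describes the hardware doing it."""
    k = n - 1
    first = instructions[0]
    # First literal: load n-1 bits with sign extension.
    top = first - (1 << k) if first >> (k - 1) else first
    # Each further literal: shift the previous value left and shift in n-1 bits.
    for chunk in instructions[1:]:
        top = (top << k) | chunk
    return top
```

For example, with 8-bit instructions encode_literal(-65) gives two instructions, and decode_literal of that pair returns -65 again.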

The jump and call instructions are next with 2 or 3 bit (or maybe 4, not sure, I'd have to check) op codes with the remainder of the instruction as immediate data, again, loaded to the return stack in the same way as LITERAL. So you have some five or six
bits of signed address offset, great for loops and short jumps in one instruction.

The last design I was working on was a stack design that allowed offset addressing as part of the instruction. This greatly reduced the need for stack juggling instructions, like DUP, SWAP, ROT, etc. My test case was an interrupt handler for a software
NCO, with phase and frequency control on a sample by sample basis. The code size was reduced by a third, if I recall correctly. I put it aside and have not gotten back to it.

--

Rick C.

++ Get 1,000 miles of free Supercharging
++ Tesla referral code - https://ts.la/richard11209


From SpainHackForth@21:1/5 to All on Thu Dec 15 00:28:39 2022

I wish you success with your project and hope you publish your results.

I as well.

PS great analysis, Brian. Sorry this missed my first reply.

I bought the Pano Logic boards, a lot of them for $50, I got I think 30 the first lot, and 10 cisco cdma modems the second lot, so I’m about $150 in with all these boards including shipping to Spain… :D

Here is a good resource I found after the fact. I bought these boards in 2018, so after Covid prices have changed, but you can still get the Cisco board quite cheap if you search for them; if you get a lot of used ones like I did, they are very cheap indeed.

https://geeklan.co.uk/files/ossg16072020-repurposing_obsolete_fpga_and_dev_kits.pdf


From Myron Plichota@21:1/5 to SpainHackForth on Thu Dec 15 01:10:18 2022

On Thursday, December 15, 2022 at 3:28:41 AM UTC-5, SpainHackForth wrote:

I wish you success with your project and hope you publish your results.

I as well.

PS great analysis, Brian. Sorry this missed my first reply.

I bought the Pano Logic boards, a lot of them for $50, I got I think 30 the first lot, and 10 cisco cdma modems the second lot, so I’m about $150 in with all these boards including shipping to Spain… :D

Here is a good resource I found after the fact. I bought these boards in 2018, so after Covid prices have changed, but you can still get the Cisco board quite cheap if you search for them; if you get a lot of used ones like I did, they are very cheap indeed.

https://geeklan.co.uk/files/ossg16072020-repurposing_obsolete_fpga_and_dev_kits.pdf

Thanks for the link. But I'm still unclear on whether your boards are based on Spartan-3E (G1?) or Spartan-6 (G2?).


From SpainHackForth@21:1/5 to All on Thu Dec 15 04:50:05 2022

Thanks for the link. But I'm still unclear on whether your boards are based on Spartan-3E (G1?) or Spartan-6 (G2?).

I got the G2 so they are Spartan 6.


From Marcel Hendrix@21:1/5 to gnuarm.del...@gmail.com on Thu Dec 15 06:54:13 2022

On Thursday, December 15, 2022 at 9:34:32 AM UTC+1, gnuarm.del...@gmail.com wrote:
[..]

There are many options for instruction encoding. I designed an ISA that used a variable
width instruction, allowing the remainder of the word to be immediate data.

[..]

Multiple literal instructions would shift left the previous top of return stack and shift in
another n-1 bits. This could be repeated ad nauseam to fill any size data word desired.

[..]

The last design I was working on was a stack design that allowed offset addressing
as part of the instruction. This greatly reduced the need for stack juggling instructions,
like DUP, SWAP, ROT, etc.

So you reinvented the INMOS transputer?

-marcel


From Myron Plichota@21:1/5 to SpainHackForth on Thu Dec 15 07:24:51 2022

On Thursday, December 15, 2022 at 7:50:07 AM UTC-5, SpainHackForth wrote:

Thanks for the link. But I'm still unclear on whether your boards are based on Spartan-3E (G1?) or Spartan-6 (G2?).

I got the G2 so they are Spartan 6.

Thanks Jose. You are in for a great adventure, I hope.


From Lorem Ipsum@21:1/5 to Marcel Hendrix on Thu Dec 15 12:58:42 2022

On Thursday, December 15, 2022 at 9:54:15 AM UTC-5, Marcel Hendrix wrote:

On Thursday, December 15, 2022 at 9:34:32 AM UTC+1, gnuarm.del...@gmail.com wrote:
[..]

There are many options for instruction encoding. I designed an ISA that used a variable
width instruction, allowing the remainder of the word to be immediate data.

[..]

Multiple literal instructions would shift left the previous top of return stack and shift in
another n-1 bits. This could be repeated ad nauseam to fill any size data word desired.

[..]

The last design I was working on was a stack design that allowed offset addressing
as part of the instruction. This greatly reduced the need for stack juggling instructions,
like DUP, SWAP, ROT, etc.

So you reinvented the INMOS transputer?

That's what you think the Transputer is? One thing they did I never liked, was the opcodes were four bits. So you got a four bit opcode to provide a 4 bit immediate data nibble. Not so efficient. In Forth, much of the operations involve immediate
data, mostly addresses. I don't actually remember so much else about Transputer architecture. Did they only have a stack, or did they use registers? The TI 9900 processor family had registers in memory, so you could change the workspace pointer by the
size of the register set, or just by one. There were some very tricky games you could play.

I did some programming with Transputers in the 80s, I think. They were used on a significant system for the government. The project made things very complex though. Data was all time stamped rather than setting up known delay paths so the data would
be correlated by design, rather than having to look at time stamps and realigning. I guess they didn't trust their ability to maintain constant delay paths.

--

Rick C.

--- Get 1,000 miles of free Supercharging
--- Tesla referral code - https://ts.la/richard11209


From minforth@arcor.de@21:1/5 to gnuarm.del...@gmail.com on Thu Dec 15 14:33:23 2022

gnuarm.del...@gmail.com schrieb am Donnerstag, 15. Dezember 2022 um 21:58:44 UTC+1:

On Thursday, December 15, 2022 at 9:54:15 AM UTC-5, Marcel Hendrix wrote:

On Thursday, December 15, 2022 at 9:34:32 AM UTC+1, gnuarm.del...@gmail.com wrote:
[..]

There are many options for instruction encoding. I designed an ISA that used a variable
width instruction, allowing the remainder of the word to be immediate data.

[..]

Multiple literal instructions would shift left the previous top of return stack and shift in
another n-1 bits. This could be repeated ad nauseam to fill any size data word desired.

[..]

The last design I was working on was a stack design that allowed offset addressing
as part of the instruction. This greatly reduced the need for stack juggling instructions,
like DUP, SWAP, ROT, etc.

So you reinvented the INMOS transputer?

That's what you think the Transputer is? One thing they did I never liked, was the opcodes were four bits. So you got a four bit opcode to provide a 4 bit immediate data nibble. Not so efficient. In Forth, much of the operations involve immediate data, mostly addresses. I don't actually remember so much else about Transputer architecture. Did they only have a stack, or did they use registers? The TI 9900 processor family had registers in memory, so you could change the workspace pointer by the size of the register set, or just by one. There were some very tricky games you could play.

I did some programming with Transputers in the 80s, I think. They were used on a significant system for the government. The project made things very complex though. Data was all time stamped rather than setting up known delay paths so the data would be correlated by design, rather than having to look at time stamps and realigning. I guess they didn't trust their ability to maintain constant delay paths.

Transputer networks were the idea of that time to implement parallel image or target tracking systems,
or Kalman based estimators. To get meaningful correlations in the time domain, time synchronization was
a must and the transputer programs had to be fast fast fast. Sometimes unconventional tricky ideas
had to be borrowed from old analog computers. As you obviously know, most things were and probably
are still classified, even after patent protection has run out and technology has moved on.


From Lorem Ipsum@21:1/5 to Brian Fox on Thu Dec 15 18:06:48 2022

On Thursday, December 15, 2022 at 8:26:33 PM UTC-5, Brian Fox wrote:

On Wednesday, December 14, 2022 at 11:50:27 PM UTC-5, Myron Plichota wrote:

[

Chuck reserved the last bit to make a sub-routine call and an implicit return.
Clever.

]
I disagree on this point. Chuck's chips always had an explicit return aka ; instruction.
Only the called procedure can know when it is time to return.

I didn't have it quite right Myron, but I knew there was something in RTX2000
regarding free sub-routine returns.
It was only 30 something years ago. :-)

From Koopman, Stack Computers
---------------------------
Figure 4.9(c) -- RTX instruction formats -- ALU operation.

Figure 4.9c shows the format of the ALU instruction. Bits 0-3 control the operation
of the shifter that shifts the output of the ALU.

Bit 5 of the ALU instruction indicates a subroutine return operation. This allows
subroutine returns to be combined with preceding arithmetic operations to obtain
"free" subroutine returns in many cases.

Yeah, given Forth's inherent usage for finely divided code with lots of subroutines, it is worthwhile to exploit the parallelism possible. I did something similar with three parallel processes, an engine (stack, ALU, and memory) for data, a second
engine (stack and ALU) for addresses, i.e. return stack, and an instruction engine for calculating the instruction address, fetching and decoding the instruction. Making these execute in parallel is simply a matter of laying out an appropriate
instruction. The main issue is that by grabbing bits in the instruction for special purposes, it reduces the size of the remainder for other, general operations.

Most CPUs have the potential for parallelism, but the instruction format does not allow this, other than by specific instruction design, such as, decrement and jump on zero. But we mostly don't notice the parallelism because we've seen these
instructions all our lives.

I worked off of Koopman's Forth word frequencies to lay out the instruction sets. Seemed like that would be a useful thing to do rather than shoot from the hip. But he has two tables. One for how often a word was executed and one for how often the
word appeared in the code. So optimize code speed or optimize code density. The two aren't so different in Koopman's rankings, so either way I think you would get close to optimizing both.

--

Rick C.

--+ Get 1,000 miles of free Supercharging
--+ Tesla referral code - https://ts.la/richard11209


From Brian Fox@21:1/5 to Myron Plichota on Thu Dec 15 17:26:31 2022

On Wednesday, December 14, 2022 at 11:50:27 PM UTC-5, Myron Plichota wrote:

[

Chuck reserved the last bit to make a sub-routine call and an implicit return.
Clever.

]
I disagree on this point. Chuck's chips always had an explicit return aka ; instruction.
Only the called procedure can know when it is time to return.

I didn't have it quite right Myron, but I knew there was something in RTX2000 regarding free sub-routine returns.
It was only 30 something years ago. :-)

From Koopman, Stack Computers
---------------------------
Figure 4.9(c) -- RTX instruction formats -- ALU operation.

Figure 4.9c shows the format of the ALU instruction. Bits 0-3 control the operation
of the shifter that shifts the output of the ALU.

Bit 5 of the ALU instruction indicates a subroutine return operation. This allows
subroutine returns to be combined with preceding arithmetic operations to obtain
"free" subroutine returns in many cases.
--------------------------
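
To make the "free return" idea concrete, here is a toy Python sketch. The only detail taken from the Koopman excerpt is that bit 5 of an ALU instruction marks a subroutine return. The opcode values (OP_ADD, OP_DROP) and the step function are hypothetical and do not reflect the real RTX encoding.

```python
RET_BIT = 1 << 5   # from the excerpt: bit 5 marks a subroutine return
OP_ADD = 0x0100    # hypothetical opcode value, not the real RTX encoding
OP_DROP = 0x0200   # hypothetical opcode value


def step(instr, data, rstack, pc):
    """Execute one ALU instruction and return the next program counter."""
    op = instr & ~RET_BIT
    if op == OP_ADD:
        b = data.pop()
        a = data.pop()
        data.append(a + b)
    elif op == OP_DROP:
        data.pop()
    else:
        raise ValueError("unknown opcode in this toy model")
    if instr & RET_BIT:
        return rstack.pop()   # return folded into the same instruction
    return pc + 1             # otherwise fall through to the next instruction
```

A definition ending in `+ ;` then needs one instruction slot (OP_ADD | RET_BIT) rather than an add followed by a separate return.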

This would be an excellent reference for Jose as well. https://users.ece.cmu.edu/~koopman/stack_computers/index.html


From none albert@21:1/5 to gnuarm.deletethisbit@gmail.com on Fri Dec 16 09:44:28 2022

In article <fba10216-4153-4dd6-9dc0-6d1323fcd991n@googlegroups.com>,
Lorem Ipsum <gnuarm.deletethisbit@gmail.com> wrote:

On Thursday, December 15, 2022 at 9:54:15 AM UTC-5, Marcel Hendrix wrote:

On Thursday, December 15, 2022 at 9:34:32 AM UTC+1, gnuarm.del...@gmail.com wrote:

[..]

There are many options for instruction encoding. I designed an ISA that used a variable
width instruction, allowing the remainder of the word to be immediate data.

[..]

Multiple literal instructions would shift left the previous top of return stack and shift in
another n-1 bits. This could be repeated ad nauseam to fill any size data word desired.

[..]

The last design I was working on was a stack design that allowed offset addressing
as part of the instruction. This greatly reduced the need for stack juggling instructions,
like DUP, SWAP, ROT, etc.

So you reinvented the INMOS transputer?

That's what you think the Transputer is? One thing they did I never

You realize that Marcel Hendrix is one of the top ten in depth
transputer experts of the planet?
(Before the transputer boards arrived he had programmed an emulator
to test the transputer Forth)

liked, was the opcodes were four bits. So you got a four bit opcode to provide a 4 bit immediate data nibble. Not so efficient. In Forth,

You remember incorrectly. This was crazy efficient. With the implicit
three deep stack you could program the hail stone sequence
'2 / 3 +' using implicit register (tos) addressing.
4 bits opcode, 4 bits immediate, two times makes 16 bit.
(on a 32 bit processor).

--

Rick C.

Groetjes Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst


From Lorem Ipsum@21:1/5 to none albert on Fri Dec 16 01:44:11 2022

On Friday, December 16, 2022 at 3:44:31 AM UTC-5, none albert wrote:

In article <fba10216-4153-4dd6...@googlegroups.com>,
Lorem Ipsum <gnuarm.del...@gmail.com> wrote:

On Thursday, December 15, 2022 at 9:54:15 AM UTC-5, Marcel Hendrix wrote:

On Thursday, December 15, 2022 at 9:34:32 AM UTC+1, gnuarm.del...@gmail.com wrote:

[..]

There are many options for instruction encoding. I designed an ISA that used a variable
width instruction, allowing the remainder of the word to be immediate data.

[..]

Multiple literal instructions would shift left the previous top of return stack and shift in
another n-1 bits. This could be repeated ad nauseam to fill any size data word desired.

[..]

The last design I was working on was a stack design that allowed offset addressing
as part of the instruction. This greatly reduced the need for stack juggling instructions,
like DUP, SWAP, ROT, etc.

So you reinvented the INMOS transputer?

That's what you think the Transputer is? One thing they did I never

You realize that Marcel Hendrix is one of the top ten in depth
transputer experts of the planet?
(Before the transputer boards arrived he had programmed an emulator
to test the transputer Forth)

That doesn't answer the question.

liked, was the opcodes were four bits. So you got a four bit opcode to provide a 4 bit immediate data nibble. Not so efficient. In Forth,

You remember incorrectly. This was crazy efficient. With the implicit
three deep stack you could program the hail stone sequence
'2 / 3 +' using implicit register (tos) addressing.
4 bits opcode, 4 bits immediate, two times makes 16 bit.
(on a 32 bit processor).

Ah, yes. It's coming back to me.

I'm not sure how using 4 bits to load 4 bits of immediate data is "crazy efficient" compared to using 1 bit to load n-1 bits.

https://www.transputer.net/iset/pdf/tis-sum.pdf

The Function Codes table 12.9 (page 63) gives the basic function code set. Where the operand value is less than 16, a single byte encodes the complete instruction. If the operand value is greater than 15, one prefix instruction (pfix) is required for
each additional four bits of the operand. If the operand is negative the first prefix instruction will be
nfix.

Not sure what you are talking about. I guess you think being able to encode a very limited domain of constant operands in a minimum number of instruction bits is important. This two byte count only works for very small values of constants, 0 to 15 to be
exact. Actually, the calculations you show are not two bytes. 2 / 3 + requires loading the 2, dividing, then loading the 3 with an addition.

42 ldc 2
22FC div
83 adc 3

So a total of four bytes. I had forgotten that even the instructions use the immediate data method to extend the instruction set beyond 4 bits. So an 8 bit instruction requires two bytes and a 12 bit instruction requires three bytes.

The limit in the constant in the ldc and adc instructions is 0 to 15 without using another 8 bit instruction which would allow a range up to -256 to 255. The scheme I describe, if working with 8 bit opcodes, allows a constant of ±64 in one byte and
±4096 in two bytes.
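
For anyone wanting to check the byte counts, here is a small Python sketch of the transputer prefix encoding as summarized in the quoted document text: one byte per instruction with a 4-bit function code and a 4-bit operand nibble, plus one pfix or nfix byte for each extra four bits. The function-code values are chosen to match the bytes in the listing above (42, 22FC, 83); the values for pfix (2) and nfix (6), and 0x2C as the operation code for div, are as I recall them from the INMOS documentation and should be treated as assumptions. The name encode is made up for the example.

```python
# 4-bit function codes (assumed values, consistent with the listing above)
PFIX, NFIX, LDC, ADC, OPR = 0x2, 0x6, 0x4, 0x8, 0xF
DIV = 0x2C  # operation code for div, executed through opr


def encode(function, operand):
    """Return the instruction bytes for `function operand`, with prefixes as needed."""
    low = (function << 4) | (operand & 0xF)
    if 0 <= operand < 16:
        return [low]                                  # fits in a single byte
    if operand >= 16:
        return encode(PFIX, operand >> 4) + [low]     # positive: pfix chain
    return encode(NFIX, (~operand) >> 4) + [low]      # negative: starts with nfix


def hexbytes(byte_list):
    return "".join("%02X" % b for b in byte_list)
```

Under these assumptions, ldc 2, div, adc 3 come out as 42, 22FC and 83, four bytes in total, and any ldc constant from -256 to 255 fits in two bytes.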

I chose non-immediate operand instructions to use more bits (still no more than 8/9) because they are used less often according to Koopman. If I implement the div instruction in my instruction set, the above sequence would also be four bytes (either 8
or 9 bits depending on the implementation). However, as I've said, the above sequence is infrequent in Forth according to Koopman, so not something I would try to optimize.

What am I missing?

--

Rick C.

--+ Get 1,000 miles of free Supercharging
--+ Tesla referral code - https://ts.la/richard11209


From Myron Plichota@21:1/5 to Brian Fox on Fri Dec 16 04:53:16 2022

On Thursday, December 15, 2022 at 8:26:33 PM UTC-5, Brian Fox wrote:

On Wednesday, December 14, 2022 at 11:50:27 PM UTC-5, Myron Plichota wrote:

[

Chuck reserved the last bit to make a sub-routine call and an implicit return.
Clever.

]
I disagree on this point. Chuck's chips always had an explicit return aka ; instruction.
Only the called procedure can know when it is time to return.

I didn't have it quite right Myron, but I knew there was something in RTX2000 regarding free sub-routine returns.
It was only 30 something years ago. :-)

From Koopman, Stack Computers
---------------------------
Figure 4.9(c) -- RTX instruction formats -- ALU operation.

Figure 4.9c shows the format of the ALU instruction. Bits 0-3 control the operation
of the shifter that shifts the output of the ALU.

Bit 5 of the ALU instruction indicates a subroutine return operation. This allows
subroutine returns to be combined with preceding arithmetic operations to obtain
"free" subroutine returns in many cases.
--------------------------

This would be an excellent reference for Jose as well. https://users.ece.cmu.edu/~koopman/stack_computers/index.html

Now I understand the RTX2000 context. Thanks. https://users.ece.cmu.edu/~koopman/stack_computers/sec4_5.html


From SpainHackForth@21:1/5 to All on Fri Dec 16 10:50:59 2022

https://users.ece.cmu.edu/~koopman/stack_computers/index.html

Now I understand the RTX2000 context. Thanks. https://users.ece.cmu.edu/~koopman/stack_computers/sec4_5.html

Hum interesting.. :D thanks for sharing..


From Matthias Koch@21:1/5 to All on Fri Dec 16 23:18:41 2022

https://github.com/badgeteam/mch2022-firmware-ice40/tree/master/projects/Forth
https://github.com/badgeteam/mch2022-firmware-ice40/blob/master/projects/Forth/rtl/common-verilog/j1-universal-16kb-quickstore.v


From Myron Plichota@21:1/5 to SpainHackForth on Sat Dec 17 00:55:21 2022

On Tuesday, December 13, 2022 at 1:31:52 PM UTC-5, SpainHackForth wrote:

Ok, so I’m learning Forth and Verilog… I’m not proficient in either, but I like to learn and hack stuff. I built my first module in Verilog as a starting point for a Forth FPGA… https://gist.githubusercontent.com/jemo07/5d5ba7d31bb12410888f46ca6060a1f2/raw/1c597704a9a8f4f99b3e73a3906905ef0f38c358/ProgramCounter.v

This is my cheesy Program counter, I know, don’t laugh… :D.

[snip]

I’m asking for my understanding, I’m enjoying learning something like Verilog, maybe even System Verilog… but I just don’t get the logic.

Blessed are the humble in spirit. After pondering ProgramCounter.v, my 2 cents:

1) inout [7:0] data_bus; (and z values)
The inout module port attribute must be reserved for the top-level SoC FPGA pins.
On-chip FPGA routing fabric does not provide bidirectional wires that can be driven by multiple sources, and the jellybean concept of connecting tristate microprocessor, RAM, ROM, and IO through copper no longer works.
Instead, definite inputs, outputs, and muxes are used to work under the hood, knowing only unidirectional 1s and 0s.

2) Within an always @(posedge clock) clause, the "<=" assignment (not "=") must be used for meaningful results.

3) A global reset is normally present on any SoC, and you may wish to take advantage of that.

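4) The archaic Verilog dialect (ports named in the module header and declared separately in the body) has a modern form, the ANSI-style port declarations of Verilog-2001, analogous to the K&R to ANSI C improvement.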

I have found Icarus Verilog to be a priceless tool for proving modules. The ability to apply human-readable test benches to generate human-readable simulation logs beats the crap out of vendor-locked-in GUI simulators. There is a learning curve, but I
would have been defeated by my own errors a long time ago without Icarus Verilog. Once a module has been tested to its boundary conditions, it may become "trusted", and higher level design can proceed with confidence. This design discipline has never
failed to get me a working FPGA, and the door has always remained open to do it better in the next version, even on different silicon!

I'm not going to wish you luck, because luck will have no part in your future success. You are obviously willing to drill down to the basics of a program counter, and your starting point (flawed or not) ought to remind Forth "gurus" about what it's all
about.

Cheers - Myron Plichota


From Lorem Ipsum@21:1/5 to Myron Plichota on Sat Dec 17 08:34:40 2022

On Saturday, December 17, 2022 at 3:55:23 AM UTC-5, Myron Plichota wrote:

On Tuesday, December 13, 2022 at 1:31:52 PM UTC-5, SpainHackForth wrote:

Ok, so I’m learning Forth and Verilog… I’m not proficient in either, but I like to learn and hack stuff. I built my first module in Verilog as a starting point for a Forth FPGA… https://gist.githubusercontent.com/jemo07/5d5ba7d31bb12410888f46ca6060a1f2/raw/1c597704a9a8f4f99b3e73a3906905ef0f38c358/ProgramCounter.v

This is my cheesy Program counter, I know, don’t laugh… :D.

[snip]

I’m asking for my understanding, I’m enjoying learning something like Verilog, maybe even System Verilog… but I just don’t get the logic.

Blessed are the humble in spirit. After pondering ProgramCounter.v, my 2 cents:

1) inout [7:0] data_bus; (and z values)
The inout module port attribute must be reserved for the top-level SoC FPGA pins.
On-chip FPGA routing fabric does not provide bidirectional wires that can be driven by multiple sources, and the jellybean concept of connecting tristate microprocessor, RAM, ROM, and IO through copper no longer works.
Instead, definite inputs, outputs, and muxes are used to work under the hood, knowing only unidirectional 1s and 0s.

2) Within an always @(posedge clock) clause, the "<=" assignment (not "=") must be used for meaningful results.

This is not true. Both assignments have a purpose, you simply need to understand how they work.

In VHDL the assignments are := and <=. := is used with variables to update a variable immediately in a process (or procedure), which is much more like the way software works. It is used to describe logic within the process.

<= establishes a value to be assigned, but does not perform the assignment until the process exits. This is typically used for register assignments.

Because of the timing of the assignments, they will produce different results. If A starts with the value 4, then

A := 5;
B := A;

and

A <= 5;
B <= A;

inside a process, will assign 5 and 4 to B respectively, and create different logic. Which you use, depends on what you wish to describe.
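
The same 5-versus-4 outcome can be mimicked in a few lines of Python. This is only a software model of the two update rules (immediate update versus "collect the new values, then apply them together"), not HDL, and the function names are made up. In Verilog terms the first corresponds to blocking "=" and the second to nonblocking "<=".

```python
def run_immediate(a):
    """Model of := (or Verilog '='): each statement sees the updates before it."""
    a = 5
    b = a          # reads the already-updated A
    return a, b


def run_deferred(a):
    """Model of <=: right-hand sides read the old state, updates land together."""
    state = {"a": a, "b": None}
    scheduled = {}
    scheduled["a"] = 5
    scheduled["b"] = state["a"]   # still the old A
    state.update(scheduled)       # all assignments take effect at once
    return state["a"], state["b"]
```

Starting from A = 4, run_immediate(4) returns (5, 5) and run_deferred(4) returns (5, 4), matching the two cases above.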

3) A global reset is normally present on any SoC, and you may wish to take advantage of that.

Global resets must be used with caution. The difficulty is exiting the reset condition. The reset signal is asynchronous to the clock, plus has long routing times. This makes it hard for every part of the design to exit reset at the same time,
resulting in the device starting in random states. There are techniques to mitigate this problem.

--

Rick C.

-+- Get 1,000 miles of free Supercharging
-+- Tesla referral code - https://ts.la/richard11209


From Myron Plichota@21:1/5 to gnuarm.del...@gmail.com on Sun Dec 18 05:06:17 2022

On Saturday, December 17, 2022 at 11:34:42 AM UTC-5, gnuarm.del...@gmail.com wrote:

On Saturday, December 17, 2022 at 3:55:23 AM UTC-5, Myron Plichota wrote:

On Tuesday, December 13, 2022 at 1:31:52 PM UTC-5, SpainHackForth wrote:

Ok, so I’m learning Forth and Verilog… I’m not proficient in either, but I like to learn and hack stuff. I built my first module in Verilog as a starting point for a Forth FPGA… https://gist.githubusercontent.com/jemo07/5d5ba7d31bb12410888f46ca6060a1f2/raw/1c597704a9a8f4f99b3e73a3906905ef0f38c358/ProgramCounter.v

This is my cheesy Program counter, I know, don’t laugh… :D.

[snip]

I’m asking for my understanding, I’m enjoying learning something like Verilog, maybe even System Verilog… but I just don’t get the logic.

Blessed are the humble in spirit. After pondering ProgramCounter.v, my 2 cents:

1) inout [7:0] data_bus; (and z values)
The inout module port attribute must be reserved for the top-level SoC FPGA pins.
On-chip FPGA routing fabric does not provide bidirectional wires that can be driven by multiple sources, and the jellybean concept of connecting tristate microprocessor, RAM, ROM, and IO through copper no longer works.
Instead, definite inputs, outputs, and muxs are used to work under the hood, knowing only unidirectional 1s and 0s.

2) Within an always @(posedge clock) clause, the "<=" assignment (not "=") must be used for meaningful results.

This is not true. Both assignments have a purpose, you simply need to understand how they work.

In VHDL the assignments are := and <=. := is used with variables to update a variable immediately in a process (or procedure), which is much more like the way software works. It is used to describe logic within the process.

<= establishes a value to be assigned, but does not perform the assignment until the process exits. This is typically used for register assignments.

Jose's First Verilog Project (a program counter driven by a clock rising edge, i.e. a "smart" register) has no need for advanced topics. I remember being a Verilog newbie, and the KISS principle.

Whipping out VHDL (not Verilog) syntax contributes nothing to a Verilog newbie's success.

3) A global reset is normally present on any SoC, and you may wish to take advantage of that.

Global resets must be used with caution. The difficulty is exiting the reset condition. The reset signal is asynchronous to the clock, plus has long routing times. This makes it hard for every part of the design to exit reset at the same time, resulting in the device starting in random states. There are techniques to mitigate this problem.

Indeed there are, e.g.:
1) Do not use async resets, even if the FPGA fabric offers the option; this eliminates said problem.
2) Instead, write modules to test reset as the first decision within an always @(posedge clock) clause.
3) Prevent metastability by resampling the external reset pin through 2 simple D registers.
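
Points 2) and 3) above could look roughly like the following. This is only a sketch, not the code from the gist: it assumes an active-high reset, and the names reset_sync, reset_pin, pc_sync_reset, load and load_value are made up for illustration.

```verilog
// Point 3): resample the external reset pin through two D registers.
module reset_sync (
    input  wire clock,
    input  wire reset_pin,   // raw external reset, asynchronous to clock (assumed active-high)
    output wire reset        // resampled copy, intended for use in clocked logic only
);
    // Start asserted; this relies on the tool honoring register initial
    // values, which is an assumption about the target FPGA flow.
    reg [1:0] sync = 2'b11;

    always @(posedge clock)
        sync <= {sync[0], reset_pin};   // two registers in series

    assign reset = sync[1];
endmodule

// Point 2): reset is the first decision inside the clocked block.
module pc_sync_reset (
    input  wire       clock,
    input  wire       reset,        // output of reset_sync
    input  wire       load,         // hypothetical "jump" strobe
    input  wire [7:0] load_value,   // hypothetical jump target
    output reg  [7:0] pc_value
);
    always @(posedge clock) begin
        if (reset)
            pc_value <= 8'd0;               // synchronous reset, tested first
        else if (load)
            pc_value <= load_value;
        else
            pc_value <= pc_value + 8'd1;    // normal increment
    end
endmodule
```

With this arrangement the reset only ever changes right after a clock edge, so whether every register sees it in time becomes an ordinary timing constraint that the tools can check.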


From Jurgen Pitaske@21:1/5 to Myron Plichota on Sun Dec 18 07:25:54 2022

If people are interested in getting started with Verilog / VHDL,
this link might be a good starting point, with many examples
and the NANDLAND board to run the code on: https://nandland.com/learn-verilog/
I had already suggested to him that he adapt his examples to the standard Lattice boards,
where people could add this special hardware themselves.
That would probably increase the number of users a lot.


From SpainHackForth@21:1/5 to All on Sun Dec 18 10:45:24 2022

Thank you all for the comments and encouragement, I made a slight modification: https://gist.githubusercontent.com/jemo07/4f039197b1174c99962732bf3e5dc93b/raw/2f98c30501c4d108e4a2a8a6c38bf57011b5107c/ProgramCounter-sysver.v

Here is a little progress… https://gist.githubusercontent.com/jemo07/f7dd6ee69bda88f430453cedc5fe6b74/raw/635a09db45aff8d7cb6c33bd91a6152ea7284fcb/ControlUnit_sysver.v

I’m not sure if this will all work, but I’m sure learning a ton about the HW and mostly about Forth!


From Lorem Ipsum@21:1/5 to Myron Plichota on Sun Dec 18 10:18:55 2022

On Sunday, December 18, 2022 at 8:06:18 AM UTC-5, Myron Plichota wrote:

On Saturday, December 17, 2022 at 11:34:42 AM UTC-5, gnuarm.del...@gmail.com wrote:

On Saturday, December 17, 2022 at 3:55:23 AM UTC-5, Myron Plichota wrote:

On Tuesday, December 13, 2022 at 1:31:52 PM UTC-5, SpainHackForth wrote:

Ok, so I’m learning Forth and Verilog… i’m not proficient in neither, but I like to learn and hack stuff, I built my first module on Verilog as a staring point for a Forth FPGA… https://gist.githubusercontent.com/jemo07/5d5ba7d31bb12410888f46ca6060a1f2/raw/1c597704a9a8f4f99b3e73a3906905ef0f38c358/ProgramCounter.v

This is my cheesy Program counter, I know, don’t laugh… :D.

[snip]

I’m asking for my understanding, I’m enjoying learning something like Verilog, maybe even System Verilog… but I just don’t get the logic.

Blessed are the humble in spirit. After pondering ProgramCounter.v, my 2 cents:

1) inout [7:0] data_bus; (and z values)
The inout module port attribute must be reserved for the top-level SoC FPGA pins.
On-chip FPGA routing fabric does not provide bidirectional wires that can be driven by multiple sources, and the jellybean concept of connecting tristate microprocessor, RAM, ROM, and IO through copper no longer works.
Instead, definite inputs, outputs, and muxs are used to work under the hood, knowing only unidirectional 1s and 0s.

2) Within an always @(posedge clock) clause, the "<=" assignment (not "=") must be used for meaningful results.

This is not true. Both assignments have a purpose, you simply need to understand how they work.

In VHDL the assignments are := and <=. := is used with variables to update a variable immediately in a process (or procedure), which is much more like the way software works. It is used to describe logic within the process.

<= establishes a value to be assigned, but does not perform the assignment until the process exits. This is typically used for register assignments.

Jose's First Verilog Project (a program counter driven by a clock rising edge, i.e. a "smart" register) has no need for advanced topics. I remember being a Verilog newbie, and the KISS principle.

Whipping out VHDL (not Verilog) syntax contributes nothing to a Verilog newbie's success.

The VHDL syntax is analogous to Verilog. I used VHDL because that is what I know. The point is that someone provided bogus information, which is much worse. Oh, wait, it was YOU! If you wish to amend your statement to make it correct, then I'm happy to let the matter rest. Giving a newbie wrong information, especially without any explanation, will do more harm than simply keeping quiet.

This statement,

2) Within an always @(posedge clock) clause, the "<=" assignment (not "=") must be used for meaningful results.

is wrong, flat out wrong.
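
As an illustration of both assignments doing useful work inside one clocked block, here is a minimal sketch. The module and signal names (sum_reg, a, b, c, partial, total) are invented for the example, and many style guides still advise beginners to stick to "<=" in clocked blocks.

```verilog
module sum_reg (
    input  wire       clock,
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire [7:0] c,
    output reg  [7:0] total
);
    reg [7:0] partial;   // scratch value, written and read only inside this block

    always @(posedge clock) begin
        partial = a + b;          // blocking: partial changes immediately
        total  <= partial + c;    // nonblocking: total is a register, updated at the end of the time step
    end
endmodule
```

Here total registers a + b + c from the inputs present at the clock edge, because partial is written before it is read. Had partial been assigned with "<=", the second line would have used the value of partial left over from the previous clock, which describes different logic.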

3) A global reset is normally present on any SoC, and you may wish to take advantage of that.

Global resets must be used with caution. The difficulty is exiting the reset condition. The reset signal is asynchronous to the clock, plus has long routing times. This makes it hard for every part of the design to exit reset at the same time, resulting in the device starting in random states. There are techniques to mitigate this problem.

Indeed there are. e.g:
1) do not use async resets, even if the FPGA fabric offers the option, eliminating said problem

This is not an issue of sync vs. async resets. This is a matter of meeting timing requirements.

2) instead, write modules to test reset as the first decision within an always @(posedge clock) clause

Still not getting the issue. How do your designs ever start up properly? Luck?

3) prevent metastability by resampling the external reset pin through 2 simple D registers

Metastability can be important, but it is not really an issue with the reset, because reset is used so infrequently. Metastability is never "fixed". It is simply brought within a defined bound. The reset issue is already within the defined bound because the reset signal changes so rarely. Metastability becomes a problem when the product of the clock frequency and the data change frequency is very high. A 200 MHz clock with data changing near the clock edge can have significant metastability issues. A serial port at 9600 bps will not present metastability issues within your lifetime. The reset will not present metastability issues within the next 100 years.

The problem is coordination of the exit from reset. If you have a state machine (aka a counter, etc.), the release of reset could be seen by some bits on this clock edge, while other bits may not see it until the next clock edge. This is simply a lack of synchronization, not metastability (which is much, much less frequent). The few bits that saw the reset released first will advance according to the logic driving them, but the other bits remain held in the reset state until the next clock. Then the bits of the state machine are out of step and can produce an invalid result.

It seems there is no general way that people mitigate this problem. I prefer to ensure that each state machine synchronizes its startup in a way that prevents the problem. For example, the states can include an intermediate reset state where only one bit is changed from the primary reset state. Then it does not matter if only part of the logic sees the reset removed. On the next clock edge, when the reset has had adequate time to reach all parts of the register, it will still be in one of two states, reset, or reset'.

The higher level design then has to ensure that the various parts can start up without synchronization problems, using simple handshakes, if needed.
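
A rough sketch of the intermediate reset state idea, assuming an asynchronous active-low reset and an invented state encoding; the names safe_start_fsm, reset_n, S_RESET, S_RESET2, S_RUN_A, S_RUN_B and running are hypothetical.

```verilog
module safe_start_fsm (
    input  wire clock,
    input  wire reset_n,   // asynchronous, active-low (assumed)
    output wire running
);
    localparam [2:0] S_RESET  = 3'b000,   // state forced by reset
                     S_RESET2 = 3'b001,   // reset': differs from S_RESET in one bit only
                     S_RUN_A  = 3'b011,
                     S_RUN_B  = 3'b110;

    reg [2:0] state;

    always @(posedge clock or negedge reset_n) begin
        if (!reset_n)
            state <= S_RESET;
        else
            case (state)
                S_RESET:  state <= S_RESET2;  // only bit 0 changes on the first step
                S_RESET2: state <= S_RUN_A;   // by now reset has reached every bit
                S_RUN_A:  state <= S_RUN_B;
                S_RUN_B:  state <= S_RUN_A;
                default:  state <= S_RESET;   // recover from any unused encoding
            endcase
    end

    assign running = (state == S_RUN_A) || (state == S_RUN_B);
endmodule
```

If bit 0 misses the release of reset on the first edge, the machine simply stays in S_RESET for one more clock. Either way the state after that edge is S_RESET or S_RESET2, never a mixed-up encoding.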

--

Rick C.


From Lorem Ipsum@21:1/5 to SpainHackForth on Sun Dec 18 11:25:06 2022

On Sunday, December 18, 2022 at 1:45:26 PM UTC-5, SpainHackForth wrote:

Thank you all for the comments and encouragement, I made a slight modification: https://gist.githubusercontent.com/jemo07/4f039197b1174c99962732bf3e5dc93b/raw/2f98c30501c4d108e4a2a8a6c38bf57011b5107c/ProgramCounter-sysver.v

Here is a little progress… https://gist.githubusercontent.com/jemo07/f7dd6ee69bda88f430453cedc5fe6b74/raw/635a09db45aff8d7cb6c33bd91a6152ea7284fcb/ControlUnit_sysver.v

I’m not sure if this will all work, but I’m sure learning a ton about the HW and mostly about Forth!

These statements still have no purpose.

// do nothing (value z at data bus)
data_bus[7:0] = 8'bzzzz_zzzz;

I still don't understand what you think this accomplishes.

Also, you declare data_bus as an inout. This has no meaning inside modules in normal use. For this module, data_bus should be an input. It should NEVER be assigned a value, because it is being assigned somewhere else. If you assign it a value here,
that creates a driver. Assign it a value in another module and that also creates a driver. This is two drivers driving the same signal, which creates an error condition in hardware. It will create invalid conditions in your simulation.

Remove all the assignments to data_bus and you have a module that does something useful. The one place where you seem to be making a valid assignment to data_bus,

data_bus[7:0] = pc_value[7:0];

should be in the module that creates the drivers for data_bus, not here. This is not a matter of style, this is a matter of creating a design that means something. As written, this design will simply create errors.
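
One way the separation could look is sketched below, with the bus value entering the program counter as a plain input and a single mux elsewhere driving data_bus. Only data_bus and pc_value come from the gist; pc_bus_input, bus_driver, data_in, load, pc_to_bus and other_value are names assumed for illustration.

```verilog
// The program counter only reads the bus; it never drives it.
module pc_bus_input (
    input  wire       clock,
    input  wire       load,       // hypothetical "load PC from bus" strobe
    input  wire [7:0] data_in,    // bus value, input only
    output reg  [7:0] pc_value
);
    always @(posedge clock) begin
        if (load)
            pc_value <= data_in;
        else
            pc_value <= pc_value + 8'd1;
    end
endmodule

// Exactly one place creates the driver for data_bus: a mux.
module bus_driver (
    input  wire       pc_to_bus,     // hypothetical select from the control unit
    input  wire [7:0] pc_value,
    input  wire [7:0] other_value,   // stand-in for any other bus source
    output wire [7:0] data_bus
);
    assign data_bus = pc_to_bus ? pc_value : other_value;
endmodule
```

With this split there are no z values and no inout ports inside the design, and data_bus has a single driver in both simulation and hardware.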

--

Rick C.

