Forum: >>> Magnum BBS <<<

Assembly source code dialects

From Martin Doherty@21:1/5 to All on Thu Mar 18 16:28:07 2021

This is pretty vague, but I've been mulling over this idea for a while. There is a core standard for 6502 assembler source code - the mnemonics are invariant (I hope), and the syntax for hex ($0A), addressing modes, register names etc. might also be completely standard and invariant from one assembler to another.

However, the standard stops there, and from that point each assembler designer elaborates on the language in their own way, leading to multiple dialects with varying degrees of similarity. Pseudo-ops (EQU, ORG, ASC), rules for forming labels, comments etc. are all areas of syntactic departure. Dialects include Merlin, Apple Pascal, Big Mac, ORCA for 6502 ... basically each assembler defines one more dialect of the language ... even different versions of one assembler might result in more "dialects", although the differences there are likely minuscule.

If I come across an ASM listing in a magazine, the code or the article may or may not identify the target assembler for that source code. I may also have an assembler other than the one the author targeted. So I'd wish for a tool that could
a) take a source file and detect the assembler(s) it is targeted to, or compatible with; and
b) automatically translate from dialect A to dialect B

I'm interested in this as a possible programming challenge. Currently I foresee much slogging through assembler manuals to document the syntax rules of each dialect. Does such a tool already exist in some form? Or, do you know of any techniques or information sources that would to some degree address this functional requirement?

Disclaimer: I can barely write two lines of ASM to save my life, although I've been intending to learn it for the last 40 years. This exercise would go a long way to scratching that itch!

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Schmidt@21:1/5 to Martin Doherty on Thu Mar 18 22:59:20 2021


I bet there are a few such tools out there, but I generally just do that mapping in my head because ca65 is my favored assembler. :-)

I will say that the opcodes and addressing modes are definitely locked
down by the 6502 datasheet. Everything else is convenience on the
assembler's part: definition and use of various things. They might
include, off the top of my head:
- Labels (and referencing their high/low values if 16-bit)
- Macros (definition and references)
- Looping/convenience constructs (basically macros again)
- Data definition (including size/shape, initialization)
- String definition (might include high bit manipulation)
- Address pointer manipulation (i.e. "*=$0800")
- Linker directives (where to store various hunks)
And there are other niceties such as inferring when something (an
address or data element) is 8-bit or 16-bit - consider zero-page
references as an example.
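
As a rough illustration of two items from that list (high/low bytes of a 16-bit label, and inferring whether a value is 8-bit or 16-bit), here is a minimal Python sketch. The function names are made up for illustration; how each assembler actually spells these operations is exactly the dialect-specific part.

```python
def low_byte(value: int) -> int:
    """Low 8 bits of a 16-bit label value."""
    return value & 0xFF


def high_byte(value: int) -> int:
    """High 8 bits of a 16-bit label value."""
    return (value >> 8) & 0xFF


def operand_width(value: int) -> int:
    """Bytes needed for an address: 1 if it fits in zero page, else 2."""
    return 1 if 0 <= value <= 0xFF else 2
```

For the "*=$0800" example, low_byte(0x0800) is 0x00 and high_byte(0x0800) is 0x08.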


From I am Rob@21:1/5 to martindo on Thu Mar 18 23:50:36 2021


I would select an assembler first and stick with it. Once you get to know all its nuances, converting sources written for other assemblers becomes quite easy. For the most part, assemblers are similar in syntax. You might see .DB instead of DFB, or .H instead of HEX, but you will quickly see how another assembler's syntax relates to the one you prefer.
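
To make that kind of pseudo-op substitution concrete, here is a minimal Python sketch. The mapping holds only the two pairs mentioned above, and translate_line is a hypothetical helper; a real table would come from the manuals, and operand formats may differ as well. It also assumes ";" starts a comment and never appears inside a string.

```python
import re

# Seeded only with the two pairs mentioned above; everything else is left alone.
DIRECTIVE_MAP = {".DB": "DFB", ".H": "HEX"}


def translate_line(line: str, mapping=DIRECTIVE_MAP) -> str:
    """Rename the first known pseudo-op on a line, preserving spacing and comment."""
    code, sep, comment = line.partition(";")
    parts = re.split(r"(\s+)", code)  # keep whitespace runs as separate items
    for i, token in enumerate(parts):
        if token.upper() in mapping:
            parts[i] = mapping[token.upper()]
            break
    return "".join(parts) + sep + comment
```

Lines without a known pseudo-op pass through unchanged, so the function can be applied to a whole listing line by line.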


From Michael Pohoreski@21:1/5 to m...@gmail.com on Fri Mar 19 07:53:34 2021

On Thursday, March 18, 2021 at 4:28:08 PM UTC-7, m...@gmail.com wrote:

> This is pretty vague, but I've been mulling over this idea for a while. There is a core standard for 6502 assembler source code - the mnemonics are invariant (I hope), and the syntax for hex ($0A), addressing modes, register names etc. might also be completely standard and invariant from one assembler to another.
>
> However, the standard stops there, and from that point each assembler designer elaborates on the language in their own way, leading to multiple dialects with varying degrees of similarity. Pseudo-ops (EQU, ORG, ASC), rules for forming labels, comments etc. are all areas of syntactic departure. Dialects include Merlin, Apple Pascal, Big Mac, ORCA for 6502 ... basically each assembler defines one more dialect of the language ... even different versions of one assembler might result in more "dialects", although the differences there are likely minuscule.

I briefly started doing some of this in AppleWin's debugger but got side-tracked.
https://github.com/AppleWin/AppleWin/blob/master/source/Debugger/Debugger_Assembler.cpp#L302

There are _many_ small differences. Take for example a simple instruction:

STA $0002

Should this be:

* Absolute?   8D 02 00   STA $0002
* Zero-page?  85 02      STA $02

Also,

What does the assembler default to? Does it generate the 2-byte opcode or the 3-byte opcode?
How do you force the assembler to generate the _other_ opcode?
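
A minimal Python sketch of that choice, using the two STA encodings shown above. The force_absolute flag is a made-up stand-in for whatever forcing syntax a given assembler provides, and picking zero page by default is an assumption, not a claim about any particular tool.

```python
STA_ZERO_PAGE = 0x85
STA_ABSOLUTE = 0x8D


def encode_sta(address: int, force_absolute: bool = False) -> bytes:
    """Encode STA, preferring the 2-byte zero-page form when the address fits."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    if address <= 0xFF and not force_absolute:
        return bytes([STA_ZERO_PAGE, address])
    # Absolute operands are stored low byte first.
    return bytes([STA_ABSOLUTE, address & 0xFF, address >> 8])
```

With this sketch, encode_sta(0x0002) gives 85 02, and encode_sta(0x0002, force_absolute=True) gives 8D 02 00.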

Strings are another pain point. Due to Woz's unfortunate, unconventional use of high ASCII, there are times you want:

* low-bit ASCII,
* other times you want high-bit ASCII,
* times you want them mixed, and
* times you want to have raw hex (such as zero termination.)

For example, one of the things that makes ca65 garbage out of the box is that it wasn't actually designed by someone who _does_ Apple 2 6502 programming, so you have to jump through all sorts of macro garbage to solve practical problems. Merlin32 makes this trivial by using ' and " syntax.

48 65 6C 6C 6F   ASC 'Hello'     ; Using single quotes, the high bit is set to 0 (standard ASCII)
C8 E5 EC EC EF   ASC "Hello"     ; Using double quotes, the high bit is set to 1 (for Text Screen encoding)
41 8D 00         ASC 'A',8D,00

Partially shamelessly copied from help page https://brutaldeluxe.fr/products/crossdevtools/merlin/
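
The byte values in that listing can be reproduced with a short Python sketch. The asc function and its high_bit parameter are hypothetical names for illustration, not part of any assembler.

```python
def asc(text: str, high_bit: bool) -> bytes:
    """Encode text as ASCII, optionally setting bit 7 of every character."""
    raw = text.encode("ascii")
    return bytes(b | 0x80 for b in raw) if high_bit else raw


low = asc("Hello", high_bit=False)            # like ASC 'Hello'
high = asc("Hello", high_bit=True)            # like ASC "Hello"
mixed = asc("A", False) + bytes([0x8D, 0x00])  # like ASC 'A',8D,00
```

Here low.hex(" ") is "48 65 6c 6c 6f" and high.hex(" ") is "c8 e5 ec ec ef", matching the listing apart from letter case.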

Another common example is "pad to end of page". Every assembler has its own syntax for doing this. Again Merlin32 makes this trivial:

A0 A0 A0 ...     DS \,$A0        ; Fill memory with 0xA0 values until the next memory page
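
The arithmetic behind "pad to end of page" is small enough to sketch in Python. pad_to_page is a made-up name, and emitting nothing when the program counter already sits on a page boundary is an assumption; check each assembler's manual for its actual behaviour in that case.

```python
PAGE_SIZE = 0x100


def pad_to_page(pc: int, fill: int = 0xA0) -> bytes:
    """Return the fill bytes needed to advance pc to the next 256-byte boundary."""
    count = -pc % PAGE_SIZE  # 0 when pc is already page-aligned
    return bytes([fill]) * count
```

For example, with the program counter at $08FD this yields three $A0 bytes, so the next byte lands at $0900.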

> If I come across an ASM listing in a magazine, the code or the article may or may not identify the target assembler for that source code. I may also have an assembler other than the one the author targeted. So I'd wish for a tool that could
> a) take a source file and detect the assembler(s) it is targeted to, or compatible with; and
> b) automatically translate from dialect A to dialect B

Given that there are so few assemblers you could just "brute force" it and see what assembles without errors.
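
A minimal Python sketch of that brute-force idea. Each assembler is represented by a callable that returns True when the source assembles cleanly; in practice each callable would wrap a run of the real tool and check for errors, which is left out here because the command lines differ per assembler. All names are hypothetical.

```python
from typing import Callable, Dict, List


def compatible_assemblers(
    source: str, assemblers: Dict[str, Callable[[str], bool]]
) -> List[str]:
    """Return the names of all assemblers that accept the source without errors."""
    return sorted(name for name, accepts in assemblers.items() if accepts(source))


# Stand-in checkers for illustration only; they are not real dialect rules.
fake_assemblers = {
    "dialect_a": lambda src: ".DB" in src,
    "dialect_b": lambda src: "DFB" in src,
}
```

A listing accepted by several assemblers simply yields several names, which covers the "compatible with" half of the question.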

> I'm interested in this as a possible programming challenge. Currently I foresee much slogging through assembler manuals to document the syntax rules of each dialect. Does such a tool already exist in some form?

No. The closest I know of is 6502bench.
https://github.com/fadden/6502bench

> Or, do you know of any techniques or information sources that would to some degree address this functional requirement?

You may want to think about this process in reverse:

Given a series of opcodes, generate the 7 different ways it would be written in 7 different assemblers. At this point you are basically doing a string compare.

Why? Because the opcodes are the "ground truth".

Opcodes = representation
Assembly = presentation
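
Here is a minimal Python sketch of the "work backwards from the opcodes" idea. The two renderers are placeholders named dialect_a and dialect_b, not real assemblers; real ones would follow each assembler's documented syntax. Only the two STA opcodes discussed earlier are covered.

```python
# opcode -> (mnemonic, number of operand bytes)
OPCODES = {0x85: ("STA", 1), 0x8D: ("STA", 2)}

# Hypothetical presentation rules, one per dialect.
RENDERERS = {
    "dialect_a": lambda m, v, n: f"{m} ${v:0{2 * n}X}",
    "dialect_b": lambda m, v, n: f"{m.lower()} ${v:0{2 * n}x}",
}


def decode(code: bytes):
    """Split raw bytes into mnemonic, operand value, and operand size."""
    mnemonic, size = OPCODES[code[0]]
    operand = int.from_bytes(code[1:1 + size], "little")
    return mnemonic, operand, size


def matching_dialects(code: bytes, line: str):
    """Render the bytes in every dialect and string-compare against the line."""
    mnemonic, operand, size = decode(code)
    text = " ".join(line.split())  # normalise whitespace before comparing
    return [name for name, render in RENDERERS.items()
            if render(mnemonic, operand, size) == text]
```

The comparison really is just a string compare once the bytes have been rendered each way.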

What I would do to start is:

1. Create a GitHub project
2. Create a directory that has:
   a) an HDV with all the (popular?) native Apple 2 assemblers (Apple Pascal, Big Mac, DOS Tool Kit, Merlin, ORCA, etc.),
   b) cross assemblers (Merlin32, sbasm3, ca65, etc.), and
   c) a batch file that assembles the 7+ different assembly source files
3. Generate a complete 8x256 spreadsheet / text file matrix for every valid opcode, where opcodes run vertically and the assemblers run horizontally (each column is one assembler's input). I would *highly* recommend you have the assemblers in alphabetical order:

Opcode    Apple Pascal  Big Mac  CA65  DOS Toolkit  Merlin  Merlin32  ORCA  SBASM3  Notes
00
:
85 02
:
8D 02 01
:
FF
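
An empty version of that matrix can be generated rather than typed. This Python sketch writes a CSV skeleton with one row per opcode byte and one column per assembler; the column names are taken from the header above, and filtering down to valid opcodes is left as a later step.

```python
import csv
import io

ASSEMBLERS = ["Apple Pascal", "Big Mac", "CA65", "DOS Toolkit",
              "Merlin", "Merlin32", "ORCA", "SBASM3"]


def matrix_skeleton() -> str:
    """Build a CSV with a header row plus one blank row for each byte 00..FF."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Opcode", *ASSEMBLERS, "Notes"])
    for opcode in range(256):
        # One empty cell per assembler, plus one for Notes.
        writer.writerow([f"{opcode:02X}"] + [""] * (len(ASSEMBLERS) + 1))
    return buffer.getvalue()
```

The result opens directly in a spreadsheet, and each cell can then be filled with that assembler's source text for the opcode.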

4. Next you'll want to catalog radix input (What is the character prefix for a binary literal?)
5. Next you'll want to catalog all the pseudo-directives such as ORG, EQU
6. Next you'll want to catalog all the byte, word, address, storage layouts
7. Next you'll want to catalog string types
8. Next you'll want to catalog expressions
9. Next you'll want to catalog PC usage and utility functionality (pad to end of page, etc.)
10. Next you'll want to catalog local and global labels
11. Lastly you'll want to cover macros
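
As an example of what one catalog entry might turn into, here is a minimal Python sketch for step 4 (radix input). The "$" prefix for hex comes from this thread; treating "%" as the binary prefix is an assumption that would need checking per assembler, which is the point of keeping the table as data. parse_literal is a hypothetical helper.

```python
# Per-dialect table: literal prefix -> numeric base.
# "%" for binary is an assumption, not verified for every assembler.
RADIX_PREFIXES = {"$": 16, "%": 2}


def parse_literal(text: str, prefixes=RADIX_PREFIXES) -> int:
    """Parse a numeric literal using the dialect's prefix table (default decimal)."""
    base = prefixes.get(text[:1])
    if base is None:
        return int(text, 10)
    return int(text[1:], base)
```

A second dialect would get its own prefix table, and the parsing code would stay the same.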

I'm probably missing a few different types of syntax but that should be enough to get you started.

> Disclaimer: I can barely write two lines of ASM to save my life, although I've been intending to learn it for the last 40 years. This exercise would go a long way to scratching that itch!

Never too late to learn! Mark Lemmert learning 6502 assembly language and shipping his Nox Archaist game proved that all that you need is determination, time, and capacity.

Keep us posted of your progress!


From qkumba@21:1/5 to All on Fri Mar 19 09:36:44 2021

step 1. pick any two assemblers, and lots and lots of examples for each.
step 2. build all of those samples with the appropriate assembler.
step 3. see if you can convert from A to B and match 100%.
step 4. see if you can convert from B to A and match 100%.

Once you achieve that, add another assembler, but now you have A->C, B->C, C->A, and C->B.
It becomes an explosive combinatorial problem, and you might want to stop there.
However, you don't need to understand assembly language in order to do the conversion, because you don't need to understand the programs in order to convert them, only to match the output.
It's a text-replacement problem more than anything else.
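
A minimal Python sketch of the two ideas above: counting the one-way converters needed for N assemblers, and checking that a conversion matches 100% by comparing assembled output. The convert and assemble callables are placeholders for whatever tools are actually used.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def converter_pairs(assemblers: List[str]) -> List[Tuple[str, str]]:
    """Every ordered (source, target) pair: N * (N - 1) one-way converters."""
    return list(permutations(assemblers, 2))


def conversion_matches(
    source_a: str,
    convert_a_to_b: Callable[[str], str],
    assemble_a: Callable[[str], bytes],
    assemble_b: Callable[[str], bytes],
) -> bool:
    """True when the converted source assembles to identical bytes."""
    return assemble_b(convert_a_to_b(source_a)) == assemble_a(source_a)
```

Two assemblers need 2 converters and three need 6, which is the growth described above.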


From Andrew Roughan@21:1/5 to All on Sat Mar 20 13:54:07 2021

> Does such a tool already exist in some form? Or, do you know of any techniques or information sources that would to some degree address this functional requirement?

I believe there is a tool provided with Merlin to convert from Orca/M code.

IIgs Resource editors have generation for different assembler formats.

Richard Bennett converted a lot of code to Merlin and some code to MPW IIgs. He has tools that he used to do this work. See MPW to Merlin-16+: http://www.kashum.com/a2lib/

Regards
Andrew


From Andrew Roughan@21:1/5 to Andrew Roughan on Thu Mar 25 12:18:45 2021


Tim Meekins has just released source code for Merlin to Orca. https://github.com/tmeekins/Apple-IIGS-Projects/tree/main/Merlin2Orca

You’re welcome.

Regards
Andrew


From eriknoc@gmail.com@21:1/5 to Andrew Roughan on Fri May 14 01:46:13 2021


I've been looking through magazines from time to time, making notes of anything interesting to me that I may want to come back to. This subject just happens to be one of them. Here you go, straight from my notes! :-)

https://archive.org/details/Apple-Orchard-v1n1-1980-Mar-Apr
p47-48; Converting Brand X To Work With Brand Y {assembler conversion guide}

https://archive.org/details/Apple-Orchard-v1n3-1980-1-Winter
p35,39; Some Notes About The UCSD Assembler {the Apple Pascal assembler}
p43-52; Inside The Silentype [printer] Firmware {interface card; memory pokes; for Apple Pascal assembler}

