• NumberCruncher Reloaded - math accelerator card

    From D Finnigan@21:1/5 to All on Sat Aug 7 15:13:38 2021
    Check it out:

    https://www.geekdot.com/numbercruncher-reloaded/

    The NumberCruncher Reloaded is a peripheral card for the Apple II series
    that features a math co-processor, often also called a Floating Point Unit (FPU), which specializes in, well, floating-point calculations. At these,
    it is much, much faster than any 6502 or 65816 CPU ever will be.

    That said, the NumberCruncher Reloaded will not automatically speed up your programs the way CPU accelerators like the TransWarp GS or Zip Chip do. Programs have to be either written specifically for the
    NumberCruncher Reloaded or use a floating-point library like the SANE
    interface, which then needs to be patched so that it uses the NumberCruncher Reloaded for calculations instead of the main CPU.



    The “Reloaded” in its name hints towards the fact that this is a reboot of an already existing card. To make writing/reading easier, NumberCruncher Reloaded might be shortened to ‘NC-R’ further down this page…
    In 1988, there was the Floating Point Engine (FPE) created by Innovative Systems (‘iS’ for short).
    Read more about those in my separate post over here.

    While it was a great idea, it wasn’t the most stable design – but it laid the foundation, most importantly for the software we’re still using today.
    Due to the FPE’s issues there was considerable displeasure among its users,
    and in 1990 a German company called Alternative Systems announced the Number Cruncher, an ‘overhaul’ of the original design – here’s their newsgroup announcement:

    “The FPE is suffering from a major problem, namely the coproc is crashing internally and has to be reset in software. This happens in a
    non-deterministic way, and software written for that engineering junk must
    be adapted to that.
    The Number Cruncher is compatible with the FPE but is actually what the FPE
    was supposed to be – a math coproc that works. It performs very well.”

    Over the years the FPE as well as the NC faded into unobtainium. Because they
    were cool, and I love processors of all kinds, it was time to revive the
    Number Cruncher.

    Revival!

    If you have read the above-mentioned post about the FPE, you learned that the predecessors were built around the first, very obsolete and proprietary FPGA, a 555 timer and the 68881 FPU. All these parts would have a Facebook status of ‘#complicated’ today and needed to be replaced.

    Logic: The Xilinx XC2064 FPGA was replaced by something more recent, my universal 5V-tolerant weapon of choice, the Altera EPM3064 (aka MAX3000).
    That little fellow has enough logic gates, and the 100-pin version provides sufficient I/O pins even when using ISP. The timer for
    blinking the busy-LED went into this, too.

    FPU: There are still some 68881s around, but the 68882 is much easier to find, and both are cheaper in PLCC packaging than in ceramic PGA as of today. But as future NC-R owners might already own one or the other, we’ll go with… both. Yes – to offer maximum flexibility, you can use either Pin-Grid-Array or PLCC packages.
    Physical differences aside, the original FPE/NC did not work with the 68882
    – the NumberCruncher Reloaded does.

    To sum it up, the NumberCruncher Reloaded was improved in many aspects to
    make it much more usable in the 21st century:

    It also supports the enhanced and easier-to-find MC68882.
    FPUs can be used in either pin-grid-array or PLCC packages thanks to the two sockets provided. Again, the latter is much more common these days.
    Further increased stability by using low-power SMD parts and a 4-layer PCB with dedicated supply layers.
    Speed-optimized FPU protocol handling.
    2 more LEDs, which I consider very important.
    Updatable firmware (Altera ByteBlaster required).

    Software

    In contrast to my T2A2 Transputer Link-Adapter, there is already some
    software available for the NC-R.

    The Tools Disk

    This is a good start but was mainly intended for that warm fuzzy feeling of unboxing a real product.

    Download: 2MG image or ShrinkIt image

    Still, based on the original Innovative Systems disk (updated to the latest releases), it provides everything you need to start:

    All Apple IIGS-related software is located in the FPE.IIGS folder.
    All Apple II/II+/IIe-related software is located in the FPE.6502 folder.
    The AppleWorks 2.x modification software is located in the APPLEWORKS.FPE8 folder.

    In the FPE.IIGS folder:

    In the FPETOOLS.INIT folder you’ll find the FPE tool set named FPE.INIT.Sx. Be sure to delete any existing SANE.INIT.x file.
    The EXAMPLE folder contains an assembly language file which demonstrates the use of NC-R register-to-register operations to significantly improve
    floating-point speed.
    The BENCHMARK folder contains an ORCA/Pascal version of the SAVAGE benchmark and an APW C version of the Byte Magazine floating point co-processor benchmark. (The program in the EXAMPLE folder is an assembly language adaptation of the co-processor benchmark.)
    The APW.ORCA.FPE16, MERLIN.FPE and LISA816 folders contain macros and
    equates files for use in assembly language programming.

    The FPETOOLS.INIT requires some extra explanation because it is a very
    elegant solution to put the NC-R to use:

    This init redirects every Standard Apple Numerics Environment™ (SANE) floating-point call to the NumberCruncher Reloaded, so any application that uses SANE library calls will be accelerated. All you
    need to do is copy the INIT corresponding to the slot your NC-R is installed
    in…



    …and you’re done!

    As said, in the FPE.6502 folder you will find all tools for the Apple II.

    That folder also contains a SANE patch, which can be found in TOOLSET:FPE8.TOOLSET. This toolset uses the following calls:

    jsr $2100 ; to call the fp6502 routines
    jsr $2104 ; to call the ELEMS6502 routines

    It loads into locations beginning at $(00)2100 and has a length of less than $1000 bytes.
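
    The load address and length quoted above pin down where the toolset can sit in memory. The following is a minimal sketch, not part of the original disk: it computes the worst-case address range from those two figures and checks it against hi-res graphics page 1 ($2000-$3FFF in the standard Apple II memory map). The function names are made up for illustration.

```python
# Address range occupied by FPE8.TOOLSET, from the figures quoted in the text.
LOAD_ADDR = 0x2100    # first byte of the toolset (also the fp6502 entry point)
MAX_LENGTH = 0x1000   # upper bound; the actual length is stated to be smaller

# Apple II hi-res graphics page 1 (standard memory map).
HGR1_FIRST, HGR1_LAST = 0x2000, 0x3FFF


def toolset_range(load_addr=LOAD_ADDR, max_length=MAX_LENGTH):
    """Return the inclusive (first, last) addresses the toolset can occupy at most."""
    return load_addr, load_addr + max_length - 1


def overlaps(first, last, other_first, other_last):
    """True if two inclusive address ranges share at least one byte."""
    return first <= other_last and other_first <= last


first, last = toolset_range()
print(f"toolset occupies at most ${first:04X}-${last:04X}")
print("overlaps hi-res page 1:", overlaps(first, last, HGR1_FIRST, HGR1_LAST))
```

    In other words, a program that uses this toolset should probably not draw on hi-res page 1 at the same time.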

    I have included the APW.ORCA.FPE8 and MERLIN8.FPE macro folders.

    The APPLEWORKS.FPE8 folder contains the Appleworks 2.x modification. To
    modify your copy of Appleworks, just run FPE.SYSTEM from the root folder and answer the questions. The modified code will automatically access the FPE whenever a floating point operation is required.

    In the FPEfractal folder you will find Zoombaya (and other fractal programs, all by Glen Bredon).
    This is currently my favorite tool for benchmarking and testing.
    It’s written in Applesoft BASIC(!) and uses Glen’s cool so-called ProCMD module which sets up an interface between Applesoft programs. The downside
    is that it uses some 65816/65802-specific instructions, so running it on a 6502 CPU will lead to a crash.



    “Real” Programs

    I also prepared an archive containing all programs I was able to find that support an FPE card in some manner.

    Download: ShrinkIt archive or ZIP file

    Obviously, they’re mainly math packages… and sad but true, they’re all IIgs
    programs.

    GSnumerics (by Spring Branch Software)

    Symbolix (by Henrik Gudat of Bright Software)

    jazGraph (by Jason Perez)



    MathGraphics (by Dirk Fröhling)



    saneglue (by Söhnke Behrens)

    From the README: “lsaneglue is a library that contains code to let you call SANE funtions directly from ORCA/C”.
    This lib provides convenient functions like findfpcp() and most calls to floating-point operations.



    FAQ

    Q: Which Apple computers are compatible with the NC-R?
    A: I’ve tested the NC-R in my IIgs and IIe. Those work for sure.
    The original FPE was advertised as being compatible with the II and II+,
    too. I don’t have those machines, and while compatibility is very likely, it has yet to be proven.

    Q: I’m experiencing crashes and instant lock-ups when starting programs which are supposed to use the NC-R.
    A: Most likely your software is expecting the FPE/NC-R in another slot.
    For speed’s sake, most current programs that natively support an FPU card expect the card in a certain slot, especially the SANE INIT.
    So please check that your NC-R is installed in the correct slot, and try other programs to see if they crash, too.
    I recommend the Mandelbrot program provided on the NC-R Tools disk. This program scans all slots for an FPE/NC-R by itself.
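
    Why the slot matters can be seen from the standard Apple II memory map, where each peripheral slot gets its own 16-byte device-select window at $C080 + slot × $10. The sketch below only illustrates that address arithmetic. That the NC-R's registers are reached through this window is an assumption here (the text only says the INFO LED is tied to DEVSEL), and the function name is made up.

```python
def devsel_range(slot):
    """Return the inclusive (first, last) device-select I/O addresses for a slot."""
    if not 1 <= slot <= 7:
        raise ValueError("Apple II peripheral slots are numbered 1 to 7")
    base = 0xC080 + slot * 0x10
    return base, base + 0x0F


# A program with a hard-coded slot uses exactly one of these windows;
# a program that scans for the card has to try each of them in turn.
for slot in range(1, 8):
    first, last = devsel_range(slot)
    print(f"slot {slot}: ${first:04X}-${last:04X}")
```

    Software built for one slot therefore touches different addresses than the same software built for another, which would fit the per-slot INIT files on the Tools disk.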

    Q: What are these LEDs for?
    A: The green BUSY LED blinks at every access to the FPU.
    The yellow INFO LED doesn’t have a proper job yet. Currently it’s connected to DEVSEL, so you can see it blink very briefly, when your Apple II scans
    its bus.
    The red ERROR LED will be lit when the FPU encounters a so-called ‘protocol violation’, i.e. there’s some problem in the communication between the Apple
    and the 68881/2.
    See page 30 in the manual for more details.

    Q: Can I make the NC-R go faster? What about overclocking?
    A: Not really. See the ‘Benchmark‘ section further down.

    Q: In the pictures of the NC-R I can identify a 40MHz Motorola 68882. Do all NC-Rs have such a fast FPU?
    A: No. I use whatever 68882 is available on the market. Strangely enough, sometimes a 40MHz version is cheaper than, say, a 16MHz one.
    So whichever version is installed in your NC-R, it’ll be fast enough and always clocked at 16MHz anyhow.

    Q: How can I write programs using the NC-R?
    A: This is an extensive question which can’t be answered satisfactorily in an FAQ. Refer to chapter 3 of the manual to learn how the NC-R works internally and how to program it in assembler, C or even BASIC.
    But I’m also thinking about a dedicated post about just that matter.

    Q: I wrote a small program to test the NC-R and it’s not really faster than using SANE on my IIgs.
    A: A certain amount of calculation needs to be done before the NC-R shows its performance. A single addition probably takes longer than it would on e.g. a stock TransWarp GS because of all the communication overhead.
    This changes dramatically if you have lots of floating-point calculations in one stream, optimally using the 68881/2 internal registers.
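
    The amortization argument above can be pictured with a toy cost model. This is only a sketch: every number in it is an arbitrary placeholder chosen for illustration, not a measurement of the NC-R, SANE or any accelerator, and the function names are made up.

```python
# Toy cost model in arbitrary units. All three constants are assumed
# placeholders, NOT measured values.
SETUP = 600        # fixed cost of moving operands in and the result out (assumed)
FPU_OP = 20        # cost of one operation inside the FPU registers (assumed)
SOFTWARE_OP = 500  # cost of one operation done in software on the CPU (assumed)


def software_cost(n_ops):
    """All n operations are computed by the CPU itself."""
    return n_ops * SOFTWARE_OP


def card_cost(n_ops):
    """Operands cross the bus once, then n operations run in FPU registers."""
    return SETUP + n_ops * FPU_OP


def break_even_ops():
    """Smallest number of chained operations at which the card wins."""
    n = 1
    while card_cost(n) >= software_cost(n):
        n += 1
    return n


for n in (1, 10, 100):
    print(f"{n:3d} ops: software {software_cost(n):6d}, card {card_cost(n):6d}")
print("card wins from", break_even_ops(), "chained operations on")
```

    With different placeholder values the break-even point moves, but the shape stays the same: a fixed communication cost only pays off once enough work is done per transfer.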

    Q: Are you writing software for the NC-R?
    A: Well, maybe in the future but currently I’m busy with other projects.
    But so much can be revealed: Brutal Deluxe has its NC-R already.

    Q: How can I ask you a question / get help / praise / complain / rant about
    the NC-R?
    A: I reactivated my little Forum on this page, and it got its own
    Apple II board.
    This way everybody can participate in your question or complaint… speaking
    of complaints: Do not complain that you have to register and that it took so long for the approval! This is a one-man show, there are time zones, and
    despite other rumors, I do have a job.

    Q: Why did you choose green for the PCB’s colo(u)r?
    A: It’s a remake, so it should at least somewhat resemble the original
    look. Also, given that there are translucent lid/case options available today, I personally don’t like the idea of a motley bunch of green/blue/red/white cards sticking out of my Apple II… yeah, I’m old-school.

    Benchmark

    Still, we’re all hackers of some sort, so you might ask yourself “What’s the
    fastest option?” or “Can I make it even faster?”.
    Well, while the 68882 is up to 30% faster than the 68881 due to its
    capability to execute commands in parallel, this mostly doesn’t matter in the case of the NumberCruncher Reloaded.

    As mentioned in the feature list, thanks to protocol optimizations the NumberCruncher Reloaded is already a tiny bit (~4%) faster than the original
    at the same clock speed, but more important is that it “saturates” later. This means that while the FPE/NC shows no improvement beyond 12MHz, the NumberCruncher Reloaded still benefits from a faster clock up to 16MHz.

    Beyond that, a faster clock does not make much sense, as the Apple II bus, clocked at 1MHz,
    is the limiting factor.

    Using the ‘Zoombaya Mandelbrot’ program included on the Tools floppy disk on
    a TransWarp GS @ 10MHz gives these calculation times:

    MHz   Original NC   NC-R 68882   NC-R 68881   Faster than NC
    11    3:59          3:55         3:55         ~1.5%
    12    3:55          3:45         3:45         ~4%
    16    3:55          3:12         3:12         ~20%
    20    N/A           3:12         N/A          N/A

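
    The ‘Faster than NC’ column can be roughly cross-checked from the times in the table. The sketch below converts the mm:ss values to seconds and computes both the reduction in run time and the equivalent throughput gain. Which of the two definitions the table uses is not stated, so treat the comparison as approximate. The function names are made up for illustration.

```python
def to_seconds(mmss):
    """Convert a 'm:ss' string such as '3:55' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def time_saved_percent(old, new):
    """Percentage by which the run time shrinks going from old to new."""
    return 100.0 * (to_seconds(old) - to_seconds(new)) / to_seconds(old)


def speedup_percent(old, new):
    """Percentage gain in throughput (work per unit time) from old to new."""
    return 100.0 * (to_seconds(old) / to_seconds(new) - 1.0)


# (FPU clock in MHz, original NC time, NC-R time) taken from the table.
ROWS = [(11, "3:59", "3:55"), (12, "3:55", "3:45"), (16, "3:55", "3:12")]

for mhz, nc, ncr in ROWS:
    print(f"{mhz} MHz: {time_saved_percent(nc, ncr):.1f}% less time, "
          f"{speedup_percent(nc, ncr):.1f}% more throughput")
```

    For the 16MHz row the two definitions give about 18% and 22%, which brackets the table's rounded ~20%.
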
    The speed of your Apple computer very much plays into the equation, too. It handles all the writing/reading and of course the endian conversion.
    Here’s an NC-R clocked at the default 16MHz installed in an Apple IIgs running the same Mandelbrot at different clocking:

    ‘normal’ 1MHz   ‘fast’ 2.8MHz   TWGS @ 10MHz   TWGS @ 12MHz
    8:35            4:39            3:12           3:10

    The TransWarp GS seems to saturate pretty early. Most likely because it’s running the code completely out of its cache.
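
    The endian conversion mentioned above exists because the 6502 family stores multi-byte values low byte first, while the 68881/68882 expects the high byte first. The sketch below is a minimal illustration using an IEEE double. It assumes the value sits in Apple memory in little-endian order, and it does not reproduce the card's actual transfer protocol. The helper name is made up.

```python
import struct


def to_big_endian(le_bytes):
    """Reverse a little-endian value byte for byte to get its big-endian form."""
    return le_bytes[::-1]


value = 3.14159
little = struct.pack("<d", value)  # low byte first, as on the 6502/65816 side
big = to_big_endian(little)        # high byte first, as the 68881/68882 expects

# The reversed bytes match what Python produces for a big-endian double.
assert big == struct.pack(">d", value)
print(little.hex(), "->", big.hex())
```

    On the real machine this byte shuffling is done by the Apple's CPU for every operand and result, which is one reason the host speed shows up in the timings above.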

    Purchase

    Of course you want one now. And as long as you can read this text, I have some available from the first batch of 35.
    [currently available/non-reserved: 12 cards].


    Ahhh, look at those beauties!
    The NumberCruncher Reloaded comes in a nice, eco-friendly box with a high-quality, glossy spiral-bound manual and an 800k ProDOS floppy.

    You can choose from 3 different configurations:

    Config 1 “Plug’n’Play”:
    NC-R with just the PLCC socket, a 68882 FPU is already installed. Plug the
    card into your Apple and you’re ready to go: €88.82

    Config 2A “I have a PLCC FPU”:
    NC-R with just the PLCC socket – you provide and install a PLCC 68881 or 68882 yourself: €80.82

    Config 2B “I have a PGA FPU”:
    NC-R with PLCC plus a Pin-Grid-Array (PGA) socket – you provide and install
    a PLCC or PGA 68881 or 68882 yourself: €84.82
    (There’s no PGA-only option because the PLCC socket is used for the final function-test)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Heumann@21:1/5 to D Finnigan on Sat Aug 7 22:30:48 2021
    On 2021-08-07 15:13:38 +0000, D Finnigan said:

    > Check it out:
    >
    > https://www.geekdot.com/numbercruncher-reloaded/
    >
    > The NumberCruncher Reloaded is a peripheral card for the Apple II series
    > that features a math co-processor, often also called a Floating Point Unit
    > (FPU) which is specialized on, well, floating point calculations. Doing so,
    > it is much, much faster than any 6502 or 65816 CPU ever will be.

    I got one of these, and I think it's pretty neat. The included fractal
    demos by Glen Bredon are a cool example of what it can do. One thing
    to be aware of is that there are some corrupted files on the included
    disk (at least the one I got). The SHK file on the website contains
    good versions of some of them.

    I tried porting a couple short benchmarks to access the card directly.
    These aren't particularly interesting programs, but they do show some
    basic examples of how to program the NC-R card, as well as
    demonstrating that it's a _lot_ faster than SANE. I've made the code
    for them available here:

    https://github.com/sheumann/FPE-NC-Benchmarks

    --
    Stephen Heumann

  • From Michael J. Mahon@21:1/5 to Stephen Heumann on Sat Aug 28 20:38:46 2021
    Stephen Heumann <stephen.heumann@gmail.com> wrote:
    > On 2021-08-07 15:13:38 +0000, D Finnigan said:
    >
    >> Check it out:
    >>
    >> https://www.geekdot.com/numbercruncher-reloaded/
    >>
    >> The NumberCruncher Reloaded is a peripheral card for the Apple II series
    >> that features a math co-processor, often also called a Floating Point Unit
    >> (FPU) which is specialized on, well, floating point calculations. Doing so,
    >> it is much, much faster than any 6502 or 65816 CPU ever will be.
    >
    > I got one of these, and I think it's pretty neat. The included fractal
    > demos by Glen Bredon are a cool example of what it can do. One thing
    > to be aware of is that there are some corrupted files on the included
    > disk (at least the one I got). The SHK file on the website contains
    > good versions of some of them.
    >
    > I tried porting a couple short benchmarks to access the card directly.
    > These aren't particularly interesting programs, but they do show some
    > basic examples of how to program the NC-R card, as well as
    > demonstrating that it's a _lot_ faster than SANE. I've made the code
    > for them available here:
    >
    > https://github.com/sheumann/FPE-NC-Benchmarks


    About a decade ago, I obtained an Innovative Systems Floating-Point Engine
    card to use with my //e. As it turned out, I found it unsuitable for my general use because its use of the /RDY bus line was incompatible with my
    Zip Chip.

    But while I had it, I wrote a Merlin macro package to simplify programming
    its 68881 and an FPBASIC patcher program to allow Applesoft to use the
    card.

    In view of the renewed interest in 68881-based floating-point accelerators, I’ve put my work up on my website for others to examine and extend.

    --
    -michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
