• Re: Tracking bug report frequency

    From Martin Brown@21:1/5 to Joe Gwinn on Mon Sep 11 09:05:13 2023
    On 08/09/2023 22:14, Joe Gwinn wrote:
    On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:


    NR = Newton Raphson?

    Yes. Although these days my production code actually uses Halley's
    method in preference since it is almost the same speed to execute on
    modern pipeline CPUs and gives cubic rather than quadratic convergence.
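    For readers meeting it for the first time, Halley's update for a root
    of f is x <- x - 2*f*f' / (2*f'^2 - f*f''). A minimal Python sketch of
    the idea (illustrative only - the production code under discussion is
    certainly more careful than this):

```python
def halley(f, df, d2f, x, tol=1e-15, max_iter=20):
    """Halley's method: cubic convergence, at the price of needing
    f, f' and f'' at every iterate."""
    for _ in range(max_iter):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        step = 2.0 * fx * dfx / (2.0 * dfx * dfx - fx * d2fx)
        x -= step
        if abs(step) < tol * max(1.0, abs(x)):
            break
    return x

# cube root of 2 via f(x) = x**3 - 2
r = halley(lambda t: t**3 - 2, lambda t: 3*t*t, lambda t: 6*t, 1.5)
```

    Each iterate roughly triples the number of correct digits, against
    doubling for NR, which is why the extra f'' can pay for itself.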

    Interesting. I had not heard of Halley's Method, and cubic
    convergence is worthwhile.

    Indeed. Halley was pretty clever but forever overshadowed by Newton.

    Some of the latest codes use Delta_4 and Delta_5 corrections provided
    that the function is sufficiently differentiable. They have to be
    computed very carefully with considerable attention to detail!

    One curiosity I have seen with them is that for me D_5 is anomalously
    fast despite containing D_4! On certain problems D_5 is *faster* than NR
    (I suspect it is some sort of pipeline stall slowing the others down).

    In the past they all got slower because there was a division for each
    step down the chain. I have a new computational scheme which avoids that
    (at an increased risk of under/over flow in some edge cases)

    The codes all run as expected on simple starting guesses but when the
    fancy cubic ones are used they show this peculiar speed variation. It is
    most noticeable with the Intel compiler (less so on MSC or GCC).

    The reason these high order methods are fashionable right now is that
    the iterative methods require extra function evaluations and for that
    you have to crystallise the next value of x1 and recompute f(x1),f'(x1)
    etc. Historically they would iterate 3-5 times to solve it. Having to
    wait for that new input value is an unavoidable hard pipeline stall.

    But if you can get the initial guess with a relative error below
    10^(-16/N) then in an ideal world the Nth order difference corrector
    gets the answer in a single step and so the next value can be computed.

    Some practitioners prefer what they call modified NR which is a sort of
    hybrid of NR with Halley with slightly better convergence than Halley.
    However it involves a sqrt and isn't always better.

    Equation (4) in this paper (which includes the derivation) - method
    originally due to Laguerre but now seeing a revival in some codes.

    <https://academic.oup.com/mnras/article/467/2/1702/2929272>

    The gotcha is that although as written with abs() it never actually
    fails - a sufficiently bad guess will still give poor answers, but is
    much less likely to explode or oscillate in the way that NR does.
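    The general shape of such a Laguerre-style step looks roughly like the
    following (a hedged sketch of the family, not the paper's exact
    equation (4); the degree parameter n is a free illustrative choice, and
    the abs() placement follows the description above):

```python
import math

def laguerre_step(f, df, d2f, x, n=5.0):
    # Laguerre-style update: abs() under the sqrt means the step is
    # always defined; the sign of the sqrt term is chosen to match f'
    # so the denominator is as large as possible.
    fx, dfx, d2fx = f(x), df(x), d2f(x)
    disc = abs((n - 1.0)**2 * dfx*dfx - n*(n - 1.0) * fx * d2fx)
    denom = dfx + math.copysign(math.sqrt(disc), dfx)
    return x - n * fx / denom

# polish the cube root of 2 (n = 5 is purely illustrative here)
x = 1.5
for _ in range(8):
    x = laguerre_step(lambda t: t**3 - 2, lambda t: 3*t*t, lambda t: 6*t, x)
```

    Near the root the step collapses to an ordinary Newton step, which is
    why a bad guess degrades gracefully instead of exploding.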

    So NR is better because it always warns?

    No. NR can go completely AWOL far too easily and diverge to infinity or oscillate chaotically between a set of values. Halley's method is the
    best of the three for stability, but MNR will get you slightly better
    precision if you can be sure your guesses are always close enough.

    I'm not convinced that it is worth the cost of a sqrt but with CPUs
    improving all the time it is now very close. The latest i5-12600 I have
    can do a sqrt in about the same time as it takes to do a divide!

    Wonder what algorithm it uses?

    My guess is some sort of cunning hardware shift, multiply and add. The
    latest generation of chips stand out in this respect. If only they would implement a hardware cube root then I would be very happy.

    And then there is co-plotting and staring - the Mark I eyeball is a
    very good pattern and anomaly detector, especially of concentricity or
    linearity.

    Looking at the plot of residual errors can be very enlightening. So can
    looking at the histogram for many test cases of the error distribution
    of solve, test and then measure error in the resulting answer.

    It is remarkably easy to home in on any weak spots and edge cases. My
    normal reporting rule is to print out exceptions whenever the worst-case
    high or low error is exceeded. Verbose mode shows when it is equalled.

    Yes. But I always make plots and stare at them.

    I only do that if something catches my attention as potentially odd.

    My initial reaction was that it was tested library code so it must be my
    problem - until I traced into it and saw how it failed. It gives 8
    instead of 16 sig fig in double precision for these pathological data.

    That's a good one.

    I was a bit surprised. It required an astonishingly exact combination of
    factors for it to trigger and one of my real problems hit it square on.

    So biased probing paid off here? Or only for diagnosis?

    It confirmed that given the right awkward parameters the solver would
    fail, but also that it was a very low-risk event: it took an overnight
    run just to find a single example using random data.

    I just happened to be unlucky in my exact choice of problem. Expansion
    around pi/4 was absolutely fine but the expansion around 3pi/4 went
    completely haywire even though it differed only in a handful of sign
    changes to coefficients. I was utterly convinced it was my user error.

    In the 1980s I hit a similar problem, but it wasn't hard to find. In
    scaled integer arithmetic (no floating point then), it's very common
    to multiply two 32-bit integers together, yielding a 64-bit product,
    and then divide that by a 64-bit scaling factor, yielding a 32-bit
    quotient. In the Fortran of the day, if one just wrote this as
    (a*b)/c, the compiler would truncate the product (a*b) to 32 bits
    before dividing by c, yielding a wildly wrong answer, without any
    warnings or drama. The solution was to define a 64-bit intermediate
    variable, and do the operation in two lines: p=a*b, result = p/c.
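    The pitfall is easy to reproduce by simulating the 32-bit truncation
    (Python integers are unbounded, so the mask below makes the old
    compiler's behaviour explicit; the function names are mine, not the
    library MULDIV mentioned in the reply):

```python
MASK32 = 0xFFFFFFFF

def muldiv_truncated(a, b, c):
    # the Fortran-of-the-day behaviour: the product (a*b) is silently
    # truncated to 32 bits before the divide
    return ((a * b) & MASK32) // c

def muldiv_wide(a, b, c):
    # the two-line fix: keep the full product in a wide intermediate,
    # then divide
    p = a * b
    return p // c
```

    With a = b = 100000 and c = 1000 the full product is 10^10, which does
    not fit in 32 bits, so the truncated version is wildly wrong with no
    warnings or drama.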

    ISTR there being a library function MULDIV(a,b,c) for that at least on
    the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I
    think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).

    --
    Martin Brown

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Phil Hobbs@21:1/5 to Martin Brown on Mon Sep 11 08:36:31 2023
    On 2023-09-11 04:05, Martin Brown wrote:
    On 08/09/2023 22:14, Joe Gwinn wrote:
    On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:


    NR = Newton Raphson?

    Yes. Although these days my production code actually uses
    Halley's method in preference since it is almost the same speed
    to execute on modern pipeline CPUs and gives cubic rather than
    quadratic convergence.

    Interesting. I had not heard of Halley's Method, and cubic
    convergence is worthwhile.

    Indeed. Halley was pretty clever but forever overshadowed by Newton.

    Some of the latest codes use Delta_4 and Delta_5 corrections
    provided that the function is sufficiently differentiable. They have
    to be computed very carefully with considerable attention to detail!


    One curiosity I have seen with them is that for me D_5 is
    anomalously fast despite containing D_4 ! On certain problems D5 is
    *faster* than NR (I suspect it is some sort of pipeline stall slowing
    the others down).

    In the past they all got slower because there was a division for
    each step down the chain. I have a new computational scheme which
    avoids that (at an increased risk of under/over flow in some edge
    cases)

    The codes all run as expected on simple starting guesses but when
    the fancy cubic ones are used they show this peculiar speed
    variation. It is most noticeable with the Intel compiler (less so on
    MSC or GCC).

    The reason these high order methods are fashionable right now is
    that the iterative methods require extra function evaluations and for
    that you have to crystallise the next value of x1 and recompute
    f(x1),f'(x1) etc. Historically they would iterate 3-5 times to solve
    it. Having to wait for that new input value is an unavoidable hard
    pipeline stall.

    But if you can get the initial guess with a relative error below
    10^(-16/N) then in an ideal world the Nth order difference corrector
    gets the answer in a single step and so the next value can be
    computed.

    Some practitioners prefer what they call modified NR which is a
    sort of hybrid of NR with Halley with slightly better
    convergence than Halley. However it involves a sqrt and isn't
    always better.

    Equation (4) in this paper (which includes the derivation) -
    method originally due to Laguerre but now seeing a revival in
    some codes.

    <https://academic.oup.com/mnras/article/467/2/1702/2929272>

    The gotcha is that although as written with abs() it never
    actually fails - a sufficiently bad guess will still give poor
    answers, but is much less likely to explode or oscillate in the
    way that NR does.

    So NR is better because it always warns?

    No. NR can go completely AWOL far too easily and diverge to infinity
    or oscillate chaotically between a set of values. Halley's method is
    the best of the three for stability, but MNR will get you slightly
    better precision if you can be sure your guesses are always close
    enough.

    I'm not convinced that it is worth the cost of a sqrt but with
    CPUs improving all the time it is now very close. The latest
    i5-12600 I have can do a sqrt in about the same time as it takes
    to do a divide!

    Wonder what algorithm it uses?

    My guess is some sort of cunning hardware shift, multiply and add.
    The latest generation of chips stand out in this respect. If only
    they would implement a hardware cube root then I would be very
    happy.

    And then there is co-plotting and staring - the Mark I eyeball
    is a very good pattern and anomaly detector, especially of
    concentricity or linearity.

    Looking at the plot of residual errors can be very enlightening.
    So can looking at the histogram for many test cases of the error
    distribution of solve, test and then measure error in the
    resulting answer.

    It is remarkably easy to home in on any weak spots and edge
    cases. My normal reporting rule is printout exceptions whenever
    the worst case high or low error is exceeded. Verbose mode shows
    when it is equalled.

    Yes. But I always make plots and stare at them.

    I only do that if something catches my attention as potentially odd.



    It's been quite a while since I looked into the problem (and I was never
    close to your level of expertise there, Martin). However, this being
    Usenet, I am undeterred. ;)

    Cubic methods are great as long as computing the second derivative is
    cheap. BITD the "Illinois algorithm" and its relatives got a fair
    amount of attention (in select circles). They're regula-falsi based,
    but with hyperlinear convergence. The best that I know is a Bjorck &
    Anderson variant whose efficiency index (basically the average order of convergence per function evaluation) gets up to 8**(1/4) ~ 1.68.
    This compares with 1.0 for bisection or vanilla Regula Falsi, and
    sqrt(2) for the secant method.
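    For reference, a sketch of the basic Illinois modification (plain false
    position, but the retained endpoint's stored function value is halved
    when the new point keeps landing on the same side; the Bjorck &
    Anderson variant mentioned above differs in its details):

```python
import math

def illinois(f, a, b, tol=1e-13, max_iter=100):
    # Illinois variant of regula falsi: halving the stale endpoint's
    # weight breaks the slow one-sided convergence of plain false
    # position, giving superlinear convergence on bracketed roots.
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a bracketing interval"
    x = a
    for _ in range(max_iter):
        x = (a * fb - b * fa) / (fb - fa)   # false-position point
        fx = f(x)
        if abs(fx) < tol:
            return x
        if fx * fb < 0:          # sign change between x and b
            a, fa = b, fb
        else:                    # same side as b: keep a, halve its weight
            fa *= 0.5
        b, fb = x, fx
    return x

root = illinois(math.cos, 1.0, 2.0)   # bracket contains pi/2
```

    Like bisection it can never leave the bracket, which is the whole
    attraction over raw secant.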

    Secant is too squirrelly to use by itself, because of its tendency to
    hit a flat spot and jump straight to East Limbo, never to return.
    Newton has that problem too, so you have to find some finite interval
    where f_1 * f_2 < 0, and reject steps that fall outside it.
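    That bracket-guarded recipe can be sketched as follows (a hypothetical
    helper for illustration, not any particular library's routine):

```python
import math

def safeguarded_newton(f, df, a, b, tol=1e-14, max_iter=100):
    # Newton confined to a bracket [a, b] with f(a)*f(b) < 0: any step
    # that would leave the bracket is replaced by a bisection step, so
    # the iteration cannot jump off to East Limbo.
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx = f(x)
        if fx == 0.0 or (b - a) < tol:
            return x
        if fa * fx < 0:          # root is in [a, x]
            b, fb = x, fx
        else:                    # root is in [x, b]
            a, fa = x, fx
        dfx = df(x)
        x = x - fx / dfx if dfx != 0 else a   # flat spot: force rejection
        if not (a < x < b):      # Newton step rejected
            x = 0.5 * (a + b)
    return x

root = safeguarded_newton(math.cos, lambda t: -math.sin(t), 1.0, 2.0)
```

    Worst case it degrades to bisection; on well-behaved stretches it runs
    at full Newton speed.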

    Besides that, there are two cases with Mr. Newton: if computing the
    derivative is comparatively cheap, as with trig functions, he gets an efficiency index of 2, but if not, he's stuck down at sqrt(2) with
    secant. (Sometimes the derivative is harder to compute than the value,
    e.g. with special functions or ratios of trig polynomials, in which case
    Newton is even worse off.)

    On the other hand, if the derivative can be computed without nasty
    significance loss due to cancellation, N. is likely to work better at
    high precision.

    I've nearly always used derivative-free methods, because I'm usually not
    too concerned with ultimate accuracy, and the best ones can reach an
    efficiency index of very nearly 2.0, which is pretty cool. (Besides,
    I'm too lazy to want to manage all of those derivatives.)

    My initial reaction was that it was tested library code so
    it must be my problem - until I traced into it and saw how
    it failed. It gives 8 instead of 16 sig fig in double
    precision for these pathological data.

    That's a good one.

    I was a bit surprised. It required an astonishingly exact
    combination of factors for it to trigger and one of my real
    problems hit it square on.

    So biased probing paid off here? Or only for diagnosis?

    It confirmed that given the right awkward parameters the solver would
    fail, but also that it was a very low-risk event: it took an
    overnight run just to find a single example using random data.

    I just happened to be unlucky in my exact choice of problem.
    Expansion around pi/4 was absolutely fine but the expansion around
    3pi/4 went completely haywire even though it differed only in a
    handful of sign changes to coefficients. I was utterly convinced it
    was my user error.

    In the 1980s I hit a similar problem, but it wasn't hard to find.
    In scaled integer arithmetic (no floating point then), it's very
    common to multiply two 32-bit integers together, yielding a 64-bit
    product, and then divide that by a 64-bit scaling factor, yielding
    a 32-bit quotient. In the Fortran of the day, if one just wrote
    this as (a*b)/c, the compiler would truncate the product (a*b) to
    32 bits before dividing by c, yielding a wildly wrong answer,
    without any warnings or drama. The solution was define a 64-bit
    intermediate variable, and do the operation in two lines: p=a*b,
    result = p/c.

    ISTR there being a library function MULDIV(a,b,c) for that at least
    on the platforms where I worked. We had REAL*4, REAL*8 and REAL*16
    but I think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late
    70's).

    Cheers

    Phil Hobbs


    --
    Dr Philip C D Hobbs
    Principal Consultant
    ElectroOptical Innovations LLC / Hobbs ElectroOptics
    Optics, Electro-optics, Photonics, Analog Electronics
    Briarcliff Manor NY 10510

    http://electrooptical.net
    http://hobbs-eo.com

  • From Joe Gwinn@21:1/5 to '''newspam'''@nonad.co.uk on Mon Sep 11 16:58:20 2023
    On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:

    On 08/09/2023 22:14, Joe Gwinn wrote:
    On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:


    NR = Newton Raphson?

    Yes. Although these days my production code actually uses Halley's
    method in preference since it is almost the same speed to execute on
    modern pipeline CPUs and gives cubic rather than quadratic convergence.

    Interesting. I had not heard of Halley's Method, and cubic
    convergence is worthwhile.

    Indeed. Halley was pretty clever but forever overshadowed by Newton.

    Some of the latest codes use Delta_4 and Delta_5 corrections provided
    that the function is sufficiently differentiable. They have to be
    computed very carefully with considerable attention to detail!

    And differentiation amplifies noise.


    One curiosity I have seen with them is that for me D_5 is anomalously
    fast despite containing D_4 ! On certain problems D5 is *faster* than NR
    (I suspect it is some sort of pipeline stall slowing the others down).

    In the old days, one would see a bunch of NOPs in the generated
    assembly, but that won't work for this. I would code a test case in
    assembly code and time it. Maybe by toggling a wire before and after,
    and looking with a scope.


    In the past they all got slower because there was a division for each
    step down the chain. I have a new computational scheme which avoids that
    (at an increased risk of under/over flow in some edge cases)

    The old dodge was to multiply by the pre computed reciprocal, if this
    could be computed less often than every division.


    The codes all run as expected on simple starting guesses but when the
    fancy cubic ones are used they show this peculiar speed variation. It is
    most noticeable with the Intel compiler (less so on MSC or GCC).

    The reason these high order methods are fashionable right now is that
    the iterative methods require extra function evaluations and for that
    you have to crystallise the next value of x1 and recompute f(x1),f'(x1)
    etc. Historically they would iterate 3-5 times to solve it. Having to
    wait for that new input value is an unavoidable hard pipeline stall.

    Use staggered parallel pipelines?


    But if you can get the initial guess with a relative error below
    10^(-16/N) then in an ideal world the Nth order difference corrector
    gets the answer in a single step and so the next value can be computed.

    Yes.

    I used to do this level of optimizing, but in operating-system
    kernels. High-level interrupt routines warranted personal instruction-by-instruction hand optimization, with a drawing for each
    and every step. It might have been 100 lines of assembly code.
    Multiplied by the interrupt rate and overhead of this or that bit of
    hardware. Eliminating even one clock cycle was worthwhile.


    Some practitioners prefer what they call modified NR which is a sort of
    hybrid of NR with Halley with slightly better convergence than Halley.
    However it involves a sqrt and isn't always better.

    Equation (4) in this paper (which includes the derivation) - method
    originally due to Laguerre but now seeing a revival in some codes.

    <https://academic.oup.com/mnras/article/467/2/1702/2929272>

    The gotcha is that although as written with abs() it never actually
    fails - a sufficiently bad guess will still give poor answers, but is
    much less likely to explode or oscillate in the way that NR does.

    So NR is better because it always warns?

    No. NR can go completely AWOL far too easily and diverge to infinity or
    oscillate chaotically between a set of values. Halley's method is the
    best of the three for stability, but MNR will get you slightly better
    precision if you can be sure your guesses are always close enough.

    I'm not convinced that it is worth the cost of a sqrt but with CPUs
    improving all the time it is now very close. The latest i5-12600 I have
    can do a sqrt in about the same time as it takes to do a divide!

    Wonder what algorithm it uses?

    My guess is some sort of cunning hardware shift, multiply and add. The
    latest generation of chips stand out in this respect. If only they would
    implement a hardware cube root then I would be very happy.

    That makes sense, as many have a barrel shifter in hardware.



    And then there is co-plotting and staring - the Mark I eyeball is a
    very good pattern and anomaly detector, especially of concentricity or
    linearity.

    Looking at the plot of residual errors can be very enlightening. So can
    looking at the histogram for many test cases of the error distribution
    of solve, test and then measure error in the resulting answer.

    It is remarkably easy to home in on any weak spots and edge cases. My
    normal reporting rule is printout exceptions whenever the worst case
    high or low error is exceeded. Verbose mode shows when it is equalled.

    Yes. But I always make plots and stare at them.

    I only do that if something catches my attention as potentially odd.

    I always do it, precisely to improve my chances of noticing something
    odd.


    My initial reaction was that it was tested library code so it must be my
    problem - until I traced into it and saw how it failed. It gives 8
    instead of 16 sig fig in double precision for these pathological data.
    That's a good one.

    I was a bit surprised. It required an astonishingly exact combination of
    factors for it to trigger and one of my real problems hit it square on.

    So biased probing paid off here? Or only for diagnosis?

    It confirmed that given the right awkward parameters the solver would
    fail, but also that it was a very low-risk event: it took an overnight
    run just to find a single example using random data.

    I just happened to be unlucky in my exact choice of problem. Expansion
    around pi/4 was absolutely fine but the expansion around 3pi/4 went
    completely haywire even though it differed only in a handful of sign
    changes to coefficients. I was utterly convinced it was my user error.

    In the 1980s I hit a similar problem, but it wasn't hard to find. In
    scaled integer arithmetic (no floating point then), it's very common
    to multiply two 32-bit integers together, yielding a 64-bit product,
    and then divide that by a 64-bit scaling factor, yielding a 32-bit
    quotient. In the Fortran of the day, if one just wrote this as
    (a*b)/c, the compiler would truncate the product (a*b) to 32 bits
    before dividing by c, yielding a wildly wrong answer, without any
    warnings or drama. The solution was define a 64-bit intermediate
    variable, and do the operation in two lines: p=a*b, result = p/c.

    ISTR there being a library function MULDIV(a,b,c) for that at least on
    the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I
    think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).

    Yes, it either didn't exist yet or didn't cover the specific case we
    had.

    A parallel story from the late 1970s: while building a C-141
    maintenance trainer (a mockup cockpit animated by a midi-computer of
    the day) in Fortran, we were forced to implement parts in assembler -
    specifically a collection of assembly-coded Fortran functions
    providing 32-bit bitwise logic. When I told the application
    programmers that they would be writing assembly, there was much
    wailing and rending of garments. But I provided a template and an
    annotated example, and they soon got the hang of it.

    Thirty or forty years later, I got a call out of the blue from a
    company I'd never heard of. I don't know how they got my name. They
    had just won a contract to port the now-legacy trainer code to C, and
    didn't know what to do with all that assembler. I told him the
    reason, and that this had been just before the ISA Extensions were
    released, but that all that assembler did was bitwise logic, which
    could be done directly in C. The caller was very happy to hear this.

    Joe Gwinn

  • From Martin Brown@21:1/5 to Phil Hobbs on Wed Sep 13 13:54:45 2023
    On 11/09/2023 13:36, Phil Hobbs wrote:
    On 2023-09-11 04:05, Martin Brown wrote:

    Indeed. Halley was pretty clever but forever overshadowed by Newton.

    Some of the latest codes use Delta_4 and Delta_5 corrections
    provided that the function is sufficiently differentiable. They have
    to be computed very carefully with considerable attention to detail!


    It's been quite a while since I looked into the problem (and I was never
    close to your level of expertise there, Martin).  However, this being
    Usenet, I am undeterred. ;)

    Cubic methods are great as long as computing the second derivative is cheap.  BITD the "Illinois algorithm" and its relatives got a fair
    amount of attention (in select circles).  They're regula-falsi based,
    but with hyperlinear convergence.  The best that I know is a  Bjorck & Anderson variant whose efficiency index (basically the average order of convergence per function evaluation) gets up to  8**(1/4) ~ 1.68.
    This compares with 1.0 for bisection or vanilla Regula Falsi, and
    sqrt(2) for the secant method.

    That's a blast from the past. It used to be popular in some of these
    problems until a way was found to tame potential divergence (basically
    by making sure that the starting guess is always within the range where
    the sequence is absolutely convergent). Some analytical sleight of hand
    is needed to get it spot on or there will be some insanely small but
    still non-zero value where it goes crazy as eccentricity e -> 1.

    There has been a recent paper using bisection but it has various flaws -
    not least that it determines an accurate answer to the wrong question.

    Secant is too squirrelly to use by itself, because of its tendency to
    hit a flat spot and jump straight to East Limbo, never to return.
    Newton has that problem too, so you have to find some finite interval
    where f_1 * f_2 < 0, and reject steps that fall outside it.

    One of the cute features of Kepler's problem E = M + e*sin(E) is that
    one of the simplest naive upper bounds of E_0 = M+e is absolutely
    convergent so if you clip on that you remain numerically stable.
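    That clipping idea can be sketched with plain Newton polishing (a
    rough illustration only - Martin's production scheme is certainly more
    refined than this):

```python
import math

def kepler_E(M, e, tol=1e-14, max_iter=50):
    # Solve Kepler's equation M = E - e*sin(E) for the eccentric
    # anomaly E. Start from the naive upper bound E0 = M + e, then
    # Newton on f(E) = E - e*sin(E) - M, clipping each iterate to
    # [M - e, M + e] so it can never run away.
    E = M + e
    for _ in range(max_iter):
        f = E - e * math.sin(E) - M
        step = f / (1.0 - e * math.cos(E))   # f'(E) = 1 - e*cos(E) > 0
        E = min(max(E - step, M - e), M + e)
        if abs(step) < tol:
            break
    return E

E = kepler_E(1.0, 0.5)
```

    Because the true E always lies in [M - e, M + e], the clip costs
    nothing near the solution and only fires on a runaway step.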

    Besides that, there are two cases with Mr. Newton: if computing the derivative is comparatively cheap, as with trig functions, he gets an efficiency index of 2, but if not, he's stuck down at sqrt(2) with
    secant.  (Sometimes the derivative is harder to compute than the value,
    e.g. with special functions or ratios of trig polynomials, in which case Newton is even worse off.)

    The trend at the moment is to avoid system trig functions and use 5th
    order polynomial expansions around a table of equispaced fixed sin, cos
    values. Right now the trick for speed is to avoid divides where possible.
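    The shape of that table-plus-polynomial idea, as a rough sketch (the
    table size and polynomial order here are illustrative guesses, not the
    schemes being alluded to):

```python
import math

N = 256                                # illustrative table size
STEP = 2 * math.pi / N
TABLE = [(math.sin(k * STEP), math.cos(k * STEP)) for k in range(N)]

def table_sin(x):
    # Nearest table entry plus short truncated series in the small
    # remainder r, combined with the angle-addition formula
    # sin(kh + r) = sin(kh)cos(r) + cos(kh)sin(r). No divides needed.
    k = round(x / STEP)
    r = x - k * STEP                   # |r| <= STEP/2, so series converge fast
    s, c = TABLE[k % N]
    cos_r = 1 - r*r/2 + r**4/24        # truncated cos series
    sin_r = r - r**3/6 + r**5/120      # truncated sin series
    return s * cos_r + c * sin_r
```

    With 256 entries the remainder is at most ~0.012, so the truncated
    series already reach close to double precision.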

    On the other hand, if the derivative can be computed without nasty significance loss due to cancellation, N. is likely to work better at
    high precision.

    We always have analytical derivatives available (although they have some unpleasant properties in the singular corner).

    I've nearly always used derivative-free methods, because I'm usually not
    too concerned with ultimate accuracy, and the best ones can reach an
    efficiency index of very nearly 2.0, which is pretty cool. (Besides,
    I'm too lazy to want to manage all of those derivatives.)

    If I only have the function available numerically as a black box and no analytical derivative I tend to favour Golden ratio search for a maximum
    and then take a punt on computing the derivative numerically and seeing
    where NR actually lands. It either polishes the answer nicely or goes
    AWOL. A few extra function evaluations is usually neither here nor there.

    I did some work a long time ago on computing accurate numerical
    derivatives from equally spaced function evaluations. It still comes in
    handy sometimes (a bit of a lost art from the old days of manual
    computors sat at desks).
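    One classic example of such a formula is the five-point central
    difference (a textbook stencil offered for illustration, not the
    specific work described above):

```python
import math

def deriv5(f, x, h=1e-5):
    # 5-point central difference from equally spaced evaluations:
    # truncation error O(h^4), at the cost of four function calls.
    return (f(x - 2*h) - 8*f(x - h) + 8*f(x + h) - f(x + 2*h)) / (12*h)

d = deriv5(math.sin, 0.5)   # compare with cos(0.5)
```

    The step h trades truncation error against rounding error, which is
    exactly the sort of detail the old hand-computing tables got right.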

    What surprised me most was that a flaw like that could be lurking in
    almost every cubic solver (I bet NAGlib got it right) just waiting for
    the right set of unplayable data to come along.

    --
    Martin Brown

  • From Phil Hobbs@21:1/5 to Martin Brown on Wed Sep 13 15:24:53 2023
    Martin Brown <'''newspam'''@nonad.co.uk> wrote:
    On 11/09/2023 13:36, Phil Hobbs wrote:
    On 2023-09-11 04:05, Martin Brown wrote:

    Indeed. Halley was pretty clever but forever overshadowed by Newton.

    Some of the latest codes use Delta_4 and Delta_5 corrections
    provided that the function is sufficiently differentiable. They have
    to be computed very carefully with considerable attention to detail!


    It's been quite a while since I looked into the problem (and I was never
    close to your level of expertise there, Martin).  However, this being
    Usenet, I am undeterred. ;)

    Cubic methods are great as long as computing the second derivative is
    cheap.  BITD the "Illinois algorithm" and its relatives got a fair
    amount of attention (in select circles).  They're regula-falsi based,
    but with hyperlinear convergence.  The best that I know is a  Bjorck &
    Anderson variant whose efficiency index (basically the average order of
    convergence per function evaluation) gets up to  8**(1/4) ~ 1.68.
    This compares with 1.0 for bisection or vanilla Regula Falsi, and
    sqrt(2) for the secant method.

    That's a blast from the past. It used to be popular in some of these
    problems until a way was found to tame potential divergence (basically
    by making sure that the starting guess is always within the range where
    the sequence is absolutely convergent). Some analytical sleight of hand
    is needed to get it spot on or there will be some insanely small but
    still non-zero value where it goes crazy as eccentricity e -> 1.

    There has been a recent paper using bisection but it has various flaws -
    not least that it determines an accurate answer to the wrong question.

    Could be a problem.


    Secant is too squirrelly to use by itself, because of its tendency to
    hit a flat spot and jump straight to East Limbo, never to return.
    Newton has that problem too, so you have to find some finite interval
    where f_1 * f_2 < 0, and reject steps that fall outside it.

    One of the cute features of Kepler's problem E = M + e*sin(E) is that
    one of the simplest naive upper bounds of E_0 = M+e is absolutely
    convergent so if you clip on that you remain numerically stable.

    Fun. Last time I used Kepler’s equation was on a problem set in about
    1980. I recall being too lazy to use Newton, so since it was a planetary
    orbit with epsilon <<1, I just rearranged terms and iterated on epsilon.
    (Prof Ovenden was obviously bored with grading Newtonian solutions, because
    he gave me an extra point.) ;)
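    That lazy rearrangement is just fixed-point iteration on
    E = M + e*sin(E), which contracts at rate ~e for small eccentricity
    (a sketch of the homework trick, nothing more):

```python
import math

def kepler_fixed_point(M, e, n_iter=40):
    # For e << 1 the map E -> M + e*sin(E) is a contraction with
    # Lipschitz constant ~e, so plain iteration converges linearly
    # at rate e - perfectly adequate for planetary eccentricities.
    E = M
    for _ in range(n_iter):
        E = M + e * math.sin(E)
    return E

E = kepler_fixed_point(1.0, 0.1)
```

    At e = 0.1 each pass adds about one decimal digit, so a few dozen
    iterations run it down to machine precision.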

    The best thing about it is the names—M is “mean anomaly” and E is “eccentric anomaly”.


    Besides that, there are two cases with Mr. Newton: if computing the
    derivative is comparatively cheap, as with trig functions, he gets an
    efficiency index of 2, but if not, he's stuck down at sqrt(2) with
    secant.  (Sometimes the derivative is harder to compute than the value,
    e.g. with special functions or ratios of trig polynomials, in which case
    Newton is even worse off.)

    The trend at the moment is to avoid system trig functions and use 5th
    order polynomial expansions around a table of equispaced fixed sin, cos
    values. Right now the trick for speed is to avoid divides where possible.

    On the other hand, if the derivative can be computed without nasty
    significance loss due to cancellation, N. is likely to work better at
    high precision.

    We always have analytical derivatives available (although they have some unpleasant properties in the singular corner).

    I've nearly always used derivative-free methods, because I'm usually not
    too concerned with ultimate accuracy, and the best ones can reach an
    efficiency index of very nearly 2.0, which is pretty cool.  (Besides,
    I'm too lazy to want to manage all of those derivatives.)

    If I only have the function available numerically as a black box and no analytical derivative I tend to favour Golden ratio search for a maximum
    and then take a punt on computing the derivative numerically and seeing
    where NR actually lands. It either polishes the answer nicely or goes
    AWOL. A few extra function evaluations is usually neither here nor there.

    I did some work a long time ago on computing accurate numerical
    derivatives from equally spaced function evaluations. It still comes in
    handy sometimes (a bit of a lost art from the old days of human
    computers sat at desks).
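
    The classic equispaced stencils are easy to state; here is one such rule
    (standard textbook coefficients, not the poster's specific scheme): the
    5-point central difference for the first derivative, accurate to O(h^4).

```python
def deriv_5pt(f, x, h):
    """5-point central difference for f'(x), O(h^4) truncation error.

    f'(x) ~ (f(x-2h) - 8 f(x-h) + 8 f(x+h) - f(x+2h)) / (12 h)
    """
    return (f(x - 2*h) - 8.0*f(x - h) + 8.0*f(x + h) - f(x + 2*h)) / (12.0*h)
```

    The step h has to balance truncation error (shrinks like h^4) against
    cancellation roundoff (grows like eps/h), which is exactly the
    "considerable attention to detail" the thread mentions.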

    Long ago I did one of those on an HP41C, along the way to an Adams-Bashforth-Moulton predictor-corrector ODE solver. (It’s easy to remember which is which—“Bashforth” is an excellent name for an extrapolator.) ;)

    What surprised me most was that a flaw like that could be lurking in
    almost every cubic solver (I bet NAGlib got it right) just waiting for
    the right set of unplayable data to come along.

    Cheers

    Phil Hobbs





    --
    Dr Philip C D Hobbs
    Principal Consultant
    ElectroOptical Innovations LLC / Hobbs ElectroOptics
    Optics, Electro-optics, Photonics, Analog Electronics

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Martin Brown@21:1/5 to Joe Gwinn on Fri Sep 15 10:55:34 2023
    On 11/09/2023 21:58, Joe Gwinn wrote:
    On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:

    On 08/09/2023 22:14, Joe Gwinn wrote:
    On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:


    NR = Newton Raphson?

    Yes. Although these days my production code actually uses Halley's
    method in preference since it is almost the same speed to execute on
    modern pipeline CPUs and gives cubic rather than quadratic convergence.
    Interesting. I had not heard of Halley's Method, and cubic
    convergence is worthwhile.

    Indeed. Halley was pretty clever but forever overshadowed by Newton.
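
    For the record, Halley's update is x -= 2 f f' / (2 f'^2 - f f''). A
    minimal sketch applying it to Kepler's equation E - e sin E = M (the M
    and E anomalies from earlier in the thread; the starting guess and names
    here are my own illustration, not the production code under discussion):

```python
import math

def kepler_halley(M, e, tol=1e-14):
    """Solve Kepler's equation E - e*sin(E) = M by Halley's method.

    Halley's update E -= 2*f*f' / (2*f'^2 - f*f'') gives cubic
    convergence, at the cost of one extra derivative per step.
    """
    E = M + e * math.sin(M)          # simple starting guess
    for _ in range(20):
        s, c = math.sin(E), math.cos(E)
        f = E - e * s - M
        fp = 1.0 - e * c             # f'(E)
        fpp = e * s                  # f''(E)
        dE = 2.0 * f * fp / (2.0 * fp * fp - f * fpp)
        E -= dE
        if abs(dE) < tol:
            break
    return E

E = kepler_halley(1.0, 0.3)
```

    Note that sin(E) and cos(E) are computed once and reused for f, f' and
    f'', which is why Halley costs barely more than Newton per iteration on
    this problem.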

    Some of the latest codes use Delta_4 and Delta_5 corrections provided
    that the function is sufficiently differentiable. They have to be
    computed very carefully with considerable attention to detail!

    And differentiation amplifies noise.

    Symbolic differentiation doesn't. We have analytical expressions for the problem to solve and most of them are differentiable several times apart
    from where there are poles. Those regions need extra care.

    One curiosity I have seen with them is that for me D_5 is anomalously
    fast despite containing D_4 ! On certain problems D5 is *faster* than NR
    (I suspect it is some sort of pipeline stall slowing the others down).

    In the old days, one would see a bunch of NOPs in the generated
    assembly, but that won't work for this. I would code a test case in
    assembly code and time it. Maybe by toggling a wire before and after,
    and looking with a scope.

    I tend to use RDTSC; although it is deprecated for this sort of thing it
    works fairly well. You just have to be aware of its limitations. The
    ultimate way to do it is to run a shim that lets you access the machine
    specific registers and directly measure pipeline stalls and other modes
    of failure. I haven't done that for this code (yet).

    In the past they all got slower because there was a division for each
    step down the chain. I have a new computational scheme which avoids that
    (at an increased risk of under/over flow in some edge cases)

    The old dodge was to multiply by the precomputed reciprocal, if this
    could be computed less often than every division.

    These days you can pretty much rely on any decent compiler to precompute
    static constants into whatever form is best. The strength reduction
    rules work well. You can write x/3.0 confident in the knowledge that it
    will compile to x*(1.0/3) with the 1.0/3 computed at compile time.
    (although it is as well to check this on new compilers)

    Even if you do something like several expressions in a row /x most of
    the latest compilers will compute 1/x and then multiply it out. Trying
    to micromanage such things can actually make it worse since the compiler
    might store the intermediate value in memory if you code it explicitly.

    The codes all run as expected on simple starting guesses but when the
    fancy cubic ones are used they show this peculiar speed variation. It is
    most noticeable with the Intel compiler (less so on MSC or GCC).

    The reason these high order methods are fashionable right now is that
    the iterative methods require extra function evaluations and for that
    you have to crystallise the next value of x1 and recompute f(x1),f'(x1)
    etc. Historically they would iterate 3-5 times to solve it. Having to
    wait for that new input value is an unavoidable hard pipeline stall.

    Use staggered parallel pipelines?

    There is something about the odd numbered differences that makes them
    unusually fast.

    But if you can get the initial guess with a relative error below
    10^(-16/N) then in an ideal world the Nth order difference corrector
    gets the answer in a single step and so the next value can be computed.

    Yes.

    It is only being done because of the requirement for speed and accuracy
    at the same time. Every cycle counts. Someone is even working on a
    dedicated hardware solution using a variant of CORDIC for this problem.

    https://arxiv.org/abs/2008.02894

    They are getting quite close to having working hardware now.
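
    For readers who haven't met it, the CORDIC kernel the paper builds on is
    a pure shift-and-add rotation scheme. A minimal float sketch (mine, not
    the paper's variant; in hardware the 2^-i factors are barrel shifts and
    the atan values come from a small ROM):

```python
import math

def cordic_sincos(theta, n=40):
    """Circular CORDIC in rotation mode.

    Valid for |theta| <= sum(atan(2^-i)) ~ 1.743 rad. The gain K is
    folded in up front, so each iteration is only shifts, adds and a
    table lookup when done in fixed point (floats stand in here).
    """
    K = 1.0
    for i in range(n):
        K /= math.sqrt(1.0 + 4.0 ** -i)   # accumulated rotation gain
    x, y, z = K, 0.0, theta
    for i in range(n):
        d = 1.0 if z >= 0.0 else -1.0     # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return y, x                           # (sin(theta), cos(theta))
```

    Each iteration buys roughly one more bit of angle, which is why the
    hardware trade-off against a multiplier-based polynomial is so close.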
    I'm not convinced that it is worth the cost of a sqrt but with CPUs
    improving all the time it is now very close. The latest i5-12600 I have
    can do a sqrt in about the same time as it takes to do a divide!

    Wonder what algorithm it uses?

    My guess is some sort of cunning hardware shift, multiply and add. The
    latest generation of chips stand out in this respect. If only they would
    implement a hardware cube root then I would be very happy.

    That makes sense, as many have a barrel shifter in hardware.
    ISTR there being a library function MULDIV(a,b,c) for that at least on
    the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I
    think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).

    Yes, it either didn't exist yet or didn't cover the specific case we
    had.

    I'm pretty sure it was in one of the locally written system libraries. I
    also had a routine to mask out certain harmless underflows in an
    astrophysics FLIC code. It was scaled in SI units which were something
    like h/c^2 which didn't leave a lot of room in 32bit floats.

    The original code spent most of its time in underflow recovery routines.
    I got an immediate 5x speed improvement with no loss of accuracy with
    just a few lines of assembler added to the project.

    Eventually the problem was rescaled to put things into the middle ground
    so that underflows were much less likely.

    --
    Martin Brown

  • From Joe Gwinn@21:1/5 to '''newspam'''@nonad.co.uk on Fri Sep 15 13:57:25 2023
    On Fri, 15 Sep 2023 10:55:34 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:

    On 11/09/2023 21:58, Joe Gwinn wrote:
    On Mon, 11 Sep 2023 09:05:13 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:

    On 08/09/2023 22:14, Joe Gwinn wrote:
    On Fri, 8 Sep 2023 09:33:33 +0100, Martin Brown
    <'''newspam'''@nonad.co.uk> wrote:


    NR = Newton Raphson?

    Yes. Although these days my production code actually uses Halley's
    method in preference since it is almost the same speed to execute on
    modern pipeline CPUs and gives cubic rather than quadratic convergence.
    Interesting. I had not heard of Halley's Method, and cubic
    convergence is worthwhile.

    Indeed. Halley was pretty clever but forever overshadowed by Newton.

    Some of the latest codes use Delta_4 and Delta_5 corrections provided
    that the function is sufficiently differentiable. They have to be
    computed very carefully with considerable attention to detail!

    And differentiation amplifies noise.

    Symbolic differentiation doesn't. We have analytical expressions for the
    problem to solve and most of them are differentiable several times apart
    from where there are poles. Those regions need extra care.

    One curiosity I have seen with them is that for me D_5 is anomalously
    fast despite containing D_4 ! On certain problems D5 is *faster* than NR
    (I suspect it is some sort of pipeline stall slowing the others down).

    In the old days, one would see a bunch of NOPs in the generated
    assembly, but that won't work for this. I would code a test case in
    assembly code and time it. Maybe by toggling a wire before and after,
    and looking with a scope.

    I tend to use RDTSC; although it is deprecated for this sort of thing it
    works fairly well. You just have to be aware of its limitations. The
    ultimate way to do it is to run a shim that lets you access the machine
    specific registers and directly measure pipeline stalls and other modes
    of failure. I haven't done that for this code (yet).

    Well, the old EE's approach always involves an oscilloscope and an
    electrical signal from the device under test, simple and physical, so
    it cannot so easily lie. Well, before the advent of digital scopes,
    but still...



    In the past they all got slower because there was a division for each
    step down the chain. I have a new computational scheme which avoids that
    (at an increased risk of under/over flow in some edge cases)

    The old dodge was to multiply by the precomputed reciprocal, if this
    could be computed less often than every division.

    These days you can pretty much rely on any decent compiler to precompute
    static constants into whatever form is best. The strength reduction
    rules work well. You can write x/3.0 confident in the knowledge that it
    will compile to x*(1.0/3) with the 1.0/3 computed at compile time.
    (although it is as well to check this on new compilers)

    Even if you do something like several expressions in a row /x most of
    the latest compilers will compute 1/x and then multiply it out. Trying
    to micromanage such things can actually make it worse since the compiler
    might store the intermediate value in memory if you code it explicitly.

    Yes.


    The codes all run as expected on simple starting guesses but when the
    fancy cubic ones are used they show this peculiar speed variation. It is
    most noticeable with the Intel compiler (less so on MSC or GCC).

    The reason these high order methods are fashionable right now is that
    the iterative methods require extra function evaluations and for that
    you have to crystallise the next value of x1 and recompute f(x1),f'(x1)
    etc. Historically they would iterate 3-5 times to solve it. Having to
    wait for that new input value is an unavoidable hard pipeline stall.

    Use staggered parallel pipelines?

    There is something about the odd numbered differences that makes them
    unusually fast.

    I'd be looking at hardware details like how this all maps onto
    physical memory, especially cache lines and the like.


    But if you can get the initial guess with a relative error below
    10^(-16/N) then in an ideal world the Nth order difference corrector
    gets the answer in a single step and so the next value can be computed.

    Yes.

    It is only being done because of the requirement for speed and accuracy
    at the same time. Every cycle counts. Someone is even working on a
    dedicated hardware solution using a variant of CORDIC for this problem.

    <https://arxiv.org/abs/2008.02894>

    They are getting quite close to having working hardware now.

    I've seen some CORDIC variations, typically for ellipses, but not this
    one.


    I'm not convinced that it is worth the cost of a sqrt but with CPUs
    improving all the time it is now very close. The latest i5-12600 I have
    can do a sqrt in about the same time as it takes to do a divide!

    Wonder what algorithm it uses?

    My guess is some sort of cunning hardware shift, multiply and add. The
    latest generation of chips stand out in this respect. If only they would
    implement a hardware cube root then I would be very happy.

    That makes sense, as many have a barrel shifter in hardware.
    ISTR there being a library function MULDIV(a,b,c) for that at least on
    the platforms where I worked. We had REAL*4, REAL*8 and REAL*16 but I
    think only INTEGER*1, INTEGER*2 and INTEGER*4 available (late 70's).

    Yes, it either didn't exist yet or didn't cover the specific case we
    had.

    I'm pretty sure it was in one of the locally written system libraries. I
    also had a routine to mask out certain harmless underflows in an
    astrophysics FLIC code. It was scaled in SI units which were something
    like h/c^2 which didn't leave a lot of room in 32bit floats.

    Sounds like Planck Scale?


    The original code spent most of its time in underflow recovery routines.
    I got an immediate 5x speed improvement with no loss of accuracy with
    just a few lines of assembler added to the project.


    Been there done that.

    One standard tool is a sampler tied to a hardware timer interrupt
    programmed for random intervals between interrupts, collecting PC
    addresses. The "random interval" must at least be mutually prime with
    respect to all periodic processes in the system under test.

    The histogram of PC addresses shows where the program was spending its
    time, often in a place nobody ever heard of, usually deep in some
    obscure part of the runtime system. Like underflow handling.
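
    The same idea can be sketched in a few lines on a POSIX system (my own
    toy illustration, not the tool Joe describes, and it samples Python
    frames rather than raw PC addresses): a timer signal with a randomized
    re-arm interval records where execution was each time it fires.

```python
import random
import signal
from collections import Counter

samples = Counter()

def _sample(signum, frame):
    # Record where the main thread was when the timer fired.
    samples[(frame.f_code.co_name, frame.f_lineno)] += 1
    # Re-arm with a randomized interval so the sampler doesn't beat
    # against any periodic behaviour in the code under test.
    signal.setitimer(signal.ITIMER_REAL, random.uniform(0.002, 0.008))

def hot_loop(n):
    """Stand-in workload; the histogram should finger this function."""
    total = 0
    for i in range(n):
        total += i * i
    return total

signal.signal(signal.SIGALRM, _sample)
signal.setitimer(signal.ITIMER_REAL, 0.005)   # start sampling
hot_loop(3_000_000)
signal.setitimer(signal.ITIMER_REAL, 0.0)     # stop sampling
```

    Afterwards `samples.most_common()` is the histogram: the heaviest
    entries show where the time actually went, underflow handlers and all.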


    Eventually the problem was rescaled to put things into the middle ground
    so that underflows were much less likely.

    What I've seen used a lot is straight SI units with double precision
    (64-bit) IEEE floating point arithmetic, with all conversions to and
    from legacy units performed only at the interfaces.


    Joe Gwinn
