• LTspice speed

    From dalai lamah@21:1/5 to All on Thu Sep 21 14:22:38 2023
    As you probably know, on many occasions LTspice cannot take advantage of
    multiple CPU cores because many operations are not easily parallelizable.
    In fact, most simulations I run use less than 20-25% of the CPU (Intel i5,
    4 cores/8 threads).

    However, running several LTspice processes to execute different
    simulations at the same time should overcome this limitation: each
    simulation is distinct, so they can run fully in parallel. If I run two
    simulations that individually would use 20% of the CPU and last 10
    minutes, I should see 40% CPU usage, and they should still take 10 minutes
    to complete. Maybe a little more because of Windows scheduler overhead.

    Instead, what I'm actually seeing is indeed 40% CPU usage, but both
    simulations take almost exactly twice as long to complete: 20 minutes.

    I've already tried fiddling manually with Task Manager and the processor
    affinities, for example assigning two cores to one process and two other
    cores to the other. No difference.

    Why? Is this some crappy Windows scheduler behavior, or am I missing
    something else?

    --
    I flex my muscles and I'm in the void.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Larkin@21:1/5 to antonio12358@hotmail.com on Thu Sep 21 07:42:40 2023
    On Thu, 21 Sep 2023 14:22:38 +0200, dalai lamah
    <antonio12358@hotmail.com> wrote:

    > Why? Is this some crappy Windows scheduler behavior, or am I missing
    > something else?

    In theory, a sim could be broken into a bunch of small subsystems
    connected by a few wires, and each would run faster. Small matrix on a dedicated CPU.

  • From Don Y@21:1/5 to dalai lamah on Thu Sep 21 07:58:37 2023
    On 9/21/2023 5:22 AM, dalai lamah wrote:
    > Instead, what I'm actually seeing is indeed 40% CPU usage, but both
    > simulations take almost exactly twice as long to complete: 20 minutes.
    >
    > Why? Is this some crappy Windows scheduler behavior, or am I missing
    > something else?

    My bet: each sim is causing the other's data to be evicted from the cache.

    If you could disable the cache completely, you could benchmark 1 vs. 2
    and verify this.

    [Or, you have way too little RAM and the machine is thrashing -- but, you
    would likely notice that]

  • From dalai lamah@21:1/5 to All on Thu Sep 21 17:22:10 2023
    One fine day Don Y typed:

    > My bet: each sim is causing the other's data to be evicted from the cache.

    Yes, I think this is it: cache misses and probably also I/O overhead. In absolute terms the disk write speed is moderate (not more than 1 or 2 MB/s)
    but the I/O operations are in the millions.

    Moreover, I've just noticed that every LTspice process uses a lot of
    threads, even if you limit the "max threads" parameter from the LTspice
    control panel. At least ten. Right now I'm running three simulations at
    once, and in total there are 46 LTspice threads running...

    I think that LTspice is quite similar to AAA games: the number of cores
    does not matter much, and clock speed is king.

  • From John Larkin@21:1/5 to antonio12358@hotmail.com on Thu Sep 21 08:59:55 2023
    On Thu, 21 Sep 2023 17:22:10 +0200, dalai lamah
    <antonio12358@hotmail.com> wrote:

    > Yes, I think this is it: cache misses and probably also I/O overhead. In
    > absolute terms the disk write speed is moderate (not more than 1 or 2 MB/s)
    > but the I/O operations are in the millions.

    A biggish circuit generates gigabytes of .RAW file and can bog down a
    slow hard drive. SSDs help, as does limiting the data that is saved.

    .SAVE has the disadvantage that you can't freely probe after the sim
    is done. .SAVE V(*) will save only voltages.

    LTspice doesn't allow a fixed or minimum time step, does it?

  • From dalai lamah@21:1/5 to All on Thu Sep 21 19:20:55 2023
    One fine day John Larkin typed:

    > A biggish circuit generates gigabytes of .RAW file and can bog down a
    > slow hard drive. SS drives help, as does limiting the data that is
    > saved.

    Yes, I have an SSD, and each RAW file grows to around 15 GB. Unfortunately
    I need all the data and also some precision; I've set the maximum timestep
    to 10 ns, which is still slightly inadequate, but I need the simulations
    to finish within a day. :)
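
    As a rough sanity check on file sizes like the 15 GB mentioned above, here
    is a minimal Python sketch. It assumes (this is not stated in the thread)
    that a binary .RAW transient file stores the time value as an 8-byte double
    and every saved trace as a 4-byte float per point, and it ignores the file
    header. The function names and the trace count of 50 are hypothetical.

```python
def raw_file_bytes(points: int, traces: int,
                   time_bytes: int = 8, value_bytes: int = 4) -> int:
    """Approximate data size of a transient .RAW file, header ignored.

    Assumed layout (not confirmed by the thread): each time point stores
    one time value plus one value per saved trace.
    """
    return points * (time_bytes + traces * value_bytes)


def points_for_size(file_bytes: int, traces: int,
                    time_bytes: int = 8, value_bytes: int = 4) -> int:
    """Number of whole time points that fit in file_bytes."""
    return file_bytes // (time_bytes + traces * value_bytes)


if __name__ == "__main__":
    traces = 50  # hypothetical number of saved voltages and currents
    points = points_for_size(15 * 10**9, traces)
    print(f"{points:,} time points fit in 15 GB with {traces} traces")
    # Ratio of file sizes when only half of the traces are saved.
    print(raw_file_bytes(points, traces // 2) / raw_file_bytes(points, traces))
```

    Under these assumptions the file size scales almost linearly with the
    number of saved traces, which is why limiting the saved data reduces disk
    traffic.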

    > .SAVE has the disadvantage that you can't freely probe after the sim
    > is done. .SAVE V(*) will save only voltages.
    >
    > LTspice doesn't allow a fixed or minimum time step, does it?

    There is the SPICE option "dtmin", but I don't know whether LTspice
    supports it. I've never tried it.

  • From bitrex@21:1/5 to Martin Brown on Thu Sep 21 14:04:29 2023
    On 9/21/2023 1:31 PM, Martin Brown wrote:
    > The computation is almost certainly memory constrained. The matrix
    > solver needs to have plenty of cache to solve the sparse equations and
    > is likely making assumptions about cache lines remaining in cache.
    >
    > Two processes trying to do the same sort of thing will fight like hell
    > for the available resources. I expect LTspice is very cache aware even
    > if it is only single processor friendly.

    What about disk access? AFAIK an LTspice instance by default saves its
    work to disk as it goes along, see e.g.

    <https://groups.google.com/g/sci.electronics.cad/c/EnqyB0hUSvo/m/QGxt1uTN1AkJ>

  • From Martin Brown@21:1/5 to dalai lamah on Thu Sep 21 18:31:13 2023
    On 21/09/2023 13:22, dalai lamah wrote:
    > As you probably know, on many occasions LTspice cannot take advantage of
    > multiple CPU cores because many operations are not easily parallelizable.
    > In fact, most simulations I run use less than 20-25% of the CPU (Intel i5,
    > 4 cores/8 threads).

    Even with code that is optimised for multiprocessor operation, like chess
    engines, a rule of thumb is that with about 75% of fast cores running flat
    out you saturate memory bandwidth, so allowing more than 6 cores out of 8
    to run merely increases power consumption and may even slow down the
    computation. Chess is even more insidious in that certain pruning
    techniques don't lend themselves to parallelism, so you lose both ways.

    > However, running several LTspice processes to execute different
    > simulations at the same time should overcome this limitation: each
    > simulation is distinct, so they can run fully in parallel. If I run two
    > simulations that individually would use 20% of the CPU and last 10
    > minutes, I should see 40% CPU usage, and they should still take 10 minutes
    > to complete. Maybe a little more because of Windows scheduler overhead.
    >
    > Instead, what I'm actually seeing is indeed 40% CPU usage, but both
    > simulations take almost exactly twice as long to complete: 20 minutes.

    The computation is almost certainly memory constrained. The matrix
    solver needs to have plenty of cache to solve the sparse equations and
    is likely making assumptions about cache lines remaining in cache.

    Two processes trying to do the same sort of thing will fight like hell
    for the available resources. I expect LTspice is very cache aware even
    if it is only single processor friendly.

    > I've already tried fiddling manually with Task Manager and the processor
    > affinities, for example assigning two cores to one process and two other
    > cores to the other. No difference.
    >
    > Why? Is this some crappy Windows scheduler behavior, or am I missing
    > something else?

    Try looking at resource manager and I expect you will find memory access
    pegged to the maximum. I'm pretty sure it would be the same on any OS.


    --
    Martin Brown

  • From Don Y@21:1/5 to dalai lamah on Thu Sep 21 11:29:37 2023
    On 9/21/2023 8:22 AM, dalai lamah wrote:
    > Yes, I think this is it: cache misses and probably also I/O overhead. In
    > absolute terms the disk write speed is moderate (not more than 1 or 2 MB/s)
    > but the I/O operations are in the millions.

    Unless it's flushing the buffers to disk after EVERY write, that's
    just code-like-any-other-code (i.e., with infinite cache, would
    speed up just like any other).

    > Moreover, I've just noticed that every LTspice process uses a lot of
    > threads, even if you limit the "max threads" parameter from the LTspice
    > control panel. At least ten. Right now I'm running three simulations at
    > once, and in total there are 46 LTspice threads running...

    Same as above.

    What you are looking for is some "scarce resource" that both
    processes want and has a fixed bandwidth available -- the
    disk (*if* it was being hammered) or cache are the two that
    come to mind.

    [My bet on the cache because spice is lousy for locality of
    data references]
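
    The "scarce resource with a fixed bandwidth" argument can be put into
    numbers with a deliberately crude Python sketch. The model and the name
    co_run_slowdown are illustrative assumptions, not measurements: one shared
    resource is split evenly once the combined demand exceeds its capacity, and
    everything else (including CPU cores) is treated as plentiful.

```python
def co_run_slowdown(solo_demand: float, n_processes: int) -> float:
    """Slowdown factor for each of n identical processes sharing one resource.

    solo_demand is the fraction (0..1) of the shared resource's bandwidth
    that one process consumes when it runs alone. Hypothetical model: the
    resource is divided evenly once total demand exceeds its capacity.
    """
    if not 0.0 <= solo_demand <= 1.0:
        raise ValueError("solo_demand must be between 0 and 1")
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    total_demand = solo_demand * n_processes
    # Below saturation nobody waits; above it, runtime scales with demand.
    return max(1.0, total_demand)


if __name__ == "__main__":
    for demand in (0.2, 0.5, 1.0):
        factor = co_run_slowdown(demand, 2)
        print(f"solo demand {demand:.0%}: two copies each take {factor:.1f}x as long")
```

    In this model, two copies taking twice as long only happens when a single
    copy already uses essentially all of the contended resource, even though
    the CPU meter shows about 20%.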

    > I think that LTspice is quite similar to AAA games: the number of cores
    > does not matter much, and clock speed is king.

    I wonder why it's not been ported to a GPU; that seems
    the obvious migration path (not for the parallelism as much
    as the raw throughput)

  • From Don Y@21:1/5 to Martin Brown on Thu Sep 21 11:33:58 2023
    On 9/21/2023 10:31 AM, Martin Brown wrote:
    > Even with code that is optimised for multiprocessor operation, like chess
    > engines, a rule of thumb is that with about 75% of fast cores running flat
    > out you saturate memory bandwidth, so allowing more than 6 cores out of 8
    > to run merely increases power consumption and may even slow down the
    > computation. Chess is even more insidious in that certain pruning
    > techniques don't lend themselves to parallelism, so you lose both ways.

    Didn't Amdahl predict 5X for 8 cores? For well-behaved loads?
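
    For reference, Amdahl's law gives a speedup that depends on how much of the
    work can run in parallel, so a figure like 5X on 8 cores corresponds to one
    particular parallel fraction. The Python sketch below uses the standard
    formula S = 1 / ((1 - p) + p / n); the function names are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_fraction_for(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: parallel fraction needed for a target speedup."""
    if cores < 2 or not 1.0 <= speedup <= cores:
        raise ValueError("need cores >= 2 and 1 <= speedup <= cores")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    p = parallel_fraction_for(5.0, 8)
    print(f"5x on 8 cores needs a parallel fraction of about {p:.1%}")
    print(f"check: speedup = {amdahl_speedup(p, 8):.2f}")
```

    So 5X on 8 cores matches a load that is roughly 91% parallel. The law says
    nothing about memory bandwidth or cache contention, which is the effect
    being discussed in this thread.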

    > The computation is almost certainly memory constrained. The matrix solver
    > needs to have plenty of cache to solve the sparse equations and is likely
    > making assumptions about cache lines remaining in cache.

    Exactly. It wants to *eat* all of the cache -- as does its sister
    process.

    I suspect turning off the cache and measuring execution time of
    *1* and then 2 processes would be enlightening.

    Amusing that even the large caches that are now available
    are still not large enough for ALL applications. You get spoiled
    seeing the speedup on nominal problems and are surprised when
    that doesn't generalize!

  • From John Larkin@21:1/5 to antonio12358@hotmail.com on Thu Sep 21 11:35:08 2023
    On Thu, 21 Sep 2023 19:20:55 +0200, dalai lamah
    <antonio12358@hotmail.com> wrote:

    > Yes, I have an SSD, and each RAW file grows to around 15 GB. Unfortunately
    > I need all the data and also some precision; I've set the maximum timestep
    > to 10 ns, which is still slightly inadequate, but I need the simulations
    > to finish within a day. :)

    Yikes. I whine about 20 minute sims. Humans learn from rapid feedback,
    and even 20 minutes is too slow.


    >> .SAVE has the disadvantage that you can't freely probe after the sim
    >> is done. .SAVE V(*) will save only voltages.
    >>
    >> LTspice doesn't allow a fixed or minimum time step, does it?
    >
    > There is the SPICE option "dtmin", but I don't know whether LTspice
    > supports it. I've never tried it.

    It doesn't seem to allow a min time step.

    If we make a product with 1% or 5% parts, we don't need PPB sim
    accuracy, so a bigger time step could make sense.

  • From John Larkin@21:1/5 to bitrex on Thu Sep 21 11:39:00 2023
    On Thu, 21 Sep 2023 14:04:29 -0400, bitrex <user@example.net> wrote:

    > What about disk access? AFAIK an LTspice instance by default saves its
    > work to disk as it goes along, see e.g.
    >
    > <https://groups.google.com/g/sci.electronics.cad/c/EnqyB0hUSvo/m/QGxt1uTN1AkJ>


    I have seen .save, limiting disk access, double sim speed. But then
    you can't freely probe the results, or calculate power dissipation,
    unless you plan that in advance.

  • From bitrex@21:1/5 to John Larkin on Thu Sep 21 15:09:16 2023
    On 9/21/2023 2:39 PM, John Larkin wrote:
    > I have seen .save, limiting disk access, double sim speed. But then
    > you can't freely probe the results, or calculate power dissipation,
    > unless you plan that in advance.


    On this older i7 laptop, which has two physical cores with two logical
    cores each, I tried setting thread priority to medium and max threads to
    two in each LTspice instance to see if I could get them to load-share
    more evenly.

    And they seem to: CPU and disk utilization both go up, but the two sims
    still complete more slowly.

    At least on this machine, for this test case, just letting each instance
    take turns hogging everything for a while seems the optimal way to get
    it done.

  • From Martin Brown@21:1/5 to bitrex on Thu Sep 21 20:21:11 2023
    On 21/09/2023 19:04, bitrex wrote:
    > What about disk access? AFAIK an LTspice instance by default saves its
    > work to disk as it goes along, see e.g.
    >
    > <https://groups.google.com/g/sci.electronics.cad/c/EnqyB0hUSvo/m/QGxt1uTN1AkJ>

    Quite likely it is also a factor and putting the machine on a UPS and
    using the more dangerous disk write caching strategy might speed it up.

    I'm assuming that anyone half serious about doing this will have the
    fastest possible SSD and on the fastest interface (which is very good
    when compared to spinning rust). You can gain almost another factor of
    two by having a matched RAID pair if your hardware supports it.

    But first you need to identify which bottleneck is the real problem
    holding back performance. Doubling physical RAM is fairly cheap.

    --
    Martin Brown

  • From Don Y@21:1/5 to Martin Brown on Thu Sep 21 12:44:35 2023
    On 9/21/2023 12:21 PM, Martin Brown wrote:
    > Quite likely it is also a factor and putting the machine on a UPS and
    > using the more dangerous disk write caching strategy might speed it up.
    >
    > I'm assuming that anyone half serious about doing this will have the
    > fastest possible SSD and on the fastest interface (which is very good
    > when compared to spinning rust). You can gain almost another factor of
    > two by having a matched RAID pair if your hardware supports it.

    If the OP is only seeing 1-2MB/s on the disk, it's not the medium that's
    the problem (I can easily move 100MB/s on four spindles concurrently
    with "old hardware").

    If the application is foolishly flushing buffers all the time, then
    it's just wasting CPU cycles (are you afraid YOU are going to crash?
    a simulation can always be restarted so there's no "precious" data
    at stake)

    > But first you need to identify which bottleneck is the real problem and
    > holding back performance. Doubling physical ram is fairly cheap.

    It's a win in that it helps EVERYTHING on the machine.

    You can see the effect of having spice run alongside some other
    (e.g.) disk intensive application; does the disk app
    complete in the same time as it would "solo"? What impact
    does it have on the sim? (i.e., run apps that you know
    make specific types of demands on the hardware and see which
    "annoy" the sim)

    [Given that you can't really instrument anything beyond what's
    already available for inspection]
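
    The co-running experiment described above can be sketched in Python as a
    small timing harness. This is only an outline under stated assumptions: the
    two commands are placeholders (short Python one-liners) standing in for the
    simulator command line and for a program that stresses one known resource,
    and the helper names run_timed and run_together are made up for this
    example.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_timed(command: list[str]) -> float:
    """Run one command to completion and return its wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def run_together(commands: list[list[str]]) -> list[float]:
    """Start all commands at once; return each one's wall-clock seconds."""
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(run_timed, commands))


if __name__ == "__main__":
    # Placeholder workloads: replace with the simulator command line and a
    # program that stresses one resource (disk, memory, CPU).
    sim = [sys.executable, "-c", "sum(range(2_000_000))"]
    other = [sys.executable, "-c", "sum(range(2_000_000))"]
    solo = run_timed(sim)
    together = run_together([sim, other])
    print(f"solo: {solo:.2f} s, alongside other load: {together[0]:.2f} s")
```

    Comparing the solo time with the co-run time for several kinds of
    competing load shows which kind of load slows the simulation most.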

    But, I suspect it will prove to be exhausting the cache
    that is the real culprit.
