• RX2800 sporadic disk I/O slowdowns

    From Richard Jordan@21:1/5 to All on Fri Oct 18 13:26:53 2024
    RX2800 i4 server, 64GB RAM, 4 processors, P410i controller with ten
    2TB disks in RAID 6, broken down into volumes.

    We periodically (sometimes steadily once a week, but sometimes more
    frequently) see one overnight batch job take much longer than normal to
    run. A normal runtime of about 30-35 minutes stretches to 4.5-6.5
    hours. Several images called by that job all run much slower than
    normal. At the end, the overall CPU and I/O counts are very close
    between a normal and a long job.

    The data files are very large indexed files. Records are read and
    updated but not added in this job; output is just tabulated reports.

    We've run Monitor for all classes and for the disks, and also built
    polling snapshot jobs that check for locked/busy files and other active
    batch jobs, and auto-checked through the system analyzer for any other
    processes accessing the busy files at the same time as the problem
    batch (two data files show long busy periods, but we do not see any
    other process with channels to those files at the same time except for
    backup; see next).
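    As a rough illustration, with a placeholder device name, that kind of
    snapshot loop boils down to something like:

    $ ! Timestamped snapshot of processes holding channels to files on the
    $ ! data volume, repeated every five minutes (DKA100: is a placeholder).
    $ LOOP:
    $   SHOW TIME
    $   SHOW DEVICE/FILES DKA100:
    $   WAIT 00:05:00
    $   GOTO LOOP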

    The backups start at the same time, but do not get to the data disks
    until well after the problem job normally completes; that does cause
    concurrent access to the problem files, but it occurs only when the
    job has already run long, so it is not the cause. Overall backup time
    is about the same regardless of how long the problem batch takes.

    Monitor during a long run shows average and peak I/O rates to the
    disks with busy files at about half of what they are for normal runs.
    We can see that in the process snapshots too; the direct I/O count on a
    slow run increases much more slowly than on a normal run, but both
    normal and long runs end up with close to the same CPU time and total
    I/Os.
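    For comparison between a normal and a slow run, the per-disk queue
    lengths can be captured to a summary file with something like this
    (the summary file name is a placeholder):

    $ ! Sample the I/O request queue length every 10 seconds and write the
    $ ! averages to a summary file for later comparison.
    $ MONITOR DISK/ITEM=QUEUE_LENGTH/INTERVAL=10/SUMMARY=SLOW_RUN.SUM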

    Other jobs in Monitor are somewhat slowed down, but nowhere near as
    much (and they do much less access).

    Before anyone asks, the indexed files could probably use a
    cleanup/rebuild, but if that's the cause, would we see periodic
    performance issues? I would expect them to be constant.

    There is a backup server available, so I'm going to restore backups of
    the two problem files to it and do rebuilds to see how long it takes;
    that will determine how/when we can do it on the production server.



    So something is apparently causing it to be I/O constrained, but so
    far we can't find it. Same concurrent processes, and other jobs don't
    appear to be slowed down much (but they may be much less I/O-sensitive
    or using data on other disks; I threw that question to the devs).

    Is there anything in the background below VMS that could cause this?
    The controller doing drive checks or other maintenance activities?

    Thanks for any ideas.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Craig A. Berry@21:1/5 to Richard Jordan on Fri Oct 18 17:09:44 2024
    On 10/18/24 1:26 PM, Richard Jordan wrote:
    Monitor during a long run shows average and peak I/O rates to the disks
    with busy files at about 1/2 of what they do for normal runs.

    That is exactly what happens when the cache battery on a RAID controller
    dies. Maybe yours is half-dead and sometimes takes a charge and
    sometimes doesn't? MSA$UTIL should show the status of your P410.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Craig A. Berry on Sat Oct 19 00:07:21 2024
    On Fri, 18 Oct 2024 17:09:44 -0500, Craig A. Berry wrote:

    That is exactly what happens when the cache battery on a RAID controller dies.

    I hate hardware RAID. Has VMS still not got any equivalent to mdraid?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Fri Oct 18 20:39:33 2024
    On 10/18/2024 8:35 PM, Lawrence D'Oliveiro wrote:
    On Fri, 18 Oct 2024 20:22:23 -0400, Arne Vajhøj wrote:
    On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:

    Has VMS still not got any equivalent to mdraid?

    VMS got volume shadowing in 1986 I believe.

    Relevance being?

    It is OS provided software RAID.

    Isn't that what you are asking for?

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sat Oct 19 00:56:51 2024
    On Fri, 18 Oct 2024 20:39:33 -0400, Arne Vajhøj wrote:

    On 10/18/2024 8:35 PM, Lawrence D'Oliveiro wrote:

    On Fri, 18 Oct 2024 20:22:23 -0400, Arne Vajhøj wrote:

    On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:

    Has VMS still not got any equivalent to mdraid?

    VMS got volume shadowing in 1986 I believe.

    Relevance being?

    It is OS provided software RAID.

    Does “volume shadowing” mean just RAID 1?

    <https://manpages.debian.org/8/mdadm.8.en.html>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Fri Oct 18 21:12:30 2024
    On 10/18/2024 8:56 PM, Lawrence D'Oliveiro wrote:
    On Fri, 18 Oct 2024 20:39:33 -0400, Arne Vajhøj wrote:
    On 10/18/2024 8:35 PM, Lawrence D'Oliveiro wrote:
    On Fri, 18 Oct 2024 20:22:23 -0400, Arne Vajhøj wrote:
    On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:
    Has VMS still not got any equivalent to mdraid?

    VMS got volume shadowing in 1986 I believe.

    Relevance being?

    It is OS provided software RAID.

    Does “volume shadowing” mean just RAID 1?

    I believe so.

    0, 5, 6 and 10 require a RAID controller.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sat Oct 19 00:35:43 2024
    On Fri, 18 Oct 2024 20:22:23 -0400, Arne Vajhøj wrote:

    On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:

    Has VMS still not got any equivalent to mdraid?

    VMS got volume shadowing in 1986 I believe.

    Relevance being?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Fri Oct 18 20:22:23 2024
    On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:
    On Fri, 18 Oct 2024 17:09:44 -0500, Craig A. Berry wrote:
    That is exactly what happens when the cache battery on a RAID controller
    dies.

    I hate hardware RAID. Has VMS still not got any equivalent to mdraid?

    ????

    VMS got volume shadowing in 1986 I believe.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Volker Halle@21:1/5 to All on Sat Oct 19 09:02:57 2024
    Rich,

    this would be a perfect opportunity to run T4 - and look at the disk
    response times.

    Volker.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Craig A. Berry on Sun Oct 20 01:04:06 2024
    On Fri, 18 Oct 2024 17:09:44 -0500, Craig A. Berry wrote:

    That is exactly what happens when the cache battery on a RAID controller dies.

    Here’s another question: why does a disk controller need a battery-backed-
    up cache? Or indeed any cache at all?

    Is this because it tells lies to the OS, saying that the data has been
    safely written to disk when in fact it hasn’t?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sun Oct 20 01:25:43 2024
    On Sat, 19 Oct 2024 21:22:08 -0400, Arne Vajhøj wrote:

    On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:

    Here’s another question: why does a disk controller need a
    battery-backed-up cache? Or indeed any cache at all?

    Better performance.

    Think about it: the OS already has a filesystem cache in main RAM. That
    runs at main RAM speeds. Whereas the disk controller is connected through
    an interface to the CPU suitable only for disk I/O speeds. So any disk controller cache is on the wrong side of that interface.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sat Oct 19 21:22:08 2024
    On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:
    On Fri, 18 Oct 2024 17:09:44 -0500, Craig A. Berry wrote:
    That is exactly what happens when the cache battery on a RAID controller
    dies.

    Here’s another question: why does a disk controller need a battery-backed-up cache? Or indeed any cache at all?

    Better performance.

    Is this because it tells lies to the OS, saying that the data has been
    safely written to disk when in fact it hasn’t?

    If the battery is OK then it is reasonably safe, as it will survive
    both a system crash and a power outage.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sat Oct 19 21:31:51 2024
    On 10/19/2024 9:25 PM, Lawrence D'Oliveiro wrote:
    On Sat, 19 Oct 2024 21:22:08 -0400, Arne Vajhøj wrote:
    On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:
    Here’s another question: why does a disk controller need a
    battery-backed-up cache? Or indeed any cache at all?

    Better performance.

    Think about it: the OS already has a filesystem cache in main RAM. That
    runs at main RAM speeds. Whereas the disk controller is connected through
    an interface to the CPU suitable only for disk I/O speeds. So any disk controller cache is on the wrong side of that interface.

    That cache is toast if the system crashes. So applications
    that need to be sure data are written bypass that.

    If data loss with crash is acceptable then OS cache is fine.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sun Oct 20 01:48:48 2024
    On Sat, 19 Oct 2024 21:31:51 -0400, Arne Vajhøj wrote:

    On 10/19/2024 9:25 PM, Lawrence D'Oliveiro wrote:

    On Sat, 19 Oct 2024 21:22:08 -0400, Arne Vajhøj wrote:

    On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:

    Here’s another question: why does a disk controller need a
    battery-backed-up cache? Or indeed any cache at all?

    Better performance.

    Think about it: the OS already has a filesystem cache in main RAM. That
    runs at main RAM speeds. Whereas the disk controller is connected
    through an interface to the CPU suitable only for disk I/O speeds. So
    any disk controller cache is on the wrong side of that interface.

    That cache is toast if the system crashes. So applications
    that need to be sure data are written bypass that.

    You can say the same for applications keeping data in their own RAM
    buffers. It’s a meaningless objection.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sat Oct 19 21:52:16 2024
    On 10/19/2024 9:48 PM, Lawrence D'Oliveiro wrote:
    On Sat, 19 Oct 2024 21:31:51 -0400, Arne Vajhøj wrote:

    On 10/19/2024 9:25 PM, Lawrence D'Oliveiro wrote:

    On Sat, 19 Oct 2024 21:22:08 -0400, Arne Vajhøj wrote:

    On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:

    Here’s another question: why does a disk controller need a
    battery-backed-up cache? Or indeed any cache at all?

    Better performance.

    Think about it: the OS already has a filesystem cache in main RAM. That
    runs at main RAM speeds. Whereas the disk controller is connected
    through an interface to the CPU suitable only for disk I/O speeds. So
    any disk controller cache is on the wrong side of that interface.

    That cache is toast if the system crashes. So applications
    that need to be sure data are written bypass that.

    You can say the same for applications keeping data in their own RAM
    buffers. It’s a meaningless objection.

    The context is one where data loss is not acceptable. So data must be
    persisted so that it can survive a system crash and power failure.

    File system cache and application cache are both no good in that case.

    So it is a RAID controller with battery-backed cache, or no cache.

    The first gives better performance than the second.

    Very meaningful.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sun Oct 20 03:51:32 2024
    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:

    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Craig A. Berry@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 08:42:37 2024
    On 10/19/24 10:51 PM, Lawrence D'Oliveiro wrote:
    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:

    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    Yes, there is. It's called battery-backed cache on a RAID controller,
    and when the power goes out, any writes in progress still get written
    because the battery supplies power to the cache.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 09:17:21 2024
    On 10/19/2024 11:51 PM, Lawrence D'Oliveiro wrote:
    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    True, but that is not the issue.

    There may be a lot of stuff in progress that gets lost, but the point
    is that stuff that is complete, and on which actions may already have
    been taken, cannot be lost. When the system comes up again, the stuff
    in progress is as if it never happened, while the completed stuff must
    still be there.

    Think transactions.

    begin
    # do a lot of stuff
    # if the system crashes here the system comes up like nothing was done
    commit
    # if the system crashes here the changes must be there

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to You on Sun Oct 20 21:16:40 2024
    On Sun, 20 Oct 2024 09:17:21 -0400, Arne Vajhøj wrote:

    On 10/19/2024 11:51 PM, Lawrence D'Oliveiro wrote:

    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:

    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    True, but that is not the issue.

    You said it was.

    Think transactions.

    Which is an entirely separate thing from data loss. This is about data integrity.

    And those caching disk controllers are useless for ensuring this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Craig A. Berry on Sun Oct 20 21:17:27 2024
    On Sun, 20 Oct 2024 08:42:37 -0500, Craig A. Berry wrote:

    On 10/19/24 10:51 PM, Lawrence D'Oliveiro wrote:

    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:

    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    Yes, there is. It's called battery-backed cache on a RAID controller,
    and when the power goes out, any writes in progress still get written
    because the battery supplies power to the cache.

    Unless the battery fails. Then you discover that your disk controller was
    lying to you about saving the data you thought it was saving.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 17:23:08 2024
    On 10/20/2024 5:16 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 09:17:21 -0400, Arne Vajhøj wrote:
    On 10/19/2024 11:51 PM, Lawrence D'Oliveiro wrote:
    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    True, but that is not the issue.

    You said it was.

    Loss of completed/committed data is the problem.

    Work in progress is difficult to avoid losing - and
    it is not really desirable to have half-done work
    saved.

    Think transactions.

    Which is an entirely separate thing from data loss.

    No.

    Remember what the D in ACID stands for.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Sun Oct 20 21:41:04 2024
    On Sun, 20 Oct 2024 17:23:08 -0400, Arne Vajhøj wrote:

    Loss of completed/committed data is the problem.

    The problem is not data loss, it is loss of data integrity. This is why we
    have transactions in databases and filesystems: on a crash or loss of
    power, we want transactions to be either completely lost or completely
    saved, not in some in-between incomplete state.

    There is no caching disk controller that knows how to ensure this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 19:08:41 2024
    On 10/20/2024 5:41 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 17:23:08 -0400, Arne Vajhøj wrote:
    Loss of completed/committed data is the problem.

    The problem is not data loss, it is loss of data integrity. This is why we have transactions in databases and filesystems: on a crash or loss of
    power, we want transactions to be either completely lost or completely
    saved, not in some in-between incomplete state.

    There is no caching disk controller that knows how to ensure this.

    Let me try again.

    DB write to plates & system crash => OK but slow
    DB write to OS cache & system crash => potential problem with transaction
    DB write to RAID controller with battery backup & system crash => OK and
    fast

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Oct 21 00:06:59 2024
    On Sun, 20 Oct 2024 19:08:41 -0400, Arne Vajhøj wrote:

    Let me try again.

    DB write to plates & system crash => OK but slow

    The DB knows how to make this fast. Remember its cache is faster than any
    disk controller.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 20:13:50 2024
    On 10/20/2024 8:06 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 19:08:41 -0400, Arne Vajhøj wrote:
    Let me try again.

    DB write to plates & system crash => OK but slow

    The DB knows how to make this fast. Remember its cache is faster than any disk controller.

    This is where the DB is writing to plates.

    You can add a fourth scenario:

    DB write to DB cache & system crash => guaranteed problem with transaction

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Oct 21 00:28:30 2024
    On Sun, 20 Oct 2024 20:13:50 -0400, Arne Vajhøj wrote:

    On 10/20/2024 8:06 PM, Lawrence D'Oliveiro wrote:

    On Sun, 20 Oct 2024 19:08:41 -0400, Arne Vajhøj wrote:

    Let me try again.

    DB write to plates & system crash => OK but slow

    The DB knows how to make this fast. Remember its cache is faster than
    any disk controller.

    This is where the DB is writing to plates.

    You can add a fourth scenario:

    DB write to DB cache & system crash => guaranteed problem with
    transaction

    Transaction resilience is a standard thing with databases (and journalling filesystems) going back decades.

    Some DBMSes don’t even want to work through filesystems, they would rather manage the raw storage themselves. This is why POSIX async I/O exists <https://manpages.debian.org/7/aio.7.en.html>.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 20:32:41 2024
    On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 20:13:50 -0400, Arne Vajhøj wrote:
    On 10/20/2024 8:06 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 19:08:41 -0400, Arne Vajhøj wrote:
    Let me try again.

    DB write to plates & system crash => OK but slow

    The DB knows how to make this fast. Remember its cache is faster than
    any disk controller.

    This is where the DB is writing to plates.

    You can add a fourth scenario:

    DB write to DB cache & system crash => guaranteed problem with
    transaction

    Transaction resilience is a standard thing with databases (and journalling filesystems) going back decades.

    Yes.

    But they can't do miracles.

    To be sure to come up ok after a system crash it is either write to
    plates or write to a cache that will survive the system crash (raid
    controller cache with battery backup).

    Some DBMSes don’t even want to work through filesystems, they would rather manage the raw storage themselves. This is why POSIX async I/O exists <https://manpages.debian.org/7/aio.7.en.html>.

    Yes. That is to avoid any dangerous OS/filesystem cache (and possibly
    for better performance).

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Oct 21 01:10:56 2024
    On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:

    On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:

    Transaction resilience is a standard thing with databases (and
    journalling filesystems) going back decades.

    Yes.

    But they can't do miracles.

    They can ensure, to a high degree of confidence, that the on-disk
    structure is consistent. That is to say, each transaction is either
    recorded as completed or not recorded at all, nothing in-between.

    To be sure to come up ok after a system crash it is either write to
    plates or write to a cache that will survive the system crash (raid controller cache with battery backup).

    Unfortunately, that controller cache can’t guarantee any of these things:
    it can’t do miracles either, all it does is add another point of failure.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Oct 21 01:28:43 2024
    On Sun, 20 Oct 2024 21:17:06 -0400, Arne Vajhøj wrote:

    On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:

    On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:

    On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
    Transaction resilience is a standard thing with databases (and
    journalling filesystems) going back decades.

    Yes.

    But they can't do miracles.

    They can ensure, to a high degree of confidence, that the on-disk
    structure is consistent. That is to say, each transaction is either
    recorded as completed or not recorded at all, nothing in-between.

    Only if it can rely on a successful write not being lost.

    In other words, that the disk controller is not lying to you when it says
    a write has completed?

    To be sure to come up ok after a system crash it is either write to
    plates or write to a cache that will survive the system crash (raid
    controller cache with battery backup).

    Unfortunately, that controller cache can’t guarantee any of these
    things: it can’t do miracles either, all it does is add another point
    of failure.

    Yes - it can.

    It is not impacted by a system crash.

    Now you are really starting to sound like a believer in miracles ...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 21:17:06 2024
    On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
    On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
    Transaction resilience is a standard thing with databases (and
    journalling filesystems) going back decades.

    Yes.

    But they can't do miracles.

    They can ensure, to a high degree of confidence, that the on-disk
    structure is consistent. That is to say, each transaction is either
    recorded as completed or not recorded at all, nothing in-between.

    Only if it can rely on a successful write not being lost.

    To be sure to come up ok after a system crash it is either write to
    plates or write to a cache that will survive the system crash (raid
    controller cache with battery backup).

    Unfortunately, that controller cache can’t guarantee any of these things: it can’t do miracles either, all it does is add another point of failure.

    Yes - it can.

    It is not impacted by a system crash.

    And with a power outage they have hours to get power back and get the
    data written (I believe 72 hours battery power is common).

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Lawrence D'Oliveiro on Sun Oct 20 21:32:05 2024
    On 10/20/2024 9:28 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 21:17:06 -0400, Arne Vajhøj wrote:
    On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
    On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
    Transaction resilience is a standard thing with databases (and
    journalling filesystems) going back decades.

    Yes.

    But they can't do miracles.

    They can ensure, to a high degree of confidence, that the on-disk
    structure is consistent. That is to say, each transaction is either
    recorded as completed or not recorded at all, nothing in-between.

    Only if it can rely on a successful write not being lost.

    In other words, that the disk controller is not lying to you when it says
    a write has completed?

    Just that it is not lying when it says that it got it.

    To be sure to come up ok after a system crash it is either write to
    plates or write to a cache that will survive the system crash (raid
    controller cache with battery backup).

    Unfortunately, that controller cache can’t guarantee any of these
    things: it can’t do miracles either, all it does is add another point
    of failure.

    Yes - it can.

    It is not impacted by a system crash.

    Now you are really starting to sound like a believer in miracles ...

    A system crash and restart will blank RAM and wipe out all OS
    and filesystem caches - it will not impact the cache in the
    RAID controller.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Mon Oct 21 03:27:56 2024
    On Sun, 20 Oct 2024 21:32:05 -0400, Arne Vajhøj wrote:

    On 10/20/2024 9:28 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 21:17:06 -0400, Arne Vajhøj wrote:
    On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
    On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
    Transaction resilience is a standard thing with databases (and
    journalling filesystems) going back decades.

    Yes.

    But they can't do miracles.

    They can ensure, to a high degree of confidence, that the on-disk
    structure is consistent. That is to say, each transaction is either
    recorded as completed or not recorded at all, nothing in-between.

    Only if it can rely on a successful write not being lost.

    In other words, that the disk controller is not lying to you when it says
    a write has completed?

    Just that it is not lying when it says that it got it.

    That’s not what it is saying. It is saying “write completed”.

    A system crash and restart will blank RAM and wipe out all OS
    and filesystem caches - it will not impact the cache in the
    RAID controller.

    You hope.

    You really are a believer in miracles, aren’t you?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Simon Clubley@21:1/5 to Lawrence D'Oliveiro on Mon Oct 21 12:35:13 2024
    On 2024-10-19, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:

    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    Bollocks. It's called checkpointing and restart points for
    compute-based operations. It's called transaction recovery for
    I/O-based operations.

    Simon.

    --
    Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
    Walking destinations on a map are further away than they appear.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Simon Clubley@21:1/5 to Lawrence D'Oliveiro on Mon Oct 21 12:44:47 2024
    On 2024-10-20, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sun, 20 Oct 2024 21:32:05 -0400, Arne Vajhøj wrote:

    On 10/20/2024 9:28 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 21:17:06 -0400, Arne Vajhøj wrote:
    On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
    On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
    On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
    Transaction resilience is a standard thing with databases (and
    journalling filesystems) going back decades.

    Yes.

    But they can't do miracles.

    They can ensure, to a high degree of confidence, that the on-disk
    structure is consistent. That is to say, each transaction is either
    recorded as completed or not recorded at all, nothing in-between.

    Only if it can rely on a successful write not being lost.

    In other words, that the disk controller is not lying to you when it
    says a write has completed?

    Just that it is not lying when it says that it got it.

    That’s not what it is saying. It is saying “write completed”.

    A system crash and restart will blank RAM and wipe out all OS
    and filesystem caches - it will not impact the cache in the
    RAID controller.

    You hope.

    You really are a believer in miracles, aren’t you?

    No, he isn't. He does understand, however, that the stored data will
    be written to disk when power is restored, and then any transaction
    or other recovery process can proceed from there.

    In addition, for data which is that important, you can run a fully
    shared cluster setup across multiple sites so that the loss of a server
    at one site does not really impact ongoing operations across the
    system as a whole.

    At this point Lawrence, I can't tell if you are just a troll who is
    just trying to provoke people or if you really believe what you are
    saying because you do not have a detailed understanding of how this
    stuff works.

    Simon.

    --
    Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
    Walking destinations on a map are further away than they appear.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Jordan@21:1/5 to Craig A. Berry on Mon Oct 21 10:45:26 2024
    On 10/18/24 5:09 PM, Craig A. Berry wrote:

    On 10/18/24 1:26 PM, Richard Jordan wrote:
    Monitor during a long run shows average and peak I/O rates to the
    disks with busy files at about 1/2 of what they do for normal runs.

    That is exactly what happens when the cache battery on a RAID controller dies.  Maybe yours is half-dead and sometimes takes a charge and
    sometimes doesn't?  MSA$UTIL should show the status of your P410.

    Initial check shows all good on the controller. No disk, battery,
    cache, etc issues. I may add snapshotting the controller status via
    MSA$UTIL to one of the other polling jobs in case battery or cache
    status is varying.
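    A rough sketch of such a polling job - the MSA$UTIL image location and
    the SHOW/EXIT lines fed to it below are assumptions, and would be
    replaced by whatever status command is actually used interactively:

    $ ! CTRL_POLL.COM - log a controller status snapshot, then resubmit.
    $ SET NOON
    $ SHOW TIME
    $ RUN SYS$SYSTEM:MSA$UTIL    ! image location assumed; adjust locally
    SHOW CONTROLLER
    EXIT
    $ SUBMIT/AFTER="+00:15:00"/LOG CTRL_POLL.COM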

    The long run occurred again, another Monday in a row. I haven't had
    time to review our diagnostic polling yet to see if any different jobs
    or user stuff was running concurrently, but will shortly.

    I'll take a look at T4; it's been more than a decade since we tried
    using it on a customer system.

    Thanks

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arne Vajhøj@21:1/5 to Simon Clubley on Mon Oct 21 15:22:17 2024
    On 10/21/2024 8:35 AM, Simon Clubley wrote:
    On 2024-10-19, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
    The context is one where data loss is not acceptable.

    Data loss is unavoidable. If the power goes out, all computations in
    progress get lost. There’s no way around that.

    Bollocks. It's called checkpointing and restart points for compute
    based operations. It's called transaction recovery for I/O based
    operations.

    I read his comments as being about what happens between
    checkpoints/commits.

    It is not possible to recover that. But it is not
    desirable to recover that, because that would be
    an inconsistent state.

    begin
    update accounts set amount = amount - ? where id = ?
    here <<<<
    update accounts set amount = amount + ? where id = ?
    commit

    or:

    checkpoint m
    m.accounts[id_a] -= xframt
    here <<<<
    m.accounts[id_b] += xframt
    checkpoint m

    is really the same.

    Arne

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dan Cross@21:1/5 to clubley@remove_me.eisner.decus.org- on Mon Oct 21 21:42:26 2024
    In article <vf5ibv$uet0$5@dont-email.me>,
    Simon Clubley <clubley@remove_me.eisner.decus.org-Earth.UFP> wrote:
    [snip]
    At this point Lawrence, I can't tell if you are just a troll who is
    just trying to provoke people or if you really believe what you are
    saying because you do not have a detailed understanding of how this
    stuff works.

    Why not both?

    - Dan C.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Craig A. Berry@21:1/5 to Richard Jordan on Mon Oct 21 17:43:00 2024
    On 10/21/24 10:45 AM, Richard Jordan wrote:
    On 10/18/24 5:09 PM, Craig A. Berry wrote:

    On 10/18/24 1:26 PM, Richard Jordan wrote:
    Monitor during a long run shows average and peak I/O rates to the
    disks with busy files at about 1/2 of what they do for normal runs.

    That is exactly what happens when the cache battery on a RAID controller
    dies.  Maybe yours is half-dead and sometimes takes a charge and
    sometimes doesn't?  MSA$UTIL should show the status of your P410.

    Initial check shows all good on the controller.  No disk, battery,
    cache, etc issues.  I may add snapshotting the controller status via MSA$UTIL to one of the other polling jobs in case battery or cache
    status is varying.

    Yeah, it would be good to gather whatever info you can *while* the
    poor performance is happening. It could be a lot of things, but with a
    device that's designed to recover from or compensate for errors,
    looking at it after it's recovered from whatever is bothering it may
    not help much.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Jordan@21:1/5 to All on Mon Nov 4 17:03:36 2024
    Followup on this. I'm looking at one of Hein's presentations on RMS
    indexed files, tuning, etc.

    Presuming the system has plenty of memory and, per AUTOGEN, its state
    of tune is pretty close to what AUTOGEN wants, is there any downside to
    setting a count of global buffers on the large indexed data files
    involved in this issue (the ones that show extended 'busy' channels in
    the system analyzer)? Can it cause any problems that would impact
    production?

    We already tested setting a modest process RMS buffer count for indexed
    files on the accounts used for batch operations, and that seems to make
    a modest improvement in runtime and a significant reduction in direct
    I/Os. Saved 3-4 minutes on a 32-34 minute runtime but DIOs dropped from
    ~5.1 million to ~4.3 million.
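    For reference, the two knobs in question look roughly like this in DCL
    - a sketch only, with a placeholder file name and counts rather than
    recommendations (check HELP SET FILE for the exact global-buffer
    qualifier on your VMS version):

    $ ! Per-file global buffers on the large indexed file.
    $ SET FILE/GLOBAL_BUFFERS=200 DKA100:[DATA]BIGFILE.IDX
    $ ! Per-process RMS defaults, e.g. in the batch account's LOGIN.COM.
    $ SET RMS_DEFAULT/INDEXED/BUFFER_COUNT=32
    $ SHOW RMS_DEFAULT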

    Unfortunately we still had two jobs run long: one over 7 hours (so
    they killed it), the other about 4.5 hours, but with the same reduced
    4.3M DIO count. So it helped in general but did not make a difference
    to the problem. I don't expect the global buffers to fix the problem
    either, but it's worth testing for performance reasons.

    Thanks

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From abrsvc@21:1/5 to All on Tue Nov 5 12:22:31 2024
    Note: Global buffers can be an advantage, but they are not used when
    dealing with duplicate secondary keys; those are handled in local
    buffers. I have seen drastic differences in performance when changing
    bucket sizes, more with secondary keys that have many duplicates than
    with primary keyed access. Hein has some tools that analyze the
    statistics of indexed files and report the number of I/Os per
    operation. High values here can indicate inefficient use of buckets,
    or buckets that are too small, forcing more I/Os to retrieve them.
    Increasing the bucket size can significantly reduce I/Os, resulting in
    better overall stats.
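    A DCL starting point for that kind of analysis, best run against a
    copy of the file (Hein's tools go further; the file names here are
    placeholders):

    $ ! Bucket/index statistics report for the indexed file.
    $ ANALYZE/RMS_FILE/STATISTICS/OUTPUT=BIGFILE.ANL DKA100:[DATA]BIGFILE.IDX
    $ ! FDL description of the current file, as input to a tuning pass.
    $ ANALYZE/RMS_FILE/FDL/OUTPUT=BIGFILE.FDL DKA100:[DATA]BIGFILE.IDX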

    This won't directly address the reported slowdown, but might be a
    trigger for it depending upon data locality.

    Dan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Volker Halle@21:1/5 to All on Tue Nov 5 17:32:02 2024
    Am 18.10.2024 um 20:26 schrieb Richard Jordan:

    We periodically (sometimes steadily once a week, but sometimes more
    frequently) see one overnight batch job take much longer than normal to
    run. A normal runtime of about 30-35 minutes stretches to 4.5-6.5
    hours. Several images called by that job all run much slower than
    normal. At the end the overall CPU and I/O counts are very close
    between a normal and a long job.

    If 'overall CPU and I/O counts' are about the same, please reconsider
    my advice to run T4. Look at the disk response times and I/O queue
    length and compare a 'good' and a 'slow' run.

    If 'the problem' is somewhere in the disk I/O subsystem, changing RMS
    buffers will only 'muddy the waters'.

    Volker.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Jordan@21:1/5 to abrsvc on Tue Nov 5 13:59:30 2024
    On 11/5/24 6:22 AM, abrsvc wrote:
    Note:  Global buffers can be an advantage, but are not used when dealing with duplicate secondary keys.  Those are handled in local buffers. I
    have seen drastic differences in performance when changing bucket sizes
    more with secondary keys that have many duplicates than with primary
    keyed access.  Hein has some tools that analyze the statistics of
    indexed files that report the number of I/Os per operation. High values
    here can indicate inefficient use of buckets or buckets that are too
    small forcing the use of more I/Os to retrieve buckets.  Increasing the bucket size can significantly reduce I/Os resulting in better overall
    stats.

    This won't directly address the reported slowdown, but might be a
    trigger for it depending upon data locality.

    Dan

    Dan,
    Apparently the name of Hein's tools changed and I just found the
    one referred to in the presentation. Will try it on backup copies of
    the file (on the backup server) and see what it says.

    We tested doing a plain convert on all of the files involved in
    this situation on the backup server, and that task may be doable one
    file per weekend, but if the tuning apps require changes that mean
    doing an unload/reload of the file, we're going to have to find out how
    long that takes; backup windows are tight, and except for rare VMS
    upgrade days (or when we moved from the RX3600 to these new servers),
    downtime is very hard to get.
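    For reference, the rebuild being timed on the backup server amounts to
    a CONVERT through an FDL, along these lines (placeholder file names,
    run against the restored copies):

    $ ! Reload the indexed file through a tuned FDL; /STATISTICS prints a
    $ ! summary of the conversion.
    $ CONVERT/FDL=BIGFILE.FDL/STATISTICS BIGFILE.IDX BIGFILE_NEW.IDX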

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Jordan@21:1/5 to Volker Halle on Tue Nov 5 13:51:20 2024
    On 11/5/24 10:32 AM, Volker Halle wrote:
    Am 18.10.2024 um 20:26 schrieb Richard Jordan:

    We periodically (sometimes steadily once a week, but sometimes more
    frequently) see one overnight batch job take much longer than normal to
    run. A normal runtime of about 30-35 minutes stretches to 4.5-6.5
    hours. Several images called by that job all run much slower than
    normal. At the end the overall CPU and I/O counts are very close
    between a normal and a long job.

    If 'overall CPU and I/O counts' are about the same, please re-consider
    my advice to run T4. Look at the disk response times and I/O queue
    length and compare a 'good' and a 'slow' run.

    If 'the problem' would be somewhere in the disk IO sub-system, changing
    RMS buffers will only 'muddy the waters'.

    Volker.

    Volker,
    we are getting T4 running on the backup server to re-learn it; it's
    been more than 10 years since we played with it on another box.

    I have Monitor running and have been checking the I/O rates and queue
    lengths during the 30+ minute runs and the multi-hour runs, and the
    only diffs there are that the overall I/O rates to the two disks are
    much lower on the long runs than on the normal short ones.

    But we'll try T4 and see what it shows once I'm happy with it on
    the backup server.

    This stuff is interfering with getting the 8.4-2L3 testing done so
    we can upgrade the production server asap.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Volker Halle@21:1/5 to All on Thu Nov 7 11:15:48 2024
    Am 05.11.2024 um 20:51 schrieb Richard Jordan:
    On 11/5/24 10:32 AM, Volker Halle wrote:
    Am 18.10.2024 um 20:26 schrieb Richard Jordan:
    ...
         I have monitor running and have been checking the I/O rates and queue lengths during the 30+ minute runs and the multi-hour runs, and
    the only diffs there are the overall I/O rates to the two disks are much lower on the long runs than on the normal short ones.

    Rich,

    did you consider running some disk I/O benchmarking tool? On the two
    disks sometimes affected by the problem? And on other disks on this
    RAID controller?

    This could provide some baseline achievable I/O rates and response
    times. You could then run those tests while 'the problem' exists and
    during the 'short runs'.

    If you also see the problem with a standard disk-IO benchmark,
    considerations about local/global buffers may be less important.

    There is the DISKBLOCK tool on the Freeware CDs, but I also have a more
    current version on: https://eisner.decuserve.org/~halle/#diskblock

    DISKBLOCK has a 'TEST' command to perform disk performance testing
    (read-only and/or read-write).
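    Even without DISKBLOCK, a crude sequential-read baseline can be timed
    from DCL - the file name is a placeholder, and NLA0: is the null
    device, so nothing is written anywhere:

    $ SHOW TIME
    $ COPY DKA100:[DATA]BIGFILE.IDX NLA0:
    $ SHOW TIME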

    Volker.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)