Monitor during a long run shows average and peak I/O rates to the disks
with busy files at about 1/2 of what they do for normal runs.
That is exactly what happens when the cache battery on a RAID controller dies.
On Fri, 18 Oct 2024 20:22:23 -0400, Arne Vajhøj wrote:
On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:
Has VMS still not got any equivalent to mdraid?
VMS got volume shadowing in 1986 I believe.
Relevance being?
On 10/18/2024 8:35 PM, Lawrence D'Oliveiro wrote:
On Fri, 18 Oct 2024 20:22:23 -0400, Arne Vajhøj wrote:
On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:
Has VMS still not got any equivalent to mdraid?
VMS got volume shadowing in 1986 I believe.
Relevance being?
It is OS provided software RAID.
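In case it helps to picture what host-based mirroring like volume shadowing or mdraid does, here is a deliberately toy sketch in Python (not either product's actual implementation; the member file names are made up): every write goes to every member, and a read can be served by any of them.

    import os

    class Mirror:
        """Toy model of RAID 1 / a shadow set: N files stand in for N member disks."""

        def __init__(self, member_paths):
            self.fds = [os.open(p, os.O_RDWR | os.O_CREAT, 0o600)
                        for p in member_paths]

        def write_block(self, offset, data):
            # A write counts as done only when every member has it on stable storage.
            for fd in self.fds:
                os.pwrite(fd, data, offset)
                os.fsync(fd)

        def read_block(self, offset, size):
            # Any member can satisfy a read; a real implementation would
            # fail over to a surviving member on error.
            return os.pread(self.fds[0], size, offset)

    m = Mirror(["member0.dat", "member1.dat"])  # hypothetical member "disks"
    m.write_block(0, b"payload")
    assert m.read_block(0, 7) == b"payload"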
On Fri, 18 Oct 2024 20:39:33 -0400, Arne Vajhøj wrote:
On 10/18/2024 8:35 PM, Lawrence D'Oliveiro wrote:
On Fri, 18 Oct 2024 20:22:23 -0400, Arne Vajhøj wrote:
On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:
Has VMS still not got any equivalent to mdraid?
VMS got volume shadowing in 1986 I believe.
Relevance being?
It is OS provided software RAID.
Does “volume shadowing” mean just RAID 1?
On 10/18/2024 8:07 PM, Lawrence D'Oliveiro wrote:
Has VMS still not got any equivalent to mdraid?
VMS got volume shadowing in 1986 I believe.
On Fri, 18 Oct 2024 17:09:44 -0500, Craig A. Berry wrote:
That is exactly what happens when the cache battery on a RAID controller
dies.
I hate hardware RAID. Has VMS still not got any equivalent to mdraid?
On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:
Here’s another question: why does a disk controller need a battery-backed-up cache? Or indeed any cache at all?
Better performance.
On Fri, 18 Oct 2024 17:09:44 -0500, Craig A. Berry wrote:
That is exactly what happens when the cache battery on a RAID controller
dies.
Here’s another question: why does a disk controller need a battery-backed-up cache? Or indeed any cache at all?
Is this because it tells lies to the OS, saying that the data has been
safely written to disk when in fact it hasn’t?
On Sat, 19 Oct 2024 21:22:08 -0400, Arne Vajhøj wrote:
On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:
Here’s another question: why does a disk controller need a battery-backed-up cache? Or indeed any cache at all?
Better performance.
Think about it: the OS already has a filesystem cache in main RAM. That
runs at main RAM speeds. Whereas the disk controller is connected through
an interface to the CPU suitable only for disk I/O speeds. So any disk controller cache is on the wrong side of that interface.
On 10/19/2024 9:25 PM, Lawrence D'Oliveiro wrote:
On Sat, 19 Oct 2024 21:22:08 -0400, Arne Vajhøj wrote:
On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:
Here’s another question: why does a disk controller need a battery-backed-up cache? Or indeed any cache at all?
Better performance.
Think about it: the OS already has a filesystem cache in main RAM. That
runs at main RAM speeds. Whereas the disk controller is connected
through an interface to the CPU suitable only for disk I/O speeds. So
any disk controller cache is on the wrong side of that interface.
That cache is toast if the system crashes. So applications
that need to be sure data are written bypass that.
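The "bypass" Arne mentions is usually forced write-through: open the file with O_SYNC, or follow each write with fsync(), so the call does not return until the data is on stable storage. A minimal sketch (file name invented), which also shows why durable writes are slow without a battery-backed controller cache: every one of them waits on the media.

    import os

    # O_SYNC: each os.write() blocks until the data is on stable storage,
    # writing *through* the OS cache instead of merely dirtying it.
    fd = os.open("journal.dat", os.O_RDWR | os.O_CREAT | os.O_SYNC, 0o600)
    os.write(fd, b"commit record\n")

    # Equivalent pattern on an ordinary descriptor:
    #   os.write(fd, data)
    #   os.fsync(fd)   # block until the kernel has pushed it to the device
    os.close(fd)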
On Sat, 19 Oct 2024 21:31:51 -0400, Arne Vajhøj wrote:
On 10/19/2024 9:25 PM, Lawrence D'Oliveiro wrote:
On Sat, 19 Oct 2024 21:22:08 -0400, Arne Vajhøj wrote:
On 10/19/2024 9:04 PM, Lawrence D'Oliveiro wrote:
Here’s another question: why does a disk controller need a battery-backed-up cache? Or indeed any cache at all?
Better performance.
Think about it: the OS already has a filesystem cache in main RAM. That
runs at main RAM speeds. Whereas the disk controller is connected
through an interface to the CPU suitable only for disk I/O speeds. So
any disk controller cache is on the wrong side of that interface.
That cache is toast if the system crashes. So applications
that need to be sure data are written bypass that.
You can say the same for applications keeping data in their own RAM
buffers. It’s a meaningless objection.
The context is one where data loss is not acceptable.
On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
The context is one where data loss is not acceptable.
Data loss is unavoidable. If the power goes out, all computations in
progress get lost. There’s no way around that.
On 10/19/2024 11:51 PM, Lawrence D'Oliveiro wrote:
On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
The context is one where data loss is not acceptable.
Data loss is unavoidable. If the power goes out, all computations in
progress get lost. There’s no way around that.
True, but that is not the issue.
Think transactions.
On 10/19/24 10:51 PM, Lawrence D'Oliveiro wrote:
On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
The context is one where data loss is not acceptable.
Data loss is unavoidable. If the power goes out, all computations in
progress get lost. There’s no way around that.
Yes, there is. It's called battery-backed cache on a RAID controller,
and when the power goes out, any writes in progress still get written
because the battery supplies power to the cache.
On Sun, 20 Oct 2024 09:17:21 -0400, Arne Vajhøj wrote:
On 10/19/2024 11:51 PM, Lawrence D'Oliveiro wrote:
On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
The context is one where data loss is not acceptable.
Data loss is unavoidable. If the power goes out, all computations in
progress get lost. There’s no way around that.
True, but that is not the issue.
You said it was.
Think transactions.
Which is an entirely separate thing from data loss.
Loss of completed/committed data is the problem.
On Sun, 20 Oct 2024 17:23:08 -0400, Arne Vajhøj wrote:
Loss of completed/committed data is the problem.
The problem is not data loss, it is loss of data integrity. This is why we have transactions in databases and filesystems: on a crash or loss of
power, we want transactions to be either completely lost or completely
saved, not in some in-between incomplete state.
There is no caching disk controller that knows how to ensure this.
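For reference, the way databases get that all-or-nothing property is a write-ahead log: the commit record is forced to stable storage before the transaction is acknowledged, and recovery replays only transactions whose commit record survived intact. A minimal sketch (the log file name and record format are invented for illustration):

    import json, os

    LOG = "wal.log"  # hypothetical write-ahead log

    def commit(txn_id, changes):
        # Append a commit record and force it down before acknowledging.
        with open(LOG, "a") as f:
            f.write(json.dumps({"txn": txn_id, "changes": changes}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durability lives here

    def recover():
        # Replay only complete, parseable records; a torn final line
        # (crash mid-write) means that transaction never committed.
        committed = []
        if os.path.exists(LOG):
            with open(LOG) as f:
                for line in f:
                    try:
                        committed.append(json.loads(line))
                    except json.JSONDecodeError:
                        break
        return committed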
Let me try again.
DB write to plates & system crash => OK but slow
On Sun, 20 Oct 2024 19:08:41 -0400, Arne Vajhøj wrote:
Let me try again.
DB write to plates & system crash => OK but slow
The DB knows how to make this fast. Remember its cache is faster than any disk controller.
On 10/20/2024 8:06 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 19:08:41 -0400, Arne Vajhøj wrote:
Let me try again.
DB write to plates & system crash => OK but slow
The DB knows how to make this fast. Remember its cache is faster than
any disk controller.
This is where the DB is writing to plates.
You can add a fourth scenario:
DB write to DB cache & system crash => guaranteed problem with
transaction
On Sun, 20 Oct 2024 20:13:50 -0400, Arne Vajhøj wrote:
On 10/20/2024 8:06 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 19:08:41 -0400, Arne Vajhøj wrote:
Let me try again.
DB write to plates & system crash => OK but slow
The DB knows how to make this fast. Remember its cache is faster than
any disk controller.
This is where the DB is writing to plates.
You can add a fourth scenario:
DB write to DB cache & system crash => guaranteed problem with
transaction
Transaction resilience is a standard thing with databases (and journalling filesystems) going back decades.
Some DBMSes don’t even want to work through filesystems, they would rather manage the raw storage themselves. This is why POSIX async I/O exists <https://manpages.debian.org/7/aio.7.en.html>.
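Python has no standard binding for POSIX aio, but the related cache-sidestepping tool is easy to show: on Linux, O_DIRECT transfers data straight between the application's buffer and the device, bypassing the OS page cache, at the price of block-aligned buffers, offsets, and sizes. A sketch under those assumptions (file name invented):

    import mmap, os

    BLOCK = 4096  # assume 4 KiB logical blocks; O_DIRECT needs aligned I/O

    # Linux-specific: transfers go around the OS page cache entirely.
    fd = os.open("rawstore.dat", os.O_RDWR | os.O_CREAT | os.O_DIRECT, 0o600)

    # The user buffer must be block-aligned; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, BLOCK)
    buf[:] = b"\0" * BLOCK

    os.write(fd, buf)  # one aligned block, written around the page cache
    os.fsync(fd)       # O_DIRECT says nothing about device caches, so still flush
    os.close(fd)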
On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
Transaction resilience is a standard thing with databases (and
journalling filesystems) going back decades.
Yes.
But they can't do miracles.
To be sure to come up ok after a system crash it is either write to
plates or write to a cache that will survive the system crash (raid controller cache with battery backup).
On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
Transaction resilience is a standard thing with databases (and
journalling filesystems) going back decades.
Yes.
But they can't do miracles.
They can ensure, to a high degree of confidence, that the on-disk
structure is consistent. That is to say, each transaction is either
recorded as completed or not recorded at all, nothing in-between.
Only if it can rely on a successful write not being lost.
To be sure to come up ok after a system crash it is either write to
plates or write to a cache that will survive the system crash (raid
controller cache with battery backup).
Unfortunately, that controller cache can’t guarantee any of these
things: it can’t do miracles either, all it does is add another point
of failure.
Yes - it can.
It is not impacted by a system crash.
On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
Transaction resilience is a standard thing with databases (and
journalling filesystems) going back decades.
Yes.
But they can't do miracles.
They can ensure, to a high degree of confidence, that the on-disk
structure is consistent. That is to say, each transaction is either
recorded as completed or not recorded at all, nothing in-between.
To be sure to come up ok after a system crash it is either write to
plates or write to a cache that will survive the system crash (raid
controller cache with battery backup).
Unfortunately, that controller cache can’t guarantee any of these things: it can’t do miracles either, all it does is add another point of failure.
On Sun, 20 Oct 2024 21:17:06 -0400, Arne Vajhøj wrote:
On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
Transaction resilience is a standard thing with databases (and
journalling filesystems) going back decades.
Yes.
But they can't do miracles.
They can ensure, to a high degree of confidence, that the on-disk
structure is consistent. That is to say, each transaction is either
recorded as completed or not recorded at all, nothing in-between.
Only if it can rely on a successful write not being lost.
In other words, that the disk controller is not lying to you when it says
a write has completed?
To be sure to come up ok after a system crash it is either write to
plates or write to a cache that will survive the system crash (raid
controller cache with battery backup).
Unfortunately, that controller cache can’t guarantee any of these
things: it can’t do miracles either, all it does is add another point
of failure.
Yes - it can.
It is not impacted by a system crash.
Now you are really starting to sound like a believer in miracles ...
On 10/20/2024 9:28 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 21:17:06 -0400, Arne Vajhøj wrote:
On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
Transaction resilience is a standard thing with databases (and
journalling filesystems) going back decades.
Yes.
But they can't do miracles.
They can ensure, to a high degree of confidence, that the on-disk
structure is consistent. That is to say, each transaction is either
recorded as completed or not recorded at all, nothing in-between.
Only if it can rely on a successful write not being lost.
In other words, that the disk controller is not lying to you when it says
a write has completed?
Just that it is not lying when it says that it got it.
A system crash and restart will blank RAM and wipe out all OS
and filesystem caches - it will not impact the cache in the
RAID controller.
On Sun, 20 Oct 2024 21:32:05 -0400, Arne Vajhøj wrote:
On 10/20/2024 9:28 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 21:17:06 -0400, Arne Vajhøj wrote:
On 10/20/2024 9:10 PM, Lawrence D'Oliveiro wrote:
On Sun, 20 Oct 2024 20:32:41 -0400, Arne Vajhøj wrote:
On 10/20/2024 8:28 PM, Lawrence D'Oliveiro wrote:
Transaction resilience is a standard thing with databases (and
journalling filesystems) going back decades.
Yes.
But they can't do miracles.
They can ensure, to a high degree of confidence, that the on-disk
structure is consistent. That is to say, each transaction is either
recorded as completed or not recorded at all, nothing in-between.
Only if it can rely on a successful write not being lost.
In other words, that the disk controller is not lying to you when it says a write has completed?
Just that it is not lying when it says that it got it.
That’s not what it is saying. It is saying “write completed”.
A system crash and restart will blank RAM and wipe out all OS
and filesystem caches - it will not impact the cache in the
RAID controller.
You hope.
You really are a believer in miracles, aren’t you?
On 10/18/24 1:26 PM, Richard Jordan wrote:
Monitor during a long run shows average and peak I/O rates to the
disks with busy files at about 1/2 of what they do for normal runs.
That is exactly what happens when the cache battery on a RAID controller dies. Maybe yours is half-dead and sometimes takes a charge and
sometimes doesn't? MSA$UTIL should show the status of your P410.
On 2024-10-19, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Sat, 19 Oct 2024 21:52:16 -0400, Arne Vajhøj wrote:
The context is one where data loss is not acceptable.
Data loss is unavoidable. If the power goes out, all computations in
progress get lost. There’s no way around that.
Bollocks. It's called checkpointing and restart points for compute
based operations. It's called transaction recovery for I/O based
operations.
update accounts set amount = amount + ? where id = ?   <<<< here
m.accounts[id_b] += x                                  <<<< from there
[snip]
At this point Lawrence, I can't tell if you are just a troll who is
just trying to provoke people or if you really believe what you are
saying because you do not have a detailed understanding of how this
stuff works.
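The quoted UPDATE is the classic account-transfer example. sqlite3 from the Python standard library is enough to show the commit-or-nothing behavior that transaction recovery provides (the database file and schema here are invented for the sketch):

    import sqlite3

    conn = sqlite3.connect("bank.db")  # hypothetical database file
    conn.execute("CREATE TABLE IF NOT EXISTS accounts"
                 " (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT OR IGNORE INTO accounts VALUES (?, ?)",
                     [(1, 100), (2, 0)])
    conn.commit()

    def transfer(src, dst, x):
        # One transaction: both updates commit, or (on an exception, or a
        # crash before commit) neither does -- the journal rolls the file
        # back to the last committed state when it is next opened.
        with conn:
            conn.execute("UPDATE accounts SET amount = amount - ? WHERE id = ?", (x, src))
            conn.execute("UPDATE accounts SET amount = amount + ? WHERE id = ?", (x, dst))

    transfer(1, 2, 25)
    print(conn.execute("SELECT id, amount FROM accounts").fetchall())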
On 10/18/24 5:09 PM, Craig A. Berry wrote:
On 10/18/24 1:26 PM, Richard Jordan wrote:
Monitor during a long run shows average and peak I/O rates to the
disks with busy files at about 1/2 of what they do for normal runs.
That is exactly what happens when the cache battery on a RAID controller
dies. Maybe yours is half-dead and sometimes takes a charge and
sometimes doesn't? MSA$UTIL should show the status of your P410.
Initial check shows all good on the controller. No disk, battery,
cache, etc issues. I may add snapshotting the controller status via MSA$UTIL to one of the other polling jobs in case battery or cache
status is varying.
We periodically (sometimes steadily once a week, sometimes more frequently) have one overnight batch job take much longer than normal to run. A normal runtime of about 30-35 minutes will stretch to 4.5 - 6.5 hours. Several images called by that job all run much slower than normal. At the end the overall CPU and I/O counts are very close between a normal and a long job.
Note: Global buffers can be an advantage, but they are not used when dealing with duplicate secondary keys; those are handled in local buffers. I have seen drastic differences in performance when changing bucket sizes, more with secondary keys that have many duplicates than with primary keyed access. Hein has some tools that analyze the statistics of indexed files and report the number of I/Os per operation. High values here can indicate inefficient use of buckets, or buckets that are too small, forcing more I/Os to retrieve them. Increasing the bucket size can significantly reduce I/Os, resulting in better overall stats.
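As a rough back-of-envelope (this is not RMS's actual algorithm, and the numbers are invented): if each keyed read costs about one I/O per index level plus one for the data bucket, then a bigger bucket raises the fan-out and can remove a whole index level.

    import math

    def ios_per_lookup(n_records, fanout):
        # Model an indexed file as a B-tree-like structure: one I/O per
        # index level, plus one more for the data bucket itself.
        levels = math.ceil(math.log(n_records, fanout))
        return levels + 1

    for fanout in (32, 64, 128):
        print(f"fanout {fanout:3d}: ~{ios_per_lookup(1_000_000, fanout)} I/Os per lookup")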
This won't directly address the reported slowdown, but might be a
trigger for it depending upon data locality.
Dan
On 18.10.2024 at 20:26, Richard Jordan wrote:
We periodically (sometimes steadily once a week, sometimes more frequently) have one overnight batch job take much longer than normal to run. A normal runtime of about 30-35 minutes will stretch to 4.5 - 6.5 hours. Several images called by that job all run much slower than normal. At the end the overall CPU and I/O counts are very close between a normal and a long job.
If 'overall CPU and I/O counts' are about the same, please re-consider
my advice to run T4. Look at the disk response times and I/O queue
length and compare a 'good' and a 'slow' run.
If 'the problem' is somewhere in the disk I/O sub-system, changing RMS buffers will only 'muddy the waters'.
Volker.
On 11/5/24 10:32 AM, Volker Halle wrote:
On 18.10.2024 at 20:26, Richard Jordan wrote: ...
I have monitor running and have been checking the I/O rates and queue lengths during the 30+ minute runs and the multi-hour runs. The only difference is that the overall I/O rates to the two disks are much lower on the long runs than on the normal short ones.