Now I'll try and work out where dumpfiles go - I did turn on
SYSTEM_CHECK and write it to CURRENT so hopefully it's there somewhere.
Another idea: If this really is an executive mode failure, I wonder if setting BUGCHECKFATAL to 1 would be useful here ?
Alex: What this would do is turn the failure into one that crashes the
system (and writes a dumpfile, assuming the VSI virtual image is set up
correctly) instead of just deleting the current process.
It would also mean anything in memory (including command history, etc)
would be written to the dumpfile, so make sure there's nothing private
in the memory of your system before performing more tests.
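If you want to try it, the SYSGEN sequence would be something like the
following (a sketch from memory; I'm assuming BUGCHECKFATAL is still a
dynamic parameter on x86-64, as it was on Alpha and Itanium):
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET BUGCHECKFATAL 1
SYSGEN> WRITE ACTIVE    ! takes effect immediately if the parameter is dynamic
SYSGEN> WRITE CURRENT   ! so the setting survives the next reboot
SYSGEN> EXIT
Remember to set it back to 0 once you have the dumpfile you need.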
What you could do then is to compress the dumpfile and send it to VSI privately via some means they give you.
BTW, another idea: does x86-64 VMS currently write an entry into the
errorlog on an executive mode bugcheck ? I wonder if it would be useful
to check the errorlog to see if there is anything useful there from the previous failure ?
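Something along these lines should show whether anything was logged
(a sketch; I'm assuming the ELV-based error log viewer behaves the same
on x86-64 as it does on Itanium):
$ DIRECTORY/DATE SYS$ERRORLOG:ERRLOG.SYS
$ ANALYZE/ERROR_LOG/ELV TRANSLATE/SINCE=YESTERDAY SYS$ERRORLOG:ERRLOG.SYS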
Simon.
PS: pointed message to VSI management: A hobbyist appears to have just
found a process-deleting bug within VMS that your testing has missed so
far. _This_ is an example of why the hobbyist program is important, and
of benefit, to you.
I consider it more likely that VMS and the CPU/virtual memory
environment your VM provides disagree on something, causing
random sporadic memory-related errors.
Arne
$ help analyze
Improperly handled condition, bad stack or no handler specified.
Signal arguments: Number = 0000000000000005
Name = 000000000000000C
0000000000000004
00000000000007D8
FFFF830007C02367
0000000000000012
Register dump:
RAX = 000000007FF9DC80 RDI = 000000007FF9DC80 RSI = 00000000000007D8
RDX = 0000000000000000 RCX = 00000000000007D8 R8 = 00000000FFFF8F84
R9 = 000000000808080D RBX = 000000007FFABE00 RBP = 000000007FF9E4A0
R10 = 000000007FFABDB0 R11 = 000000007FFA4D18 R12 = 000000007FF9C648
R13 = 0000000000000018 R14 = 000000007FF9C800 R15 = 0000000000008301
RIP = FFFF830007C02367 RSP = 000000007FF9E440 SS = 000000000000001B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=000000000000000C, PC=0000000000000002, PS=7AD44D2F
Improperly handled condition, bad stack or no handler specified.
Signal arguments: Number = 0000000000000005
Name = 000000000000000C
0000000000000014
0000000000000000
0000000000000000
0000000000000012
Register dump:
RAX = 0000000000000001 RDI = FFFFFFFF776521E8 RSI = 00000000020C0001
RDX = 0000000000000000 RCX = FFFFFFFF8AC09B5E R8 = 000000007ACBD11F
R9 = 0000000004000106 RBX = 000000007FFABE00 RBP = 000000007FF9D568
R10 = 000000007FFA4D18 R11 = 000000007FFA4D18 R12 = 000000007FF9D520
R13 = 000000007FFCDCAC R14 = 0000000000000002 R15 = 000000004B9E8301
RSP = 000000007FF9CA30 SS = 000000000000001B
Connection to 192.168.188.121 closed.
On 4/6/24 19:50, motk wrote:
Everything is going great.
$ help analyze
Improperly handled condition, bad stack or no handler specified.
Signal arguments: Number = 0000000000000005
Name = 000000000000000C
0000000000000004
00000000000007D8
FFFF830007C02367
0000000000000012
Register dump:
RAX = 000000007FF9DC80 RDI = 000000007FF9DC80 RSI = 00000000000007D8
RDX = 0000000000000000 RCX = 00000000000007D8 R8 = 00000000FFFF8F84
R9 = 000000000808080D RBX = 000000007FFABE00 RBP = 000000007FF9E4A0
R10 = 000000007FFABDB0 R11 = 000000007FFA4D18 R12 = 000000007FF9C648
R13 = 0000000000000018 R14 = 000000007FF9C800 R15 = 0000000000008301
RIP = FFFF830007C02367 RSP = 000000007FF9E440 SS = 000000000000001B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=000000000000000C, PC=0000000000000002, PS=7AD44D2F
On 4/8/24 12:20, Arne Vajhøj wrote:
I consider it more likely that VMS and the CPU/virtual memory
environment your VM provides disagree on something, causing
random sporadic memory-related errors.
It seems odd, I agree. This was running on an intel nuc with a 12th gen
i5 cpu, and I wonder if openvms doesn't like straddling P/E cores.
I've previously done some memory burn-in on that node without issues.
I've migrated it over to a plain 6th gen node with 4 boring cores; let's
see if that improves things.
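For reference, what VMS itself reports about the processors it found can
be checked with something like this (a sketch; HW_NAME and ACTIVECPU_CNT
are standard F$GETSYI item codes, though what they return under KVM may
not be very descriptive):
$ SHOW CPU/FULL
$ WRITE SYS$OUTPUT F$GETSYI("HW_NAME")
$ WRITE SYS$OUTPUT F$GETSYI("ACTIVECPU_CNT")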
On 6/04/2024 7:39 pm, motk wrote:
Now I'll try and work out where dumpfiles go - I did turn on
SYSTEM_CHECK and write it to CURRENT so hopefully it's there somewhere.
Yeah, nah, nothing in SYS$SYSTEM:SYSDUMP.DMP.
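For reference: if the failure only deletes the process (no fatal bugcheck),
SYSDUMP.DMP won't be touched at all; at most you might get a process dump.
A sketch, assuming the failing command is run interactively and the process
survives long enough to write one (it may well not, given it is being
deleted from exec mode):
$ SET PROCESS/DUMP        ! request a process dump on abnormal image exit
$ HELP ANALYZE            ! reproduce the failure
$ DIRECTORY/DATE *.DMP    ! a process dump, if written, lands in the current directory
Any resulting file could then be examined with ANALYZE/PROCESS_DUMP.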
On 4/8/24 12:20, Arne Vajhøj wrote:
I consider it more likely that VMS and the CPU/virtual memory
environment your VM provides disagree on something, causing
random sporadic memory-related errors.
It seems odd, I agree. This was running on an intel nuc with a 12th gen
i5 cpu, and I wonder if openvms doesn't like straddling P/E cores.
I've previously done some memory burn-in on that node without issues.
I've migrated it over to a plain 6th gen node with 4 boring cores; let's
see if that improves things.
If there's something VMS needs or a configuration it doesn't support,
then that should be probed at boot time and VMS should refuse to continue
booting, with the reason made clear. The bug in this case is that this
check is missing from VMS.
The other possibility is that VMS is _supposed_ to work OK in this configuration, but this specific VM setup has been untested by VSI until
now. That means there is a bug in the VMS code itself which needs fixing.
On 2024-04-08, Robert A. Brooks <FIRST.LAST@vmssoftware.com> wrote:
On 4/8/2024 8:34 AM, Simon Clubley wrote:
If there's something VMS needs or a configuration it doesn't support,
then that should be probed at boot time and VMS should refuse to continue
booting, with the reason made clear. The bug in this case is that this
check is missing from VMS.
That's one way to look at it.
Another way is that we have been quite clear what the requirements are to run VMS.
Any variation from that is unsupported. We recognize that there are likely configurations
that are technically unsupported, but will still likely work. Preventing those
configurations from working is something we could do, but chose not to.
Given that the VMS mindset is supposed to be one of robustness and
reliability, perhaps the proper approach is to enforce a default refusal
to boot on an unsupported configuration, but allow an override with a boot
flag or SYSGEN parameter.
That way, people don't accidentally use an unsupported configuration in production use, but you also don't stop people from using an unsupported configuration if they choose to do so.
However, if you implement this, an impossible to miss message should be output on every boot so that the flag is not set and then forgotten about.
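To make the idea concrete, something conceptually like this (the parameter
name is invented purely for illustration; no such parameter exists today):
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SET ALLOW_UNSUPPORTED_CONFIG 1   ! hypothetical parameter, illustration only
SYSGEN> WRITE CURRENT
SYSGEN> EXIT
with the console then printing a prominent warning on every subsequent
boot that the override is in effect.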
On 4/8/2024 8:34 AM, Simon Clubley wrote:
If there's something VMS needs or a configuration it doesn't support,
then that should be probed at boot time and VMS should refuse to continue
booting, with the reason made clear. The bug in this case is that
this check is missing from VMS.
That's one way to look at it.
Another way is that we have been quite clear what the requirements are to run VMS.
Any variation from that is unsupported. We recognize that there are likely configurations
that are technically unsupported, but will still likely work. Preventing those
configurations from working is something we could do, but chose not to.
I don't think 6th gen is supported, is it? In any case, check your
host(s) with the Python script here:
https://vmssoftware.com/openkits/alpopensource/vmscheck.zip
the current mode bits are in the same place as on Alpha.)
I wonder if RMS (or the XQP) has managed to corrupt your disk somehow.
Can you make the system disk available to a second instance and run an
$ anal/disk on it from that second instance ?
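Assuming the disk can be presented read-only to the second instance
(the device name below is just an example), something like:
$ MOUNT/OVERRIDE=IDENTIFICATION/NOWRITE DKA100:   ! example device name
$ ANALYZE/DISK_STRUCTURE DKA100:
$ DISMOUNT DKA100:
would report any structure errors without attempting repairs (repairs
need /REPAIR, which you would not want on a disk you suspect is corrupt).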
We're in an age of commodity COTS hardware now.
Remember that x86 is not exactly an “open standard”--it is very much controlled by proprietary vendors who never cease their search for that vendor-lock-in edge.
Just because you have been pampered by the ongoing efforts of OS
developers in the Linux community and elsewhere to ensure that things
“just work” doesn’t mean it’s something you can take for granted.
On 4/8/2024 8:34 AM, Simon Clubley wrote:
If there's something VMS needs or a configuration it doesn't support,
then that should be probed at boot time and VMS should refuse to continue
booting, with the reason made clear. The bug in this case is that
this check is missing from VMS.
That's one way to look at it.
Another way is that we have been quite clear what the requirements are
to run VMS.
Any variation from that is unsupported. We recognize that there are
likely configurations
that are technically unsupported, but will still likely work.
Preventing those
configurations from working is something we could do, but chose not to.
The other possibility is that VMS is _supposed_ to work OK in this
configuration, but this specific VM setup has been untested by VSI until
now. That means there is a bug in the VMS code itself which needs fixing.
We are not claiming support for Proxmox, although that testing has begun. Given that it is a KVM-based hypervisor, getting it fully supported
should not
be difficult, but we're not there yet.
It is for these reasons that we've been quite conservative about what is supported.
We are interested in any feedback we get, but that doesn't mean we're
going to respond to every
problem immediately when it's an unsupported configuration.
Because removing that test would require a release.
You don’t have the concept of “volatile” package updates? Like the timezone database, which changes several times a year?
We are not claiming support for Proxmox, although that testing has begun. Given that it is a KVM-based hypervisor, getting it fully supported
should not
be difficult, but we're not there yet.
But I think it would be very problematic with VMS complaining
over configs that are not known to work.
Because removing that test would require a release.
We would see:
...
VMS 9.2-2H41 - added support for VM Foo 17 and VM Bar 3
VMS 9.2-2H42 - added support for VM Bar 4 and VM FooBar 7
..
No thanks.
On 9/4/24 02:49, Robert A. Brooks wrote:
We are not claiming support for Proxmox, although that testing has begun.
Given that it is a KVM-based hypervisor, getting it fully supported
should not
be difficult, but we're not there yet.
It's basically vanilla kvm-qemu. People aren't trying to run it in
nested emulators on an FPGA or anything.
On 2024-04-08, Arne Vajhøj <arne@vajhoej.dk> wrote:
But I think it would be very problematic with VMS complaining
over configs that are not known to work.
Because removing that test would require a release.
We would see:
...
VMS 9.2-2H41 - added support for VM Foo 17 and VM Bar 3
VMS 9.2-2H42 - added support for VM Bar 4 and VM FooBar 7
..
No thanks.
It does not have to be a release - it could be a patch.
It is also absolutely no different from in the past when a new version of VMS
used to add support for new CPUs from DEC.
IOW, my suggested approach is a very long-established part of the
VMS world. The only difference now is that VMS would be allowed to
continue booting if you set an override flag or SYSGEN parameter.
Also, there should be no need to add support for "VM Bar 4" unless
it brought new functionality over "VM Bar 3" that you wanted to
support in VMS.
VMS is used in mission-critical production environments. You should
not be allowed to accidentally boot into an unsupported configuration
without being made _VERY_ aware of that fact.
On 2024-04-08, Arne Vajhøj <arne@vajhoej.dk> wrote:
But I think it would be very problematic with VMS complaining
over configs that are not known to work.
Because removing that test would require a release.
We would see:
...
VMS 9.2-2H41 - added support for VM Foo 17 and VM Bar 3
VMS 9.2-2H42 - added support for VM Bar 4 and VM FooBar 7
..
No thanks.
It does not have to be a release - it could be a patch. It is also
absolutely no different from in the past when a new version of VMS
used to add support for new CPUs from DEC.
IOW, my suggested approach is a very long-established part of the
VMS world. The only difference now is that VMS would be allowed to
continue booting if you set an override flag or SYSGEN parameter.
Also, there should be no need to add support for "VM Bar 4" unless
it brought new functionality over "VM Bar 3" that you wanted to
support in VMS.
VMS is used in mission-critical production environments. You should
not be allowed to accidentally boot into an unsupported configuration
without being made _VERY_ aware of that fact.
Simon.
On 9/4/24 11:43, Lawrence D'Oliveiro wrote:
You don’t have the concept of “volatile” package updates? Like the
timezone database, which changes several times a year?
It sounds like it's going to be a yearly build and then thrown over the fence?
Surely not.
In the interest of discouraging the use of assumptions (ASS U ME), I'd
guess nobody yet knows what is going to happen. Perhaps even VSI!
We would see:
...
VMS 9.2-2H41 - added support for VM Foo 17 and VM Bar 3
VMS 9.2-2H42 - added support for VM Bar 4 and VM FooBar 7
No thanks.
Arne
On 4/9/2024 8:45 AM, Simon Clubley wrote:
On 2024-04-08, Arne Vajhøj <arne@vajhoej.dk> wrote:
But I think it would be very problematic with VMS complaining
over configs that are not known to work.
Because removing that test would require a release.
We would see:
...
VMS 9.2-2H41 - added support for VM Foo 17 and VM Bar 3
VMS 9.2-2H42 - added support for VM Bar 4 and VM FooBar 7
..
No thanks.
It does not have to be a release - it could be a patch.
True.
But I don't like:
...
VMS 9.2-2 with HW patch 41
VMS 9.2-2 with HW patch 42
...
either.
IOW, my suggested approach is a very long-established part of the
VMS world. The only difference now is that VMS would be allowed to
continue booting if you set an override flag or SYSGEN parameter.
Also, there should be no need to add support for "VM Bar 4" unless
it brought new functionality over "VM Bar 3" that you wanted to
support in VMS.
????
The interest in different VMs is not driven by what VMS needs,
but by what customers want.
VMS is used in mission-critical production environments. You should
not be allowed to accidentally boot into an unsupported configuration
without being made _VERY_ aware of that fact.
Hopefully those running a mission critical production environment
on VMS read about supported configs before moving production to
that config and never run it on anything accidentally
booted.
But hasn't the discussion been about the CL stuff? I don't think CL and mission
critical co-exist. I'm sure VSI doesn't think that.
As for due diligence, when did that go away? Any reasonable customer would check, and re-check, that they are using supported stuff.
On 9/4/24 11:00, Arne Vajhøj wrote:
We would see:
...
VMS 9.2-2H41 - added support for VM Foo 17 and VM Bar 3
VMS 9.2-2H42 - added support for VM Bar 4 and VM FooBar 7
Just looking at the supported hardware/virtualization environments on https://vmssoftware.com/about/v92/ and noting a couple of things:
There's no actual 'supported' list, only 'tested'.
The latest tested QEMU version is 5.2.0, from 2020.
The list of 'tested' virt environments is pretty baroque and I wonder
how it's tested.
On 2024-04-09, Arne Vajhøj <arne@vajhoej.dk> wrote:
On 4/9/2024 8:45 AM, Simon Clubley wrote:
On 2024-04-08, Arne Vajhøj <arne@vajhoej.dk> wrote:
But I think it would be very problematic with VMS complaining
over configs that are not known to work.
Because removing that test would require a release.
We would see:
...
VMS 9.2-2H41 - added support for VM Foo 17 and VM Bar 3
VMS 9.2-2H42 - added support for VM Bar 4 and VM FooBar 7
..
No thanks.
It does not have to be a release - it could be a patch.
True.
But I don't like:
...
VMS 9.2-2 with HW patch 41
VMS 9.2-2 with HW patch 42
...
either.
How many VM solutions do you think there are out there ? :-)
Hint: there aren't 41 of them. :-)
VMS is used in mission-critical production environments. You should
not be allowed to accidentally boot into an unsupported configuration
without being made _VERY_ aware of that fact.
Hopefully those running a mission critical production environment
on VMS read about supported configs before moving production to
that config and never run it on anything accidentally
booted.
According to some people: "There is no need for anything safer than
the C or C++ programming language. You just have to be careful when
writing your code...". Your comment above is from the same incorrect mindset.
In the real world, people make mistakes, especially in an outsourced environment where people cost, not people capability, is the driving
factor and hence people are not as skilled with VMS as they could be.
What is the plan to prevent people with privs from doing, by mistake:
$ DEL SYS$COMMON:[000000...]*.*;*
On 2024-04-09, Dave Froble <davef@tsoft-inc.com> wrote:
But hasn't the discussion been about the CL stuff? I don't think CL and mission
critical co-exist. I'm sure VSI doesn't think that.
No. This is about adding checks to VMS itself.
As for due diligence, when did that go away? Any reasonable customer would
check, and re-check, that they are using supported stuff.
People make mistakes. See my reply to Arne.
On 4/10/2024 8:10 AM, Simon Clubley wrote:
How many VM solutions do you think there are out there ? :-)
Hint: there isn't 41 of them. :-)
Considering versions - yes there are.
And adding host OSes and versions of those, we will probably pass 410.
On 4/8/2024 8:34 AM, Simon Clubley wrote:
If there's something VMS needs or a configuration it doesn't support,
then that should be probed at boot time and VMS should refuse to continue
booting, with the reason made clear. The bug in this case is that
this check is missing from VMS.
That's one way to look at it.
Another way is that we have been quite clear what the requirements are
to run VMS.
Any variation from that is unsupported. We recognize that there are
likely configurations
that are technically unsupported, but will still likely work.
Preventing those
configurations from working is something we could do, but chose not to.
The other possibility is that VMS is _supposed_ to work OK in this
configuration, but this specific VM setup has been untested by VSI until
now. That means there is a bug in the VMS code itself which needs fixing.
We are not claiming support for Proxmox, although that testing has begun. Given that it is a KVM-based hypervisor, getting it fully supported
should not
be difficult, but we're not there yet.
It is for these reasons that we've been quite conservative about what is supported.
We are interested in any feedback we get, but that doesn't mean we're
going to respond to every
problem immediately when it's an unsupported configuration.
Robert A. Brooks schrieb am 08.04.2024 um 18:49:
On 4/8/2024 8:34 AM, Simon Clubley wrote:
If there's something VMS needs or a configuration it doesn't support,
then that should be probed at boot time and VMS should refuse to continue
booting, with the reason made clear. The bug in this case is that this
check is missing from VMS.
That's one way to look at it.
Another way is that we have been quite clear what the requirements are
to run VMS.
Any variation from that is unsupported. We recognize that there are
likely configurations
that are technically unsupported, but will still likely work.
Preventing those
configurations from working is something we could do, but chose not to.
The other possibility is that VMS is _supposed_ to work OK in this
configuration, but this specific VM setup has been untested by VSI until
now. That means there is a bug in the VMS code itself which needs fixing.
We are not claiming support for Proxmox, although that testing has begun.
Given that it is a KVM-based hypervisor, getting it fully supported
should not
be difficult, but we're not there yet.
It is for these reasons that we've been quite conservative about what is
supported.
We are interested in any feedback we get, but that doesn't mean we're
going to respond to every
problem immediately when it's an unsupported configuration.
At the Connect IT Symposium, Thilo Lauer (VSI) yesterday gave an
introduction to Proxmox and demoed an OpenVMS instance running under
Proxmox. Camiel Vanderhoeven also mentioned that official support for
Proxmox as a hypervisor is relatively high on their list, especially due
to the changes at VMware since the takeover by Broadcom.
So - stay tuned...
Hans.
PS: Camiel has been appointed "Chief Architect and Strategist" at VSI
recently. In his new role, he will relieve Clair Grant of some of his
responsibilities.
Given that the VMS mindset is supposed to be one of robustness and
reliability, perhaps the proper approach is to enforce a default refusal
to boot on an unsupported configuration, but allow an override with a boot
flag or SYSGEN parameter.
That way, people don't accidentally use an unsupported configuration in production use, but you also don't stop people from using an
unsupported configuration if they choose to do so.
However, if you implement this, an impossible to miss message should be output on every boot so that the flag is not set and then forgotten
about.
It's quite correct that unsupported-hardware configurations are
incredibly difficult or ~impossible to detect ...