On another vendor's x64 server if it can't see RAM then it won't
start. It's been awhile since I've used Sun x64-based hardware. In the
old days a blade chassis needed power for about 1 hour before I could
power on a blade; I'm wondering how many minutes power must be applied
to an x4100 before attempting power on.
I'd use a DMM to measure voltage into the power supplies, to make sure
it was the minimum spec for the power supply, then try reseating RAM.
I'd put the cover back on; docs mention a chassis intrusion switch.
You've Googled for sun fire x4100m2 service manual, I assume.
Verifying cause of NO chassis power:
Visually inspect each power supply for the status of the AC Present,
Power OK, and Fault LEDs. If the Fault LED is illuminated on any of the
PSUs then further troubleshooting will be required.
If AC Present is NOT illuminated, ensure the AC power cords are
securely plugged into the server and connected to working AC power
outlet(s).
If Power OK is NOT illuminated, but AC Present IS, then further
troubleshooting will be required. Refer to the system Servers Service
Manual and Servers Diagnostics Guide for additional troubleshooting
steps.
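For what it's worth, the LED checklist above boils down to a short decision function. This is only a sketch of the three checks as quoted; the function name and the returned strings are made up for illustration and are not from the Sun manuals.

```python
def psu_next_step(ac_present: bool, power_ok: bool, fault: bool) -> str:
    """Map one PSU's LED states to the next step in the quoted checklist.

    Hypothetical helper: the name and messages are illustrative only.
    """
    if fault:
        # Fault LED lit on any PSU: the checklist says keep troubleshooting.
        return "fault LED lit: further troubleshooting required"
    if not ac_present:
        # No AC Present LED: check cords and the outlet first.
        return "check AC cords and outlet"
    if not power_ok:
        # AC is there but the supply is not reporting good output.
        return "AC present but no Power OK: see Service Manual / Diagnostics Guide"
    return "PSU LEDs look normal"


if __name__ == "__main__":
    # Made-up LED readings for two supplies: (AC Present, Power OK, Fault)
    readings = {"PS0": (True, True, False), "PS1": (True, False, False)}
    for name, leds in readings.items():
        print(name, "->", psu_next_step(*leds))
```

Run it once per supply; a chassis with two PSUs needs both to come back clean before the LEDs can be ruled out.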
On 2018-01-15, Scott Packard <> wrote:
On Monday, January 15, 2018 at 6:45:57 PM UTC-5, DoN. Nichols wrote:
> On 2018-01-15, Scott Packard <> wrote:
Hadn't set the group to send mails, sorry for the delay in replying.
The servers were in the garage during the cold front so a weak battery
makes sense.
I swapped out the one I could with a brand new, but no
luck. There was another thing that looked like a multi-battery pack in
shrinkwrap but it didn't want to come off the board. Need to study it
more when warmer.
What I was looking at is Item 12 on page 21:
https://docs.oracle.com/cd/E19121-01/sf.x4200m2/819-1157-23/819-1157-23.pdf
On 2018-01-24, leam hall <> wrote:
> On Monday, January 15, 2018 at 6:45:57 PM UTC-5, DoN. Nichols wrote:
>> On 2018-01-15, Scott Packard <> wrote:
> Hadn't set the group to send mails, sorry for the delay in replying.
> The servers were in the garage during the cold front so a weak battery
> makes sense.
Also, the environment values on PDF page 184 of the document
below may be your problem if you were trying to run it in the cold.
======================================================================
Temperature 41 - 95 Deg F
(operating) 5 - 35 Deg C
Temperature -40 - 158 Deg F
(storage) -40 - 70 Deg C
======================================================================
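A quick way to sanity-check a garage reading against that table, as a rough sketch. The two ranges are copied from the spec lines above; the function names and the three result strings are my own invention.

```python
# Ranges in degrees C, copied from the environment table above.
OPERATING_C = (5.0, 35.0)
STORAGE_C = (-40.0, 70.0)


def f_to_c(deg_f: float) -> float:
    """Convert Fahrenheit to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def classify_ambient(deg_c: float) -> str:
    """Say whether a temperature is fine for running, storage only, or neither.

    Hypothetical helper; the labels are illustrative.
    """
    if OPERATING_C[0] <= deg_c <= OPERATING_C[1]:
        return "ok to operate"
    if STORAGE_C[0] <= deg_c <= STORAGE_C[1]:
        return "storage only"
    return "out of spec"


if __name__ == "__main__":
    garage_f = 30.0  # made-up cold-garage reading
    print(classify_ambient(f_to_c(garage_f)))
```

A 30 Deg F garage comes out below the 5 Deg C operating floor but well inside the storage range, so the box should be warmed up before powering on.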
> I swapped out the one I could with a brand new, but no
> luck. There was another thing that looked like a multi-battery pack in
> shrinkwrap but it didn't want to come off the board. Need to study it
> more when warmer.
> What I was looking at is Item 12 on page 21:
> https://docs.oracle.com/cd/E19121-01/sf.x4200m2/819-1157-23/819-1157-23.pdf
PDF page 106 has a photo of the cell in its holder and
instructions on replacing it.
You might want to look up the section on resetting the CMOS
memory (pages 81 and 87) as it may have been corrupted by the low
voltage in the previous cell.
Good Luck,
DoN.
On Wednesday, January 24, 2018 at 9:51:39 PM UTC-5, DoN. Nichols wrote:
> On 2018-01-24, leam hall <> wrote:
>> On Monday, January 15, 2018 at 6:45:57 PM UTC-5, DoN. Nichols wrote:
>>> On 2018-01-15, Scott Packard <> wrote:
>> Hadn't set the group to send mails, sorry for the delay in replying.
>> The servers were in the garage during the cold front so a weak battery
>> makes sense.
> Also, the environment values on PDF page 184 of the document
> below may be your problem if you were trying to run it in the cold.
> ======================================================================
> Temperature 41 - 95 Deg F
> (operating) 5 - 35 Deg C
> Temperature -40 - 158 Deg F
> (storage) -40 - 70 Deg C
> ======================================================================
>> I swapped out the one I could with a brand new, but no
>> luck. There was another thing that looked like a multi-battery pack in
>> shrinkwrap but it didn't want to come off the board. Need to study it
>> more when warmer.
>> What I was looking at is Item 12 on page 21:
>> https://docs.oracle.com/cd/E19121-01/sf.x4200m2/819-1157-23/819-1157-23.pdf
> PDF page 106 has a photo of the cell in its holder and
> instructions on replacing it.
> You might want to look up the section on resetting the CMOS
> memory (pages 81 and 87) as it may have been corrupted by the low
> voltage in the previous cell.
> Good Luck,
> DoN.
Thanks! I re-replaced the battery and made sure it was turned the
right way this time. Don't have a jumper so I used a small screwdriver
blade to short between the two poles of the jumper. Plugged her back in
and still no go.
My bet is that the temperature was the issue. I *assume* the
screwdriver made a good enough contact to serve as a jumper. Hmm...I
wonder if I have any old hard drives laying around with jumpers on
them...
<some time later>
Found a jumper. Went through four batteries, including the one from a
once-working server that now doesn't want to work. :(
Couple batteries were very close to the tolerance, 2.61 measured with
2.62 minimal. The server said it wasn't a critical error but still
didn't come up.
On 2018-01-30, leam hall <l> wrote:
> On Wednesday, January 24, 2018 at 9:51:39 PM UTC-5, DoN. Nichols wrote:
>> On 2018-01-24, leam hall <> wrote:
>>> On Monday, January 15, 2018 at 6:45:57 PM UTC-5, DoN. Nichols wrote:
>>>> On 2018-01-15, Scott Packard <> wrote:
>>> Hadn't set the group to send mails, sorry for the delay in replying.
>>> The servers were in the garage during the cold front so a weak battery
>>> makes sense.
>> Also, the environment values on PDF page 184 of the document
>> below may be your problem if you were trying to run it in the cold.
>> ======================================================================
>> Temperature 41 - 95 Deg F
>> (operating) 5 - 35 Deg C
>> Temperature -40 - 158 Deg F
>> (storage) -40 - 70 Deg C
>> ======================================================================
>>> I swapped out the one I could with a brand new, but no
>>> luck. There was another thing that looked like a multi-battery pack in
>>> shrinkwrap but it didn't want to come off the board. Need to study it
>>> more when warmer.
>>> What I was looking at is Item 12 on page 21:
>>> https://docs.oracle.com/cd/E19121-01/sf.x4200m2/819-1157-23/819-1157-23.pdf
>> PDF page 106 has a photo of the cell in its holder and
>> instructions on replacing it.
>> You might want to look up the section on resetting the CMOS
>> memory (pages 81 and 87) as it may have been corrupted by the low
>> voltage in the previous cell.
>> Good Luck,
>> DoN.
> Thanks! I re-replaced the battery and made sure it was turned the
> right way this time. Don't have a jumper so I used a small screwdriver
> blade to short between the two poles of the jumper. Plugged her back in
> and still no go.
> My bet is that the temperature was the issue. I *assume* the
> screwdriver made a good enough contact to serve as a jumper. Hmm...I
> wonder if I have any old hard drives laying around with jumpers on
> them...
Most modern drives use smaller jumpers, though old enough ones
might supply what you need.
> <some time later>
> Found a jumper. Went through four batteries, including the one from a
> once-working server that now doesn't want to work. :(
> Couple batteries were very close to the tolerance, 2.61 measured with
> 2.62 minimal. The server said it wasn't a critical error but still
> didn't come up.
I seem to remember that you have to have the cover in place
during the power up to reset the data. You don't say whether it is
closed or not, but there is a sensor (Magnet & Reed switch) to tell
whether the cover is in place or not.
Good Luck,
DoN.
Most of my stuff is old. Drives too. ;)
Bought a new pack of batteries and have gone through a couple on the one server.
5515 Thu Jan 4 11:23:26 2018 Audit Log minor
    root : Open Session : object = /session/type : value = shell : success
5514 Thu Jan 4 11:20:33 2018 IPMI Log critical
    ID = ef3 : 01/04/2018 : 11:20:32 : Voltage : mb.v_bat : Lower Non-recoverable going low : reading 0.62 < threshold 2.34 Volts
5513 Thu Jan 4 11:20:33 2018 IPMI Log critical
    ID = ef2 : 01/04/2018 : 11:20:27 : Voltage : mb.v_bat : Lower Critical going low : reading 0.62 < threshold 2.53 Volts
5512 Thu Jan 4 11:20:33 2018 IPMI Log critical
    ID = ef1 : 01/04/2018 : 11:20:22 : Entity Presence : io.id1.prsnt : Device Present
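If anyone wants to pull the sensor name and voltages out of a pile of these events, something like the following might do. It is a sketch only: the field layout is inferred from the entries pasted in this thread, not from any official ILOM format description, and the names `EVENT_RE`, `READING_RE`, `unwrap` and `parse_event` are invented here. It also assumes the terminal wrapped long records mid-word at a fixed width, so the pieces are glued back together with no separator.

```python
import re

# Assumed layout of an IPMI event detail line, inferred from the entries
# pasted in this thread (not from an official format spec).
EVENT_RE = re.compile(
    r"ID\s*=\s*(?P<id>\w+)\s*:\s*"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s*:\s*"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s*:\s*"
    r"(?P<kind>[^:]+?)\s*:\s*"
    r"(?P<sensor>[^:]+?)\s*:\s*"
    r"(?P<detail>.+)$"
)
READING_RE = re.compile(
    r"reading\s+(?P<reading>\d+(?:\.\d+)?)\s*<\s*"
    r"threshold\s+(?P<threshold>\d+(?:\.\d+)?)\s+Volts"
)


def unwrap(pieces):
    """Rejoin a record that the terminal split mid-word at the line width."""
    return "".join(p.rstrip("\r\n") for p in pieces)


def parse_event(text):
    """Return the fields of one event detail line, or None if it doesn't match."""
    m = EVENT_RE.search(text)
    if m is None:
        return None
    event = m.groupdict()
    r = READING_RE.search(event["detail"])
    if r is not None:
        # Voltage events carry a reading and the threshold it fell below.
        event["reading"] = float(r.group("reading"))
        event["threshold"] = float(r.group("threshold"))
    return event


if __name__ == "__main__":
    wrapped = [
        "ID = ef3 : 01/04/2018 : 11:20:32 : Voltage : mb.v_bat : Lower Non-recove",
        "rable going low : reading 0.62 < threshold 2.34 Volts",
    ]
    print(parse_event(unwrap(wrapped)))
```

Events without a voltage reading, such as the Entity Presence one, still parse; they just come back without the `reading` and `threshold` keys.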
Did the "jumper on", close lid, power up thing multiple times. The other server came back around, this one didn't. Is the "mb.v_bat" the CMOS battery or the one next to it?
http://reuel.net/images/v4100_batteries.jpg
Thanks!
Leam
Don, I really appreciate all the help!
The server self-identifies:
product_name = SUN FIRE X4100 M2
product_part_number = 602-4492-01
I did the full 'jumper, power on, close lid' reset bit with a third battery. This time I let it sit for a few minutes. The low battery seems to be during the reset time given the gap in /SP/logs/event/list.
At present "start /SYS" fails. It shows a PSU_FAULT and there are amber lights. However, both PSUs have two green lights on the rear.
Next step is to try resetting PSUs and see what happens. Will keep you updated.
Thanks!
Leam
On Thursday, February 1, 2018 at 8:54:44 AM UTC-5, leam hall wrote:
> Don, I really appreciate all the help!
> The server self-identifies:
> product_name = SUN FIRE X4100 M2
> product_part_number = 602-4492-01
> I did the full 'jumper, power on, close lid' reset bit with a third
> battery. This time I let it sit for a few minutes. The low battery seems
> to be during the reset time given the gap in /SP/logs/event/list.
> At present "start /SYS" fails. It shows a PSU_FAULT and there are
> amber lights. However, both PSUs have two green lights on the rear.
> Next step is to try resetting PSUs and see what happens. Will keep
> you updated.
> Thanks!
> Leam
Hrmph. Maybe I was wrong about the battery issue just being during the
reset. This is with the third brand new CR2032 battery with the positive
side facing the center of the motherboard. I pulled the plugs, reseated
one PSU and pulled the other. Here's the /SP/logs/event/list:
5539 Thu Jan 4 14:33:17 2018 Audit Log minor
    root : Open Session : object = /session/type : value = shell : success
5538 Thu Jan 4 14:30:30 2018 IPMI Log critical
    ID = f0a : 01/04/2018 : 14:30:30 : Voltage : mb.v_bat : Lower Non-recoverable going low : reading 0.62 < threshold 2.34 Volts
5537 Thu Jan 4 14:30:30 2018 IPMI Log critical
    ID = f09 : 01/04/2018 : 14:30:25 : Voltage : mb.v_bat : Lower Critical going low : reading 0.62 < threshold 2.53 Volts
5536 Thu Jan 4 14:30:30 2018 IPMI Log critical
    ID = f08 : 01/04/2018 : 14:30:19 : Entity Presence : io.id1.prsnt : Device Present
5535 Thu Jan 4 14:30:19 2018 IPMI Log critical
    ID = f07 : 01/04/2018 : 14:30:19 : Voltage : mb.v_bat : Lower Non-critical going low : reading 0.62 < threshold 2.62 Volts
5534 Thu Jan 4 14:36:37 2018 IPMI Log critical
    ID = f05 : 01/04/2018 : 14:36:37 : Power Supply : ps1.pwrok : State Deasserted
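The three mb.v_bat thresholds in that log (2.62, 2.53 and 2.34 Volts) can be turned into a quick check for a DMM reading before a cell goes into the holder. A minimal sketch, assuming each event fires when the reading is strictly below its threshold, as the "reading < threshold" text suggests; the function name and labels are made up.

```python
# Thresholds in Volts, taken from the mb.v_bat events in the log above,
# listed most severe first.
THRESHOLDS = [
    ("lower non-recoverable", 2.34),
    ("lower critical", 2.53),
    ("lower non-critical", 2.62),
]


def battery_severity(volts: float) -> str:
    """Return the most severe threshold a reading falls below, or 'ok'.

    Hypothetical helper; assumes an event fires when reading < threshold.
    """
    for label, limit in THRESHOLDS:
        if volts < limit:
            return label
    return "ok"


if __name__ == "__main__":
    # 0.62 is the reading from the log, 2.61 the marginal cell mentioned
    # earlier in the thread, 3.0 a nominal fresh CR2032.
    for reading in (0.62, 2.61, 3.0):
        print(reading, "->", battery_severity(reading))
```

By this reckoning the 2.61 V cells only trip the non-critical threshold, which fits the server calling it not a critical error, while a 0.62 V reading is below all three.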
Hadn't thought about the cap discharge. Both servers are now flaky. Plugged in one and will let it sit for the rest of the day. Will let you know how it comes out.
On Saturday, February 3, 2018 at 12:11:27 PM UTC-5, leam hall wrote:
> Hadn't thought about the cap discharge. Both servers are now flaky.
> Plugged in one and will let it sit for the rest of the day. Will let
> you know how it comes out.
Didn't help. Server powers up fine but won't start the OS.
On 2018-02-06, leam hall <l> wrote:
> On Saturday, February 3, 2018 at 12:11:27 PM UTC-5, leam hall wrote:
>> Hadn't thought about the cap discharge. Both servers are now flaky.
>> Plugged in one and will let it sit for the rest of the day. Will let
>> you know how it comes out.
> Didn't help. Server powers up fine but won't start the OS.
Hmm ... pretty much all that I can do. Is it possible that you
have some bad RAM DIMMs in there? IIRC, there are LEDs to indicate
every bad DIMM (and you need a lot of DIMMs to make a valid bank, IIRC).
Also, there are LEDs to indicate bad CPUs.
I only have the two systems (X4100M2 and X4200) and since they
are in service I can't pull them down to check other things.
All the fans are good, I hope?
Good Luck,
DoN.