• zpool UNAVAIL after errors on both submirrors.

    From ARZ Lab@21:1/5 to All on Sat Feb 8 00:46:07 2020
    Hello All

    In Solaris 11.1 x86, I have a ZFS pool consisting of 4 mirrors, 2 disks each. After some hardware manipulation, both disks of mirror-0 happened to be on one controller, and one day, that controller generated many IO errors.
    Both disks were marked faulty by fmadm, they appear as UNAVAIL in zpool status, and the whole zpool is UNAVAIL, too.
    I'm pretty sure the data is still available, even though zpool status says it's corrupted. The controller has been replaced.
    The disks appear in format -e, although under different names. The labels look good. What I tried:
    Ran fmadm repaired for all faulty FMRIs. They were marked repaired successfully, but then show up in fmadm faulty again (rough command sketch below).
    Booted from a backup BE and from a Live 11.3 USB; the pool still shows those disks as UNAVAIL, even though "fmadm faulty" shows no entries.
    Re-shuffled the disks across controllers. The failed disks still appear as UNAVAIL under their OLD names, e.g. c1t0d0s0, even though that c1t0d0s0 is now a small SSD in rpool, not the 2 TB spindle from the failed pool.
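    For the fmadm step, what I ran was roughly this (the FMRI here is only a placeholder; the real ones come from the fmadm faulty output):

    # fmadm faulty
    (Note the FMRI of each faulted resource.)
    # fmadm repaired <fmri>
    (Repeated for every faulty FMRI; each reports repaired, then later reappears in fmadm faulty.)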
    Where exactly is this FAILED/UNAVAIL info kept? Can I clean it?
    Would dd to a fresh 2 TB disk copy that FAILED mark as well?
    Anything else to try?
    Thanks
    Andrei

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to ARZ Lab on Sun Feb 9 14:16:24 2020
    On 2/8/20 1:46 AM, ARZ Lab wrote:
    Anything else to try?

    Have you tried exporting and re-importing the pool?

    Years ago I had a system that wasn't finding member disks for some
    reason, and exporting & importing caused the system to scan all disks to
    find the previously missing disks.
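    Something like this, with POOLNAME standing in for whatever your pool is called:

    # zpool export POOLNAME
    # zpool import
    (With no arguments, import scans the devices and lists any pools it finds.)
    # zpool import POOLNAME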



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott@21:1/5 to ARZ Lab on Sun Feb 9 16:08:08 2020
    On Saturday, February 8, 2020 at 12:46:11 AM UTC-8, ARZ Lab wrote:
    Hello All

    In Solaris 11.1 x86, I have a ZFS pool consisting of 4 mirrors, 2 disks each. After some hardware manipulation, both disks of mirror-0 happened to be on one controller, and one day, that controller generated many IO errors.
    Both disks were marked faulty by fmadm, they appear as UNAVAIL in zpool status, and the whole zpool is UNAVAIL, too.
    I'm pretty sure the data is still available, even though zpool status says it's corrupted. The controller has been replaced.
    The disks appear in format -e, although under different names. The labels look good.
    What I tried:
    Ran fmadm repaired for all faulty FMRIs. They were marked repaired successfully, but then show up in fmadm faulty again.
    Booted from a backup BE and from a Live 11.3 USB; the pool still shows those disks as UNAVAIL, even though "fmadm faulty" shows no entries.
    Re-shuffled the disks across controllers. The failed disks still appear as UNAVAIL under their OLD names, e.g. c1t0d0s0, even though that c1t0d0s0 is now a small SSD in rpool, not the 2 TB spindle from the failed pool.
    Where exactly is this FAILED/UNAVAIL info kept? Can I clean it?
    Would dd to a fresh 2 TB disk copy that FAILED mark as well?
    Anything else to try?
    Thanks
    Andrei


    That's one thing I've noticed about ZFS: it's a little behind when it comes
    to replacing bad disks or controllers, compared to NetApp.
    It's telling you what exists in its universe; you need to understand what changes you made in that universe and tell it what they were, so it can catch up to your current hardware.


    # cfgadm -al
    # format -e
    (Should see the HDDs here. Don't continue otherwise.)

    I think https://docs.oracle.com/cd/E19253-01/819-5461/gbbvb/index.html is what you want; see "Resolving a Missing Device" or "Physically Reattaching a Device".
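    Roughly, once the disks show up in format again, that procedure boils down to something like this (POOLNAME and c1t0d0 are placeholders for your pool and device):

    # zpool status -x
    (Shows only pools with problems, so you can see which devices are UNAVAIL.)
    # zpool online POOLNAME c1t0d0
    (Tell ZFS the reattached device is back; a resilver should start if it is accepted.)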

    Regards, Scott

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From ARZ Lab@21:1/5 to ARZ Lab on Mon Feb 10 02:04:50 2020
    On Saturday, February 8, 2020 at 9:46:11 AM UTC+1, ARZ Lab wrote:
    Hello All

    In Solaris 11.1 x86, I have a ZFS pool consisting of 4 mirrors, 2 disks each.
    After some hardware manipulation, both disks of mirror-0 happened to be on one controller, and one day, that controller generated many IO errors.
    Both disks were marked faulty by fmadm, they appear as UNAVAIL in zpool status, and the whole zpool is UNAVAIL, too.
    I'm pretty sure the data is still available, even though zpool status says it's corrupted. The controller has been replaced.
    The disks appear in format -e, although under different names. The labels look good.
    What I tried:
    Ran fmadm repaired for all faulty FMRIs. They were marked repaired successfully, but then show up in fmadm faulty again.
    Booted from a backup BE and from a Live 11.3 USB; the pool still shows those disks as UNAVAIL, even though "fmadm faulty" shows no entries.
    Re-shuffled the disks across controllers. The failed disks still appear as UNAVAIL under their OLD names, e.g. c1t0d0s0, even though that c1t0d0s0 is now a small SSD in rpool, not the 2 TB spindle from the failed pool.
    Where exactly is this FAILED/UNAVAIL info kept? Can I clean it?
    Would dd to a fresh 2 TB disk copy that FAILED mark as well?
    Anything else to try?
    Thanks
    Andrei



    Sure, importing/exporting was the very first thing I'd try, but in my case neither was possible, since the pool was UNAVAIL.

    Finally, I solved it with zpool clear -F POOLNAME
    It rolled the pool state back a little by discarding some pending transactions, which was fine for me.
    A scrub completed with 0 errors after that.
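    For the record, the whole sequence was roughly this (POOLNAME as above; if your zpool supports it, -n lets you dry-run the recovery first):

    # zpool clear -nF POOLNAME
    (Only checks whether discarding the last transactions would make the pool openable.)
    # zpool clear -F POOLNAME
    (Actually discards those transactions and recovers the pool.)
    # zpool scrub POOLNAME
    # zpool status -v POOLNAME
    (Scrub to verify; it finished with 0 errors here.)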

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)