• zpool UNAVAIL after errors on both submirrors.

    From ARZ Lab@21:1/5 to All on Sat Feb 8 00:46:07 2020
    Hello All

    In Solaris 11.1 x86, I have a ZFS pool consisting of 4 mirrors, 2 disks each. After some hardware manipulation, both disks of mirror-0 happened to be on one controller, and one day, that controller generated many IO errors.
    Both disks were marked faulty by fmadm, they appear as UNAVAIL in zpool status, and the whole zpool is UNAVAIL, too.
    I'm pretty sure the data is still available, even though zpool status says it's corrupted. The controller has been replaced.
    The disks appear in format -e, although under different names. The labels look good. What I tried:
    Ran fmadm repaired for all faulty FMRIs. They were marked repaired successfully, but then show up in fmadm faulty again (rough command sketch below).
    Booted from a backup BE and from a Live 11.3 USB; the pool still shows those disks as UNAVAIL, even though "fmadm faulty" shows no entries.
    Re-shuffled the disks across controllers. The failed disks still appear as UNAVAIL under their OLD names, e.g. c1t0d0s0, even though that c1t0d0s0 is now a small SSD in rpool, not the 2 TB spindle from the failed pool.
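    For the fmadm step, what I ran was roughly this (the FMRI here is only a placeholder; the real ones come from the fmadm faulty output):

    # fmadm faulty
    (Note the FMRI of each faulted resource.)
    # fmadm repaired <fmri>
    (Repeated for every faulty FMRI; each reports repaired, then later reappears in fmadm faulty.)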
    Where exactly is this FAILED/UNAVAIL info kept? Can I clean it?
    Would dd to a fresh 2 TB disk copy that FAILED mark as well?
    Anything else to try?
    Thanks
    Andrei

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to ARZ Lab on Sun Feb 9 14:16:24 2020
    On 2/8/20 1:46 AM, ARZ Lab wrote:
    Anything else to try?

    Have you tried exporting and re-importing the pool?

    Years ago I had a system that wasn't finding member disks for some
    reason, and exporting & importing caused the system to scan all disks to
    find the previously missing disks.
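    Something like this, with POOLNAME standing in for whatever your pool is called:

    # zpool export POOLNAME
    # zpool import
    (With no arguments, import scans the devices and lists any pools it finds.)
    # zpool import POOLNAME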



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott@21:1/5 to ARZ Lab on Sun Feb 9 16:08:08 2020
    On Saturday, February 8, 2020 at 12:46:11 AM UTC-8, ARZ Lab wrote:
    Hello All

    In Solaris 11.1 x86, I have a ZFS pool consisting of 4 mirrors, 2 disks each. After some hardware manipulation, both disks of mirror-0 happened to be on one controller, and one day, that controller generated many IO errors.
    Both disks were marked faulty by fmadm, they appear as UNAVAIL in zpool status, and the whole zpool is UNAVAIL, too.
    I'm pretty sure the data is still available, even though zpool status says it's corrupted. The controller has been replaced.
    The disks appear in format -e, although under different names. The labels look good.
    What I tried:
    Ran fmadm repaired for all faulty FMRIs. They were marked repaired successfully, but then show up in fmadm faulty again.
    Booted from a backup BE and from a Live 11.3 USB; the pool still shows those disks as UNAVAIL, even though "fmadm faulty" shows no entries.
    Re-shuffled the disks across controllers. The failed disks still appear as UNAVAIL under their OLD names, e.g. c1t0d0s0, even though that c1t0d0s0 is now a small SSD in rpool, not the 2 TB spindle from the failed pool.
    Where exactly is this FAILED/UNAVAIL info kept? Can I clean it?
    Would dd to a fresh 2 TB disk copy that FAILED mark as well?
    Anything else to try?
    Thanks
    Andrei


    That's one thing I've noticed about ZFS: it's a little behind when it comes
    to replacing bad disks or controllers, compared to NetApp.
    It's telling you what exists in its universe; you need to understand what changes you made in that universe and tell it what they were, so it can catch up to your current hardware.


    # cfgadm -al
    # format -e
    (Should see the HDDs here. Don't continue otherwise.)

    I think https://docs.oracle.com/cd/E19253-01/819-5461/gbbvb/index.html is what you want; see "Resolving a Missing Device" or "Physically Reattaching a Device".
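    Roughly, once the disks show up in format again, that procedure boils down to something like this (POOLNAME and c1t0d0 are placeholders for your pool and device):

    # zpool status -x
    (Shows only pools with problems, so you can see which devices are UNAVAIL.)
    # zpool online POOLNAME c1t0d0
    (Tell ZFS the reattached device is back; a resilver should start if it is accepted.)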

    Regards, Scott

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From ARZ Lab@21:1/5 to ARZ Lab on Mon Feb 10 02:04:50 2020
    On Saturday, February 8, 2020 at 9:46:11 AM UTC+1, ARZ Lab wrote:
    Hello All

    In Solaris 11.1 x86, I have a ZFS pool consisting of 4 mirrors, 2 disks each.
    After some hardware manipulation, both disks of mirror-0 happened to be on one controller, and one day, that controller generated many IO errors.
    Both disks were marked faulty by fmadm, they appear as UNAVAIL in zpool status, and the whole zpool is UNAVAIL, too.
    I'm pretty sure the data is still available, even though zpool status says it's corrupted. The controller has been replaced.
    The disks appear in format -e, although under different names. The labels look good.
    What I tried:
    Ran fmadm repaired for all faulty FMRIs. They were marked repaired successfully, but then show up in fmadm faulty again.
    Booted from a backup BE and from a Live 11.3 USB; the pool still shows those disks as UNAVAIL, even though "fmadm faulty" shows no entries.
    Re-shuffled the disks across controllers. The failed disks still appear as UNAVAIL under their OLD names, e.g. c1t0d0s0, even though that c1t0d0s0 is now a small SSD in rpool, not the 2 TB spindle from the failed pool.
    Where exactly is this FAILED/UNAVAIL info kept? Can I clean it?
    Would dd to a fresh 2 TB disk copy that FAILED mark as well?
    Anything else to try?
    Thanks
    Andrei



    Sure, importing/exporting was the very first thing I'd try, but in my case neither was possible, since the pool was UNAVAIL.

    Finally, I solved it with zpool clear -F POOLNAME
    It rolled the pool state back a little by discarding some pending transactions, which was fine for me.
    A scrub completed with 0 errors after that.
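    For the record, the whole sequence was roughly this (POOLNAME as above; if your zpool supports it, -n lets you dry-run the recovery first):

    # zpool clear -nF POOLNAME
    (Only checks whether discarding the last transactions would make the pool openable.)
    # zpool clear -F POOLNAME
    (Actually discards those transactions and recovers the pool.)
    # zpool scrub POOLNAME
    # zpool status -v POOLNAME
    (Scrub to verify; it finished with 0 errors here.)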

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)