• Replaced FC drive in zpool, now can't get rid of old drive

    From Scott@21:1/5 to All on Thu Jul 28 06:46:20 2016
    On a Solaris 10 x64 server attached to an HP SmartArray (with 32 LUNs) I have two zpools, each using 16 LUNs. The LUNs are presented via FC (they map 1:1 to HDDs in the HP box). This was my attempt to let ZFS handle the redundancy (raidz2 in each of
    the two pools) rather than using the hardware RAID5 the HP box prefers.

    Multipathing is also done to the LUNs, so the device I deal with starts with /scsi_vhci/... (though /dev/rdsk and /dev/dsk entries exist at a lower layer for it).

    I've had this running for a few years; then one of the HDDs died.
    Googling (specifically, https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbvf.html) says to take the zpool offline in order to do the HDD replacement, which seems absurd, so I didn't do that. Instead I deleted the old LUN (from the HP) and created
    a new one (from the HP), and when I did, the new LUN had a new WWPN.

    Next I had Solaris scan for and create new links to the LUN.
    Then I did a zpool replace <pool> <old LUN> <new LUN>.
    The pool is healthy, after about 24 hours of resilvering.
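    (For reference, the sequence was roughly the following -- pool name "tank" and the WWN placeholders are stand-ins, not the real values:)

        # rescan and build /dev links for the new FC LUN (disks only)
        devfsadm -c disk
        # swap the new LUN in for the dead one
        zpool replace tank c5t<oldWWN>d0 c5t<newWWN>d0
        # watch the resilver until the pool reports ONLINE again
        zpool status tank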

    How do I get rid of the old LUN though?
    format and luxadm probe still show it there, but it's not in mpathadm list lu.
    If I run luxadm probe, take the WWN of the bad old FC HDD, and then run
    luxadm display <WWN> 2>&1 | less
    I see: ERROR: I/O failure communicating with /dev/rdsk/c5t<longnum>d0s2

    cfgadm -al doesn't show controllers higher than c2.

    Regards, Scott

  • From DoN. Nichols@21:1/5 to Scott on Fri Jul 29 04:25:40 2016
    On 2016-07-28, Scott <spackard@gmail.com> wrote:
    > On a Solaris 10 x64 server attached to an HP SmartArray (with 32 LUNs)
    > I have two zpools, each using 16 LUNs. The LUNs are presented via FC
    > (they map 1:1 to HDDs in the HP box). This was my attempt to let ZFS
    > handle the redundancy (raidz2 in each of the two pools) rather than
    > using the hardware RAID5 the HP box prefers.

    > Multipathing is also done to the LUNs, so the device I deal with
    > starts with /scsi_vhci/... (though /dev/rdsk and /dev/dsk entries exist
    > at a lower layer for it).

    O.K. No experience with the multipathing, so that might make a difference.

    > I've had this running for a few years; then one of the HDDs died.

    Started with new drives, or used ones? I've done both, FWIW.
    But I'm using Eurologic drive trays with fiber optics connecting to the
    server.

    > Googling (specifically,
    > https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbvf.html) says to
    > take the zpool offline in order to do the HDD replacement, which seems
    > absurd, so I didn't do that. Instead I deleted the old LUN (from the
    > HP) and created a new one (from the HP), and when I did, the new LUN
    > had a new WWPN.

    I've replaced drives without having to take the zpool offline.

    Of course it has a new WWN -- unique ones are built into each
    drive.
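
    For what it's worth, an online swap on my JBOD trays comes down to
    something like this (pool and device names below are placeholders; with
    a hardware array in front you would recreate the LUN instead of just
    pulling a disk):

        # optionally fault the dying member first
        zpool offline tank c2t1d0
        # physically swap the drive, then build /dev links for the new one
        devfsadm -c disk
        # resilver onto the replacement; the pool stays imported throughout
        zpool replace tank c2t1d0 c2t9d0
        zpool status tank       # watch the resilver progress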

    > Next I had Solaris scan for and create new links to the LUN.
    > Then I did a zpool replace <pool> <old LUN> <new LUN>.
    > The pool is healthy, after about 24 hours of resilvering.

    O.K. Quite a long resilvering -- but maybe you are using larger
    drives. I'm using 500 GB FC drives, in some Eurologic trays -- JBODs,
    connected to the server via fiber optics. And I also never put that
    many drives in an array (in part because of the number of drives which
    the Eurologic tray holds). I typically put five drives in a raidz2,
    have one hot spare per array (cross-linked so they can serve in any
    array that needs one of that size), and one drive which I can
    experiment with.

    I've even moved from 146 GB drives to 500 GB drives, by
    replacing them one at a time -- and when the last drive was replaced,
    the size of the pool jumped to appropriate for the size of the new
    drives.
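
    In case it helps, that one-at-a-time upgrade is nothing more exotic
    than repeating the replace/resilver cycle per member (names are
    placeholders again):

        # repeat for each member, one at a time; do not start the next
        # replace until "zpool status" shows the resilver has completed
        zpool replace tank c2t0d0 c3t0d0
        zpool status tank
        # once the last member is swapped, the extra space appears
        zpool list tank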

    > How do I get rid of the old LUN though?
    > format and luxadm probe still show it there, but it's not in mpathadm list lu.
    > If I run luxadm probe, take the WWN of the bad old FC HDD, and then run
    > luxadm display <WWN> 2>&1 | less
    > I see: ERROR: I/O failure communicating with /dev/rdsk/c5t<longnum>d0s2

    Did you use "devfsadm" to get the system to see the new drive
    and to say goodbye to the old one? Typically, I use:

    devfsadm -c disk -C

    -c so it doesn't bother doing anything with the other things
    (tape drives and whatever),

    and the -C to clean out dev entries for things which are no
    longer there.
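
    If you want to see what it is actually pruning, the verbose flag
    helps:

        # -v reports each /dev link as it is created or removed, so a
        # stale entry for the dead LUN should show up in the output
        devfsadm -Cv -c disk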

    But -- your hardware RAID controller may be telling the system
    that it still is present somewhere -- it just can't talk to it now. :-)

    The only thing that I have with a hardware RAID controller is a
    Sun Fire X4150 which is powered down at the moment because of a
    thunderstorm -- and because it is an experimental machine for me, not a
    server which I am depending on so far.

    Note that some things which were automatic in older zfs
    versions are no longer so in the current ones. Try "zpool get all" to
    get a list like this:

    ======================================================================
    usage:
    get <"all" | property[,...]> <pool> ...

    the following properties are supported:

    PROPERTY       EDIT   VALUES

    allocated      NO     <size>
    capacity       NO     <size>
    free           NO     <size>
    guid           NO     <guid>
    health         NO     <state>
    size           NO     <size>
    altroot        YES    <path>
    autoexpand     YES    on | off
    autoreplace    YES    on | off
    bootfs         YES    <filesystem>
    cachefile      YES    <file> | none
    delegation     YES    on | off
    failmode       YES    wait | continue | panic
    listsnapshots  YES    on | off
    readonly       YES    on | off
    version        YES    <version>
    ======================================================================
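
    To check just the two that matter here (pool name is a placeholder):

        zpool get autoreplace,autoexpand tank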

    autoreplace is one of those.

    ======================================================================
    autoreplace=on | off

        Controls automatic device replacement. If set to "off",
        device replacement must be initiated by the administrator
        by using the "zpool replace" command. If set to "on",
        any new device, found in the same physical location as a
        device that previously belonged to the pool, is
        automatically formatted and replaced. The default
        behavior is "off". This property can also be referred to
        by its shortened column name, "replace".
    ======================================================================
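
    Turning it on is a one-liner (pool name is a placeholder):

        # with autoreplace=on, a new disk appearing in the old physical
        # location is resilvered in without an explicit "zpool replace"
        zpool set autoreplace=on tank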

    And autoexpand is another. That is what allows the pool size to
    expand when all drives in the pool have been replaced with larger ones.

    ======================================================================
    autoexpand=on | off

    Controls automatic pool expansion when a larger device
    is added or attached to the pool or when a larger device
    replaces a smaller device in the pool. If set to on, the
    pool will be resized according to the size of the
    expanded device. If the device is part of a mirror or
    raidz then all devices within that mirror/raidz group
    must be expanded before the new space is made available
    to the pool. The default behavior is off. This property
    can also be referred to by its shortened column name,
    expand.
    ======================================================================
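
    Same idea here -- and on releases new enough to have it, "zpool
    online -e" can expand a device that was already replaced while
    autoexpand was off (pool and device names are placeholders):

        # grow the pool automatically as larger members arrive
        zpool set autoexpand=on tank
        # or expand one already-replaced device by hand
        zpool online -e tank c2t0d0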

    > cfgadm -al doesn't show controllers higher than c2.

    Hmm ... what server? I am running a T5220 as my primary server,
    and that one goes up to c5 -- with some fiber optic cards in it to talk
    to two of the Eurologic trays, and spare fiber optic ports (which are c2
    and c3, I think). The trays were previously running connected to a Sun
    Fire 280R (the rack-mount server version of the Sun Blade 1000/2000),
    and that was via copper connection with both trays chained together.
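
    If your FC HBAs use the fp driver, cfgadm can usually show the
    per-LUN state as well, and unconfigure a dead path -- the controller
    number and WWN below are placeholders:

        # list FC controllers along with their attached FCP devices
        cfgadm -al -o show_FCP_dev
        # if the dead LUN still appears under a controller, drop the path
        cfgadm -c unconfigure c5::<portWWN>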

    > Regards, Scott

    Hopefully, if what I have mentioned does not apply, something
    else may do so.

    Good Luck,
    DoN.

    --
    Remove oil spill source from e-mail
    Email: <BPdnicholsBP@d-and-d.com> | (KV4PH) Voice (all times): (703) 938-4564
    (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
    --- Black Holes are where God is dividing by zero ---
