• [gentoo-user] Re: e2fsck -c when bad blocks are in existing file?

    From Grant Edwards@21:1/5 to Michael on Tue Nov 8 15:30:01 2022
    On 2022-11-08, Michael <confabulate@kintzios.com> wrote:
    On Tuesday, 8 November 2022 03:31:07 GMT Grant Edwards wrote:
    I've got an SSD that's failing, and I'd like to know what files
    contain bad blocks so that I don't attempt to copy them to the
    replacement disk.

    According to e2fsck(8):

    -c This option causes e2fsck to use badblocks(8) program to do a
    read-only scan of the device in order to find any bad blocks. If any
    bad blocks are found, they are added to the bad block inode to prevent
    them from being allocated to a file or directory. If this option is
    specified twice, then the bad block scan will be done using a
    non-destructive read-write test.

    What happens when the bad block is _already_allocated_ to a file?

    Previously allocated to a file and now re-allocated or not, my
    understanding is with spinning disks the data in a bad block stays
    there unless you've dd'ed some zeros over it. Even then read or write
    operations could fail if the block is too far gone.[1] Some data
    recovery applications will try to read data off a bad block in
    different patterns to retrieve what's there. Once the bad block is
    categorized as such it won't be used by the filesystem to write new
    data to it again.

    Thanks. I guess I should have been more specific in my question.

    What does e2fsck -c do to the filesystem structure when it discovers a
    bad block that is already allocated to an existing inode?

    Is the inode's chain of block groups left as is -- still containing
    the bad block that (according to the man page) "has been added to the
    bad block inode"? Presumably not, since a block can't be allocated to
    two different inodes.

    Is the "broken" file split into two chunks (before/after the bad
    block) and moved to the lost-and-found?

    Is the man page's description only correct when the bad block is
    currently unallocated?

    --
    Grant

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael@21:1/5 to All on Tue Nov 8 18:49:27 2022
    On Tuesday, 8 November 2022 17:55:51 GMT Laurence Perkins wrote:
    -----Original Message-----
    From: Grant Edwards <grant.b.edwards@gmail.com>
    Sent: Tuesday, November 8, 2022 6:28 AM
    To: gentoo-user@lists.gentoo.org
    Subject: [gentoo-user] Re: e2fsck -c when bad blocks are in existing file?

    On 2022-11-08, Michael <confabulate@kintzios.com> wrote:
    On Tuesday, 8 November 2022 03:31:07 GMT Grant Edwards wrote:
    I've got an SSD that's failing, and I'd like to know what files
    contain bad blocks so that I don't attempt to copy them to the
    replacement disk.

    According to e2fsck(8):
    -c This option causes e2fsck to use badblocks(8) program to do a
    read-only scan of the device in order to find any bad blocks. If
    any bad blocks are found, they are added to the bad block inode to
    prevent them from being allocated to a file or directory. If this
    option is specified twice, then the bad block scan will be done
    using a non-destructive read-write test.

    What happens when the bad block is _already_allocated_ to a file?

    Previously allocated to a file and now re-allocated or not, my
    understanding is with spinning disks the data in a bad block stays
    there unless you've dd'ed some zeros over it. Even then read or write
    operations could fail if the block is too far gone.[1] Some data
    recovery applications will try to read data off a bad block in
    different patterns to retrieve what's there. Once the bad block is
    categorized as such it won't be used by the filesystem to write new data
    to it again.

    Thanks. I guess I should have been more specific in my question.

    What does e2fsck -c do to the filesystem structure when it discovers a
    bad block that is already allocated to an existing inode?

    Is the inode's chain of block groups left as is -- still containing the
    bad block that (according to the man page) "has been added to the bad
    block inode"? Presumably not, since a block can't be allocated to two
    different inodes.

    Is the "broken" file split into two chunks (before/after the bad
    block) and moved to the lost-and-found?

    Is the man page's description only correct when the bad block is
    currently unallocated?

    --
    Grant

    If I recall correctly, it will add any unreadable blocks to its internal
    list of bad sectors, which it will then refuse to allocate in the future.

    I don't believe it will attempt to move the file to elsewhere until it is
    written since:
    A) what would you then put in that block? You don't know the contents.
    B) Moving the file around would make attempts to recover the data from
    that bad sector significantly more difficult.

    As far as I know trying to write raw data directly to a bad block e.g. with dd or hdparm will trigger the disk's controller firmware to reallocate the data from the bad block to a spare. I always thought e2fsck won't write data in a block unless it is empty. badblocks -w will write test patterns to blocks and also trigger data reallocation on any bad blocks. badblocks -n, which corresponds to e2fsck -cc will only write to empty blocks and it may or may
    not trigger a firmware reallocation.

    I'm not sure what happens at a filesystem level, when one bad block within an extent is reallocated. The extent and the previously contiguous blocks will
    no longer be contiguous. Does the hardware expose some SMART data to inform the OS/fs of the reallocated block, to perform a whole extent remapping?


    This is, however, very unlikely to come up on a modern disk since most of them automatically remap failed sectors at the hardware level (also on
    write, for the same reasons). So the only time it would matter is if you have a disk that's more than about 20 years old, or one that's used up all its spare sectors...

    Unless, of course, you're resurrecting the old trick of marking a section of the disk as "bad" so the FS won't touch it, and then using it for raw data
    of some kind...

    You can, of course, test it yourself to be certain with a loopback file and
    a fake "badblocks" that just outputs your chosen list of bad sectors and
    then see if any of the data moves. I'd say like a 2MB filesystem and write
    a file full of 00DEADBEEF, then make a copy, blacklist some sectors, and
    hit it with your favorite binary diff command and see what moved. This is probably recommended since there could be differences between the behaviour of different versions of e2fsck.
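A minimal sketch of the "binary diff" step of the experiment described above, assuming the two filesystem images (one copied before the fsck run, one after) are ordinary files. The function name, the 1024-byte block size, and the `demo()` stand-in images are illustrative assumptions, not anything from e2fsprogs; the demo fakes a one-block change rather than running e2fsck.

```python
import os
import tempfile

BLOCK_SIZE = 1024  # assumed filesystem block size; adjust to match the image


def changed_blocks(before_path, after_path, block_size=BLOCK_SIZE):
    """Return the block numbers whose contents differ between two image files."""
    changed = []
    with open(before_path, "rb") as before, open(after_path, "rb") as after:
        block_no = 0
        while True:
            a = before.read(block_size)
            b = after.read(block_size)
            if not a and not b:
                break  # both images exhausted
            if a != b:
                changed.append(block_no)
            block_no += 1
    return changed


def demo():
    """Build two tiny stand-in images that differ in one block and diff them."""
    pattern = bytes.fromhex("00DEADBEEF")
    size = 8 * BLOCK_SIZE
    data = (pattern * (size // len(pattern) + 1))[:size]
    modified = bytearray(data)
    modified[3 * BLOCK_SIZE] ^= 0xFF  # pretend something rewrote block 3
    with tempfile.TemporaryDirectory() as tmp:
        before = os.path.join(tmp, "before.img")
        after = os.path.join(tmp, "after.img")
        with open(before, "wb") as f:
            f.write(data)
        with open(after, "wb") as f:
            f.write(bytes(modified))
        return changed_blocks(before, after)


if __name__ == "__main__":
    print(demo())
```

On a real test image, metadata blocks (superblock, bitmaps, the bad block inode) would be expected to show up in the list as well, so the interesting question is whether any of the file's data blocks appear.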

    LMP


  • From John Covici@21:1/5 to Laurence Perkins on Tue Nov 8 23:00:01 2022
    On Tue, 08 Nov 2022 12:55:51 -0500,
    Laurence Perkins wrote:



    -----Original Message-----
    From: Grant Edwards <grant.b.edwards@gmail.com>
    Sent: Tuesday, November 8, 2022 6:28 AM
    To: gentoo-user@lists.gentoo.org
    Subject: [gentoo-user] Re: e2fsck -c when bad blocks are in existing file?

    On 2022-11-08, Michael <confabulate@kintzios.com> wrote:
    On Tuesday, 8 November 2022 03:31:07 GMT Grant Edwards wrote:
    I've got an SSD that's failing, and I'd like to know what files
    contain bad blocks so that I don't attempt to copy them to the
    replacement disk.

    According to e2fsck(8):

    -c This option causes e2fsck to use badblocks(8) program to do
    a read-only scan of the device in order to find any bad blocks. If
    any bad blocks are found, they are added to the bad block inode to
    prevent them from being allocated to a file or directory. If this
    option is specified twice, then the bad block scan will be done
    using a non-destructive read-write test.

    What happens when the bad block is _already_allocated_ to a file?

    Previously allocated to a file and now re-allocated or not, my
    understanding is with spinning disks the data in a bad block stays
    there unless you've dd'ed some zeros over it. Even then read or write
    operations could fail if the block is too far gone.[1] Some data
    recovery applications will try to read data off a bad block in
    different patterns to retrieve what's there. Once the bad block is
    categorized as such it won't be used by the filesystem to write new data to it again.

    Thanks. I guess I should have been more specific in my question.

    What does e2fsck -c do to the filesystem structure when it discovers a bad block that is already allocated to an existing inode?

    Is the inode's chain of block groups left as is -- still containing the bad block that (according to the man page) "has been added to the bad block inode"? Presumably not, since a block can't be allocated to two different inodes.

    Is the "broken" file split into two chunks (before/after the bad
    block) and moved to the lost-and-found?

    Is the man page's description only correct when the bad block is currently unallocated?

    --
    Grant

    If I recall correctly, it will add any unreadable blocks to its internal list of bad sectors, which it will then refuse to allocate in the future.

    I don't believe it will attempt to move the file to elsewhere until it is written since:
    A) what would you then put in that block? You don't know the contents.
    B) Moving the file around would make attempts to recover the data from that bad sector significantly more difficult.

    This is, however, very unlikely to come up on a modern disk since most of
    them automatically remap failed sectors at the hardware level (also on
    write, for the same reasons). So the only time it would matter is if you
    have a disk that's more than about 20 years old, or one that's used up all
    its spare sectors...

    Unless, of course, you're resurrecting the old trick of marking a section of the disk as "bad" so the FS won't touch it, and then using it for raw data of some kind...

    You can, of course, test it yourself to be certain with a loopback file and a fake "badblocks" that just outputs your chosen list of bad sectors and then see if any of the data moves. I'd say like a 2MB filesystem and write a file full of 00DEADBEEF,
    then make a copy, blacklist some sectors, and hit it with your favorite binary diff command and see what moved. This is probably recommended since there could be differences between the behaviour of different versions of e2fsck.

    Maybe it's time for SpinRite -- new version coming out soon, but it
    might save your bacon.

    --
    Your life is like a penny. You're going to lose it. The question is:
    How do
    you spend it?

    John Covici wb2una
    covici@ccs.covici.com

  • From Grant Edwards@21:1/5 to Laurence Perkins on Thu Nov 10 00:40:02 2022
    On 2022-11-08, Laurence Perkins <lperkins@openeye.net> wrote:

    What happens when the bad block is _already_allocated_ to a file?

    [...]

    Thanks. I guess I should have been more specific in my question.

    What does e2fsck -c do to the filesystem structure when it discovers
    a bad block that is already allocated to an existing inode?

    Is the inode's chain of block groups left as is -- still containing
    the bad block that (according to the man page) "has been added to
    the bad block inode"? Presumably not, since a block can't be
    allocated to two different inodes.

    Is the "broken" file split into two chunks (before/after the bad
    block) and moved to the lost-and-found?

    Is the man page's description only correct when the bad block is
    currently unallocated?

    If I recall correctly, it will add any unreadable blocks to its
    internal list of bad sectors, which it will then refuse to allocate
    in the future.

    I'm asking what happens to the file containing the bad block. Perhaps
    nothing. The man page says the block is added to the "bad block
    inode". If that block was already allocated, is the bad block now
    allocated to two different inodes?

    I don't believe it will attempt to move the file to elsewhere until
    it is written since:

    A) what would you then put in that block? You don't know the contents.

    You wouldn't put anything in that block.

    One solution that comes to mind would be to truncate the file
    immediately before the bad block (we'll call that truncated file the
    "head"). Then you allocate a new inode to which you assign all of the
    blocks after the bad block (we'll call that the "tail"). The bad block
    is then moved to the "bad blocks inode" and the head/tail files are
    moved into the lost+found.

    B) Moving the file around would make attempts to recover the data
    from that bad sector significantly more difficult.

    Yes, probably. Any manipulation of a filesystem (like adding the block
    to the bad block inode) on a failing disk seems like a bad idea.

    --
    Grant

  • From Wol@21:1/5 to Grant Edwards on Thu Nov 10 01:00:02 2022
    On 09/11/2022 23:31, Grant Edwards wrote:
    If I recall correctly, it will add any unreadable blocks to its
    internal list of bad sectors, which it will then refuse to allocate
    in the future.

    I doubt you recall correctly. You should ONLY EVER conclude a block is
    bad if you can't write to it. Remember what I said - if I read my 8TB
    drive from end-to-end twice, then I should *expect* a read error ...
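The arithmetic behind that claim can be sketched as follows. The figure of one unrecoverable read error per 1e14 bits is an assumption here (a commonly quoted worst-case spec for consumer drives; the thread itself gives no number), and treating errors as independent (Poisson) events is a simplification.

```python
import math


def expected_read_errors(capacity_bytes, passes, bits_per_error=1e14):
    """Expected number of unrecoverable read errors for full-drive reads.

    bits_per_error is an assumed spec figure, not a measured value.
    """
    bits_read = capacity_bytes * 8 * passes
    return bits_read / bits_per_error


def probability_of_at_least_one(expected):
    """Chance of one or more errors, assuming independent (Poisson) events."""
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    expected = expected_read_errors(8e12, passes=2)  # 8 TB read twice
    print(f"expected errors: {expected:.2f}")
    print(f"P(at least one): {probability_of_at_least_one(expected):.2f}")
```

Under those assumptions two full passes over 8 TB read about 1.28e14 bits, so slightly more than one error is "expected" on paper; real drives often do better than the spec bound.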

    I'm asking what happens to the file containing the bad block. Perhaps
    nothing. The man page says the block is added to the "bad block
    inode". If that block was already allocated, is the bad block now
    allocated to two different inodes?

    If a read fails, you SHOULD NOT do anything. If a write fails, you move
    the block and mark the failed block as bad. But seeing as you've moved
    the block, the bad block is no longer allocated to any file ...

    Cheers,
    Wol

  • From Grant Edwards@21:1/5 to Wol on Thu Nov 10 01:20:01 2022
    On 2022-11-09, Wol <antlists@youngman.org.uk> wrote:
    On 09/11/2022 23:31, Grant Edwards wrote:
    If I recall correctly, it will add any unreadable blocks to its
    internal list of bad sectors, which it will then refuse to allocate
    in the future.

    I doubt you recall correctly.

    The e2fsck man page states explicitly that a -c read failure will
    cause the block to be added to the bad block inode. You're claiming
    that is not what happens?

    You should ONLY EVER conclude a block is bad if you can't write to
    it. Remember what I said - if I read my 8TB drive from end-to-end
    twice, then I should *expect* a read error ...

    OK...

    I'm asking what happens to the file containing the bad block. Perhaps
    nothing. The man page says the block is added to the "bad block
    inode". If that block was already allocated, is the bad block now
    allocated to two different inodes?

    If a read fails, you SHOULD NOT do anything.

    Thanks, but I'm not asking what I should do. I'm not asking what the
    filesystem should do. I'm not asking what disk-drive controller
    firmware should do or does do with failed/spare blocks.

    I'm asking what e2fsck -c does when the bad block is already allocated
    to an inode. Specifically:

    Is the bad block removed from the inode to which it was allocated?

    Is the bad block left allocated to the previous inode as well as
    being added to the bad block inode?

    We've gotten lots of answers to lots of other questions, but after
    re-reading the thread a few times, I still haven't seen an answer to
    the question I asked.

    If a write fails, you move the block and mark the failed block as
    bad. But seeing as you've moved the block, the bad block is no
    longer allocated to any file ...

    Are you stating e2fsck -c will remove the bad block from the inode to
    which it was allocated before the scan? Is it replaced with a
    different block? Or just left as an empty "hole" that can't be read
    from or written to?

    The e2fsck man page does not state that the bad block is removed from
    the old inode, only that that bad block is added to the bad block inode.

    If a block is allocated to an inode, I would call that "allocated to a
    file". It's not a file that has a visible name that shows up in a
    directory, but it's still a file.

    --
    Grant

  • From Grant Edwards@21:1/5 to Michael on Sat Nov 12 17:50:01 2022
    On 2022-11-12, Michael <confabulate@kintzios.com> wrote:
    On Wednesday, 9 November 2022 16:53:13 GMT Laurence Perkins wrote:

    Badblocks doesn't ask to write anything at the end of the run. You
    tell it whether you want a read test, a write-read test or a
    read-write-read-replace test at the beginning.
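To make the "read-write-read-replace" mode concrete, here is a minimal sketch of a non-destructive read-write test run against an in-memory stand-in for a device. `FakeDevice`, its `stuck_blocks` parameter, and `nondestructive_test` are hypothetical names for illustration; this is not how badblocks is implemented, only the general idea of saving each block, writing a pattern, verifying it, and restoring the original.

```python
class FakeDevice:
    """In-memory stand-in for a block device; some blocks refuse new data."""

    def __init__(self, num_blocks, block_size=512, stuck_blocks=()):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.stuck = set(stuck_blocks)

    def read(self, n):
        return self.blocks[n]

    def write(self, n, data):
        if n not in self.stuck:  # a "stuck" block silently keeps old contents
            self.blocks[n] = data


def nondestructive_test(dev, pattern_byte=0xAA):
    """Return block numbers that fail a save / write / verify / restore cycle."""
    bad = []
    pattern = bytes([pattern_byte]) * dev.block_size
    for n in range(len(dev.blocks)):
        original = dev.read(n)       # save the existing data
        dev.write(n, pattern)        # write the test pattern
        if dev.read(n) != pattern:   # read it back and compare
            bad.append(n)
        dev.write(n, original)       # put the original data back
    return bad


if __name__ == "__main__":
    dev = FakeDevice(8, stuck_blocks=(2, 5))
    print(nondestructive_test(dev))
```

Note that the cycle writes to every block, including ones that hold file data; the data survives only because it is written back afterwards.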

    Not to labour the point, but 'e2fsck -v -c' runs a read test and at
    the end it informs me "... Updating bad block inode", even if it
    came across no read errors (0/0/0) and consequently does not prompt
    for a fs repair.

    That's _e2fsck_ that's doing the writing at the end, not badblocks. The
    statement was that _badblocks_ doesn't ask to write anything at the
    end of the run.

  • From Michael@21:1/5 to All on Sat Nov 12 19:34:05 2022
    On Saturday, 12 November 2022 16:44:05 GMT Grant Edwards wrote:
    On 2022-11-12, Michael <confabulate@kintzios.com> wrote:
    On Wednesday, 9 November 2022 16:53:13 GMT Laurence Perkins wrote:
    Badblocks doesn't ask to write anything at the end of the run. You
    tell it whether you want a read test, a write-read test or a
    read-write-read-replace test at the beginning.

    Not to labour the point, but 'e2fsck -v -c' runs a read test and at
    the end it informs me "... Updating bad block inode", even if it
    came across no read errors (0/0/0) and consequently does not prompt
    for a fs repair.

    That's _e2fsck_ that's doing the writing at the end, not badblocks. The
    statement was that _badblocks_ doesn't ask to write anything at the
    end of the run.

    Thanks for correcting me, the badblocks man page also makes this clear.
    Unless an output file is specified, it will only display the list of bad
    blocks on its standard output. It's been a while since I had to run badblocks and forgot its behaviour.

    Have your questions been answered satisfactorily by Lawrence's contribution?
  • From Grant Edwards@21:1/5 to Michael on Sun Nov 13 05:00:01 2022
    On 2022-11-12, Michael <confabulate@kintzios.com> wrote:

    Have your questions been answered satisfactorily by Lawrence's contribution?

    Yes, Lawrence's experiment answered my question: e2fsck adds the
    bad block to the "bad block" inode and leaves it also allocated to the
    existing file.

    Presumably if you don't allow it to clone the block, reading that file
    will return an error when it gets to the bad block. Once you delete
    that file, the bad block will never get reallocated by the filesystem
    since it still belongs to the bad block inode.
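The outcome described above can be pictured with a toy ownership model. This is only an illustration of the summary in this message, not e2fsprogs code; `ToyFilesystem` and its methods are hypothetical, and the only real detail borrowed is that ext2/3/4 reserve inode 1 for the bad block list.

```python
BAD_BLOCK_INODE = 1  # ext2/3/4 reserve inode 1 for the bad block list


class ToyFilesystem:
    """Tracks which inodes claim each block; nothing more."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.owners = {}  # block number -> set of inode numbers claiming it

    def allocate(self, inode, block):
        self.owners.setdefault(block, set()).add(inode)

    def mark_bad(self, block):
        # The block is added to the bad block inode; existing claims stay.
        self.allocate(BAD_BLOCK_INODE, block)

    def delete_file(self, inode):
        for claimants in self.owners.values():
            claimants.discard(inode)

    def free_blocks(self):
        """Blocks with no claimant at all, i.e. available for new files."""
        return [b for b in range(self.num_blocks) if not self.owners.get(b)]


if __name__ == "__main__":
    fs = ToyFilesystem(4)
    fs.allocate(12, 0)
    fs.allocate(12, 1)
    fs.mark_bad(1)
    print(sorted(fs.owners[1]))  # block 1 is claimed by the file and inode 1
    fs.delete_file(12)
    print(fs.free_blocks())      # block 1 never returns to the free pool
```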

    The failing SSD that prompted the question has now been replaced and a
    fresh Gentoo system installed on the new drive. I never did figure out
    which files contained the bad blocks (there were 37 bad blocks,
    IIRC). They apparently didn't belong to any of the files I copied over
    to the replacement drive.
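For anyone who does want to answer the "which files" question, one approach is to compare the bad block numbers against each file's physical extents. The sketch below assumes the extent lists have already been gathered by some other means (for example from a tool such as filefrag -v or debugfs); the function name, paths, and numbers are made up for illustration.

```python
def files_with_bad_blocks(extents_by_file, bad_blocks):
    """Map each affected path to the sorted bad blocks inside its extents.

    extents_by_file: {path: [(first_physical_block, length_in_blocks), ...]}
    bad_blocks: iterable of physical block numbers reported as bad
    """
    bad = sorted(set(bad_blocks))
    hits = {}
    for path, extents in extents_by_file.items():
        for start, length in extents:
            found = [b for b in bad if start <= b < start + length]
            if found:
                hits.setdefault(path, []).extend(found)
    return {path: sorted(blocks) for path, blocks in hits.items()}


if __name__ == "__main__":
    # Hypothetical extent data; real values would come from the filesystem.
    extents = {
        "/home/user/a.txt": [(100, 8)],
        "/home/user/b.iso": [(200, 50), (400, 10)],
        "/home/user/c.log": [(300, 4)],
    }
    print(files_with_bad_blocks(extents, [105, 405, 999]))
```

Bad blocks that fall outside every file's extents (like 999 in the demo) are either unallocated or belong to filesystem metadata.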

    The old drive was a Samsung 850 EVO SATA drive, and the new one is a
    Samsung 980 PRO M.2 drive. The new one is noticeably faster than the
    old one (which in turn was way faster than the spinning platter drive
    it had replaced).

    --
    Grant
