I previously wrote (in part):
It looks like there's some kind of compatibility or interface problem.
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed...
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error...
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error...
----------
Here's smartctl -x output, keeping only what looked "interesting"/relevant:
0x06 0x018 4 64 --- Number of Interface CRC Errors
A careful reading of the error messages and a check of the smartctl
output pointed to some kind of data/interface/compatibility error, not a
read/write problem inside the SSD.
A Google search turned up various articles about the Samsung 860 and 870 EVO.
What I've determined:
* It's not a bad data cable:
Some of the articles suggested the problem might be a bad cable.
I ordered two new ones (SATA III).
Result: no improvement.
That's as I expected, since I was pretty sure my cables were good,
but it was cheap to try.
* Fix 1:
Based on the articles I read, this problem with Samsung 870 EVO and
860 EVO SSDs appears to affect not just FreeBSD, but *BSD and at
least some Linuxes.
The solution/workaround (at least for FreeBSD) is to disable command
queueing ("camcontrol negotiate $theSSD -T disable").
* Fix 2:
Connect the SSD with a USB-to-SATA adapter cable.
I found such a cable in stock at Best Buy for $12.
Perhaps this works because there's no command queueing over USB.
As a side note, instead of turning off tagged queueing, I also tried
reducing the number of tags from 32 to 2. Didn't help: the errors
continued to happen.
Maybe some day Samsung will come out with new firmware that fixes this problem.
Is this something I should post to bugzilla (it's not a FreeBSD bug,
though) or to some FreeBSD forum (which one)? Google no longer gets
USENET, so I don't expect this article would be found by Google search.
Report your findings in bugzilla. A developer may add a quirk to
the scsi subsystem to automatically detect the drive and "do the
right thing". I would use something like "SCSI tag-queue error with
Samsung 870 EVO SSD" as the title.
I'll do that now. Thanks,
This is the first time I've used a solid state drive. It looks like
there's some kind of compatibility or interface problem. The output
from smartctl -x also points to some kind of interface problem.
ZFS counted about 180 write errors while resilvering ~80GB. Most seemed
to be retryable and succeeded on the second try (see logs below).
The system is using AMD-AHCI, not IDE.
The SATA interface is running at 3.0Gb/s, half the SSD's 6.0Gb/s speed. Temperature is fine (~29C).
So far, the errors only occur during heavy activity: write errors during resilvering, and 2 read errors later during a brief burst of read
activity.
[Note: I swapped the SATA cables: that's why ada1 during resilvering
became ada0 later. Since I swapped cables at the drive end, not the motherboard end, I think it unlikely to be a cable/bad connection problem.]
Any ideas what the problem might be? Thanks,
-WBE
----------
[read error log entries:] [mildly edited]
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 ff 64 40 13 00 00 00 00 00
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: ahcich0: Timeout on slot 9 port 0
Aug 21 03:01:25: ahcich0: is 04000000 cs 00000200 ss 00000000 rs 00000200 tfd 451 serr 00400000 cmd 0000e917
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 58 00 65 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 c0 36 65 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): ATA status: 00 ()
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 c8 ff 64 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): ATA status: 00 ()
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:25 crystal ZFS[1332]: vdev I/O failure, zpool=zp path=/dev/ada0p3 offset=149417648128 size=4096 error=5
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 38 88 e4 80 40 13 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 10 48 29 40 05 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 c0 e4 80 40 13 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain.
[end of read error log entries]
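
For anyone trying to line the ACB bytes up with the ZFS failure lines, here is a minimal Python sketch that decodes them. It rests on two assumptions that are not stated in the post: that the 12 bytes are printed in the order command, features, LBA low/mid/high, device, LBA low/mid/high (ext), features (ext), sector count, sector count (ext), and that the drive uses 512-byte logical sectors. For the FPDMA (NCQ) commands the transfer length is carried in the features bytes. The function name decode_ncq_acb is made up for this example.

```python
# Sketch only: the byte layout below is an assumption about how the
# kernel prints ATA commands, and 512-byte logical sectors are assumed.
NCQ_COMMANDS = {0x60: "READ_FPDMA_QUEUED", 0x61: "WRITE_FPDMA_QUEUED"}
SECTOR_BYTES = 512


def decode_ncq_acb(acb):
    """Decode the 'ACB: ..' hex string of an NCQ read or write."""
    b = [int(tok, 16) for tok in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes, got %d" % len(b))
    if b[0] not in NCQ_COMMANDS:
        raise ValueError("not an FPDMA command: 0x%02x" % b[0])
    # 48-bit LBA: three low bytes, then three "ext" bytes after the device byte.
    lba = b[2] | b[3] << 8 | b[4] << 16 | b[6] << 24 | b[7] << 32 | b[8] << 40
    # For NCQ the sector count lives in features / features (ext); 0 means 65536.
    sectors = (b[1] | b[9] << 8) or 65536
    return {
        "command": NCQ_COMMANDS[b[0]],
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_BYTES,
    }


if __name__ == "__main__":
    print(decode_ncq_acb("61 48 f0 b2 e7 40 02 00 00 00 00 00"))
```

Under those assumptions, the write with ACB "61 48 f0 b2 e7 ..." in the resilver log decodes to 72 sectors, i.e. 36864 bytes, which is the same size reported in the ZFS "vdev I/O failure" line that follows it.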
----------
[typical write errors during resilvering:] [mildly edited]
Aug 21 00:33:01: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e0 b2 e7 40 02 00 00 00 00 00
Aug 21 00:33:01: (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 00:33:01: (ada1:ahcich1:0:0:0): Error 5, Unretryable error
Aug 21 00:33:02: ahcich1: Timeout on slot 19 port 0
Aug 21 00:33:02: ahcich1: is 04000000 cs 00080000 ss 00000000 rs 00080000 tfd 451 serr 00400000 cmd 0000f317
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 48 f0 b2 e7 40 02 00 00 00 00 00
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): Error 5, Unretryable error
Aug 21 00:33:02 crystal ZFS[1322]: vdev I/O failure, zpool=zp path=/dev/ada1p3 offset=7774244864 size=36864 error=5
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 78 2e f4 40 02 00 00 00 00 00
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 58 2e f4 40 02 00 00 00 00 00
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain
[end of write error log entries]
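
To see how the errors split between the "unretryable" kind and the CRC kind, the log can be tallied by CAM status. The following is a rough Python sketch, assuming one log entry per line in the format shown above; the function name tally_cam_status and the regular expression are illustrative, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches e.g. "(ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed"
CAM_STATUS = re.compile(r"\((\w+):\w+:\d+:\d+:\d+\): CAM status: (.+?)\s*$")


def tally_cam_status(lines):
    """Count CAM status messages per (device, status) pair."""
    counts = Counter()
    for line in lines:
        m = CAM_STATUS.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Aug 21 00:33:01: (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed",
        "Aug 21 00:33:01: (ada1:ahcich1:0:0:0): Error 5, Unretryable error",
        "Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error",
        "Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error",
    ]
    for (dev, status), n in sorted(tally_cam_status(sample).items()):
        print(dev, status, n)
```

Feeding it the full system log (for instance the lines of /var/log/messages, if that is where these entries end up on the machine in question) would give per-device totals to compare against the error count ZFS reports.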
----------
Here's smartctl -x output, keeping only what looked "interesting"/relevant:
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
199 CRC_Error_Count -OSRCK 099 099 000 - 64
235 POR_Recovery_Count -O--C- 099 099 000 - 7
241 Total_LBAs_Written -O--CK 099 099 000 - 213396105
0x06 0x018 4 64 --- Number of Interface CRC Errors
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 2 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 65535+ R_ERR response for non-data FIS
0x0006 2 65535+ R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 5 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 5 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 65535+ Non-CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0010 2 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 2 65535+ R_ERR response for host-to-device non-data FIS, non-CRC
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
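
The Phy event counter table is easier to read with the zero rows filtered out. Below is a small Python sketch that does this for rows in the "ID Size Value Description" layout shown above. It assumes a trailing "+" on the value means the counter has reached the maximum for its size (so the real count may be higher); the function name nonzero_phy_counters is invented for this example.

```python
import re

# Row layout assumed: "<4-hex-digit ID> <size> <value>[+] <description>"
ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)(\+?)\s+(.+)$")


def nonzero_phy_counters(lines):
    """Return (id, value, saturated, description) for each nonzero counter."""
    found = []
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue  # header lines and other smartctl output are skipped
        ident, _size, value, plus, desc = m.groups()
        if int(value) > 0:
            found.append((ident, int(value), plus == "+", desc))
    return found


if __name__ == "__main__":
    sample = [
        "ID Size Value Description",
        "0x0001 2 2 Command failed due to ICRC error",
        "0x0002 2 0 R_ERR response for data FIS",
        "0x0005 2 65535+ R_ERR response for non-data FIS",
        "0x000d 2 65535+ Non-CRC errors within host-to-device FIS",
    ]
    for ident, value, saturated, desc in nonzero_phy_counters(sample):
        print(ident, value, "(saturated)" if saturated else "", desc)
```

On the output above, this leaves the four saturated non-data FIS counters (0x0005, 0x0006, 0x000d, 0x0013) standing out, while the data FIS counters are all zero.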
----------
[END] [Thanks for reading.]
If not already done, install and enable the "cpu-microcode" pkg.[...]
Just a try.
With best regards
Matthias Meyser