Usually our TrueNAS fileservers (really just FreeBSD with a GUI) perform well, with
iostat -x
showing hundreds of megabytes/second read or written and the %b (%busy, or utilization) column at only a few percent for each disk. But every few months performance goes to hell: total throughput drops to only 1 or 2 MB/s, %b for a group of disks sits at 99% or 100%, and qlen grows from 0 or 1 to a dozen or 20 on some disks. CPU utilization stays very low. While this is happening, a simple ls command can take 5 minutes. Eventually the problem solves itself.
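As a rough sketch of catching the bad state from the server side, the Python function below parses the text printed by FreeBSD's iostat -x and flags disks that are nearly 100% busy while moving only a trickle of data. It assumes the column headings device, kr/s, kw/s, qlen and %b; headings can differ between FreeBSD versions and options, and the 90% and 2048 KB/s thresholds are arbitrary illustrative values. The sample report is made up. Since the first report from iostat is averaged over the whole uptime, it should be fed a later report, for example from iostat -x 5.

```python
def flag_thrashing_disks(iostat_text, busy_min=90.0, kb_per_s_max=2048.0):
    """Return (device, %b, total KB/s, qlen) for disks that look seek-bound:
    busy at least busy_min percent but moving at most kb_per_s_max KB/s."""
    header = None
    flagged = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            header = fields  # column names; re-read each time the header repeats
            continue
        if header is None or len(fields) != len(header):
            continue  # skip banner lines such as "extended device statistics"
        row = dict(zip(header, fields))
        try:
            busy = float(row["%b"])
            kbps = float(row["kr/s"]) + float(row["kw/s"])
            qlen = float(row["qlen"])
        except (KeyError, ValueError):
            continue  # column missing or not numeric in this iostat variant
        if busy >= busy_min and kbps <= kb_per_s_max:
            flagged.append((row["device"], busy, kbps, qlen))
    return flagged


# Hypothetical report: da0 and da1 are pegged while moving little data.
sample = """\
                        extended device statistics
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
da0          210       3    420.0      6.0    48    12     0    47   18  100
da1          195       2    390.0      4.0    51    10     0    50   20   99
da2           40      35  51200.0  44800.0     2     3     0     2    1    6
"""

if __name__ == "__main__":
    for dev, busy, kbps, qlen in flag_thrashing_disks(sample):
        print(f"{dev}: {busy:.0f}% busy, {kbps:.0f} KB/s, qlen {qlen:.0f}")
```

High %b with low KB/s only shows that the disks are seek-bound; it does not say which client is causing it.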
We believe this is because a client is doing a lot of random I/O that keeps the heads moving for very little data transfer, and that with all that seeking none of the other clients get much attention. (It could also be error recovery on a couple of blocks.) How do we locate that job among the many jobs from many users on many NFS clients?
On the client computers we can find out how many bytes are transferred by each process, but that number is small for all jobs: the one doing random I/O doesn't transfer more bytes than the jobs doing sequential I/O, it just exercises the heads more. We need more information so we can contact the user doing random I/O and work with them to do something else.
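Byte counts alone don't separate the two kinds of job, but the number of I/O calls per byte might. A minimal sketch, assuming the NFS clients run Linux: /proc/<pid>/io reports rchar/wchar (bytes passed through read and write calls) and syscr/syscw (number of those calls), so a process making many calls that each move only a few KB is a candidate for the random reader. This is a heuristic, not a seek counter. It covers all file, pipe and socket I/O, not just NFS, and misses access through mmap. The min_calls cutoff and the sample counters are hypothetical.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of a Linux /proc/<pid>/io file into ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def rank_small_io(io_by_pid, min_calls=10000):
    """Return (pid, calls, avg bytes per call) for processes with at least
    min_calls read/write calls, smallest average transfer first."""
    ranked = []
    for pid, text in io_by_pid.items():
        c = parse_proc_io(text)
        calls = c.get("syscr", 0) + c.get("syscw", 0)
        if calls < min_calls:
            continue
        avg = (c.get("rchar", 0) + c.get("wchar", 0)) / calls
        ranked.append((pid, calls, avg))
    # Smallest transfers first; among equals, the busiest process first.
    ranked.sort(key=lambda r: (r[2], -r[1]))
    return ranked


def read_all_proc_io():
    """Collect /proc/<pid>/io for every process we may read (Linux only)."""
    result = {}
    for name in os.listdir("/proc"):
        if name.isdigit():
            try:
                with open(f"/proc/{name}/io") as f:
                    result[int(name)] = f.read()
            except OSError:
                continue  # process exited or permission denied
    return result


if __name__ == "__main__":
    for pid, calls, avg in rank_small_io(read_all_proc_io())[:10]:
        print(f"pid {pid}: {calls} calls, {avg:.0f} bytes/call")
```

The counters are cumulative since each process started, so sampling twice a minute apart and ranking the differences would better reflect what each job is doing during a slowdown. Reading other users' /proc/<pid>/io generally requires root.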
Alternatively, is there some adjustment of the server that will downgrade
the priority of random access? That user might self-identify if his jobs
took forever to complete.
Daniel Feenberg
NBER