• Speed ups for a disk IO bound machine

    From Piergiorgio Sartor@21:1/5 to Peter Chant on Sun Dec 11 14:10:30 2016
    On 2016-12-11 12:54, Peter Chant wrote:
    [...]
    Thanks. I've done some reading and there is more to do plus some experimentation.

    Experimentation is good.
    It is the only way to get some idea of the
    different peculiarities.

    I understand that mdadm is used to create the raid arrays, it is not
    part of lvm itself?

    LVM (dmraid) and md share a lot of code, but I'm
    not sure about this RAID-10. Maybe it is only md.

    [...]
    OK. Simple to set up with the SSDs as one is blank and the other has
    free space.

    Be careful, it is easy to destroy data.
    Maybe you can practice with loop devices.
    There are some howtos around.
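
    For example, a throwaway test array can be built from a couple of
    sparse files; everything below (paths, sizes, device names) is only a
    placeholder:
      truncate -s 1G /tmp/test0.img /tmp/test1.img
      losetup -f --show /tmp/test0.img    # prints e.g. /dev/loop0
      losetup -f --show /tmp/test1.img    # prints e.g. /dev/loop1
      mdadm --create /dev/md/test --level=10 --layout=f2 \
            --raid-devices=2 /dev/loop0 /dev/loop1
      # ...experiment with LUKS / LVM / mkfs on /dev/md/test, then tear down:
      mdadm --stop /dev/md/test
      losetup -d /dev/loop0 /dev/loop1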

    With RAID-10 far 2 am I correct in assuming that the available capacity
    for a two device array of 3TB disks would be 3TB (two copies of data)?

    Yep.

    Part of the RAID-10 with SSD can be used as cache.


    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will
    add this extra layer to the mix. I have not noticed any significant performance difference with and without it.

    OK, you'll have to decide at which layer LUKS fits.
    There are pros and cons for each case.

    SSDs:
    Create empty partition on larger system SSD.
    Add smaller empty SSD.

    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2 --layout=f2 /dev/sda4 /dev/sdb

    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.
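
    Taking both points together, the create command would then look
    something like this (assuming the second SSD is given a single
    partition, /dev/sdb1, which is only a guess at your layout):
      mdadm --create /dev/md/ssd --level=10 --metadata=1.2 \
            --raid-devices=2 --layout=f2 /dev/sda4 /dev/sdb1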

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2 device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.
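
    In your order (RAID -> LUKS -> LVM) the stack would be assembled
    roughly like this; md/ssd, ssd_crypt and vg_ssd are placeholder names:
      cryptsetup luksFormat /dev/md/ssd
      cryptsetup open /dev/md/ssd ssd_crypt    # gives /dev/mapper/ssd_crypt
      pvcreate /dev/mapper/ssd_crypt
      vgcreate vg_ssd /dev/mapper/ssd_crypt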

    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    HDD:
    Similar to the SSD case above except that the logical volume will be for
    the slow disks doing the bulk of the work.

    OK, I think.

    Create a cached device:
    Use lvmcache to create a device using the two logical volumes
    created above (bcache would also work).

    Seems good to me.

    Create a file system on top of the cached device. If using btrfs (what
    I do now) use single for data as the raid is occurring a couple of levels down.

    Well, it might be btrfs has already RAID-10.
    Again, code is shared between this and md too.

    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.
    Not for performances, but for practising possible
    combinations and layouts.

    Then there is the story of the caching, which has
    different scope and performances between bcache
    and lvmcache.
    In your specific case, I cannot judge which is better,
    but lvmcache seems to me easier.
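
    For comparison, a bcache setup is roughly one command from
    bcache-tools; formatting the backing and cache device together also
    attaches them (device names are placeholders, and both devices need
    to be blank):
      make-bcache -B /dev/md/hdd -C /dev/md/ssd
      # the cached block device then appears as /dev/bcache0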

    It looks like I could do this without the LVM layer between the LUKS and
    cache parts. However, this does give me the flexibility to create
    logical volumes that are solely on HDD RAID10 or SSD RAID10, as well as
    cached ones. Or different cache options. I think I could have put another
    LVM on top of the cached item I was creating, but I think that was
    overdoing it.

    Probably it is.
    I would try to use not more than one component at a time.
    So, 1 md, 1 LUKS, 1 LVM, at maximum.
    If possible less.

    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    This type of "generic" setup has some advantages in case
    of hardware updates (easy to add / remove storage devices,
    by using pvmove).
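
    As a rough sketch of that layout, assuming the two md arrays are used
    directly as PVs (all names and sizes are placeholders):
      pvcreate /dev/md/hdd /dev/md/ssd
      vgcreate vg_data /dev/md/hdd /dev/md/ssd
      # bulk LV restricted to the HDD PV, cache pool restricted to the SSD PV
      lvcreate -L 2T -n bulk vg_data /dev/md/hdd
      lvcreate --type cache-pool -L 100G -n bulk_cache vg_data /dev/md/ssd
      lvconvert --type cache --cachepool vg_data/bulk_cache vg_data/bulk
      # later, "pvmove /dev/md/hdd" can migrate data off that PV before
      # removing the disks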

    bye,

    --

    piergiorgio

  • From Peter Chant@21:1/5 to Roger Blake on Sun Dec 11 22:07:59 2016
    On 12/11/2016 01:34 AM, Roger Blake wrote:
    On 2016-12-10, Peter Chant <pete@petezilla.co.uk> wrote:
    Is there a reason why you gave that one as an example, chipset,
    manufacturer or was it simply an inexpensive board with useful specs?

    Just an example of a low-cost SATA III board. There are plenty around.
    I used one similar to this for my SSD before upgrading to a board that
    had built-in SATA III.


    Thanks. Looking at the moment it seems there is not much middle ground,
    either two ports much cheaper or something much more heavyweight with
    matching price tag. But I have two SSDs only and the two port boards
    are cheap enough.


    It sounds like your bottleneck is the mechanical drives. To speed things up significantly you'll need faster drives, or at least a faster type of RAID array.


  • From Peter Chant@21:1/5 to Piergiorgio Sartor on Sun Dec 11 22:46:43 2016
    On 12/11/2016 01:10 PM, Piergiorgio Sartor wrote:

    Be careful, it is easy to destroy data.
    Maybe you can practice with loop devices.
    There are some howtos around.


    Thanks. I'm reasonably careful. However, I'd probably image the device
    before trying gparted on it.

    With RAID-10 far 2 am I correct in assuming that the available capacity
    for a two device array of 3TB disks would be 3TB (two copies of data)?

    Yep.

    So I have the same space as I have now with btrfs, but with striping and
    no loss of redundancy. Cool.




    SSDs:
    Create empty partition on larger system SSD.
    Add smaller empty SSD.

    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2
    --layout=f2 /dev/sda4 /dev/sdb

    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    I've not looked into that. Perhaps the default is sensible if I don't
    pick one. Must do reading.


    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    I've been using btrfs for a couple of years now, where it is quite
    normal not to partition disks. So maybe partition.


    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.



    I thought on top of the raid, as otherwise there are four lots of LUKS
    with passwords to manage (careful use of keyfiles would undoubtedly
    help). Logically there ought to be less overhead doing it twice rather
    than four times, but that is a guess.

    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    HDD:
    Similar to the SSD case above except that the logical volume will be for
    the slow disks doing the bulk of the work.

    OK, I think.

    Create a cached device:
    Use lvmcache to create a device using the two logical volumes
    created above (bcache would also work).

    Seems good to me.

    Create a file system on top of the cached device. If using btrfs (what
    I do now) use single for data as the raid is occurring a couple of levels
    down.

    Well, it might be btrfs has already RAID-10.
    Again, code is shared between this and md too.


    Btrfs does indeed support RAID10. However, it is fairly new and does
    not appear to support far 2. So mdadm is the way to achieve that.

    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.


    Thanks for your help.


    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.
    Not for performances, but for practising possible
    combinations and layouts.


    And for ensuring I can get a sensible workable system. :-)

    Then there is the story of the caching, which has
    different scope and performances between bcache
    and lvmcache.
    In your specific case, I cannot judge which is better,
    but lvmcache seems to me easier.


    Really I'm not sure. But it seems worthwhile giving a cache system a go.

    It looks like I could do this without the LVM layer between the LUKS and
    cache parts. However, this does give me the flexibility to create
    logical volumes that are solely on HDD RAID10 or SSD RAID10, as well as
    cached ones. Or different cache options. I think I could have put another
    LVM on top of the cached item I was creating, but I think that was
    overdoing it.

    Probably it is.
    I would try to use not more than one component at a time.
    So, 1 md, 1 LUKS, 1 LVM, at maximum.
    If possible less.

    I'd not expect it to work if I set it up in one go.

    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    If I understand correctly: One mdadm raid10 far 2 from SSDs, one from
    HDDs and then add them BOTH to one single VG, and then build the cached
    file system from that one volume group? I did not think it worked like
    that, as you'd have to control which disks the cache was on.


    This type of "generic" setup has some advantages in case
    of hardware updates (easy to add / remove storage devices,
    by using pvmove).

    bye,


  • From David Brown@21:1/5 to Peter Chant on Mon Dec 12 09:11:10 2016
    On 07/12/16 23:41, Peter Chant wrote:
    I'm wondering if there are any cheap / easy speed ups for an IO bound machine?

    ASUS M4A78 Pro motherboard
    Phenom II x6 1090T processor (fastest or second fastest that will
    go in this mobo)
    8GB RAM

    480GB SSD for system files, btrfs, single.
    2x 3TB WD red HDD as btrfs RAID1 for data.
    Also barely used DVD burner.

    I use the on-board graphics.

    Slackware 14.1 (in process of 14.2 upgrade) with 4.8.5 kernel.

    Using atop etc it seems that the machine is IO bound. Also notice that
    if apache / php / mariadb, which are running on the same machine, are churning away and unresponsive I can browse the internet still with
    little impact which makes me think it is a local IO thing.


    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same
    machine. Usually you have either a server (and therefore no desktop processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?

    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?

  • From Piergiorgio Sartor@21:1/5 to Peter Chant on Mon Dec 12 19:22:00 2016
    On 2016-12-11 23:46, Peter Chant wrote:
    [...]
    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    If I understand correctly: One mdadm raid10 far 2 from SSDs, one from
    HDDs and then add them BOTH to one single VG, and then build the cached
    file system from that one volume group? I did not think it worked like
    that, as you'd have to control which disks the cache was on.

    It seems possible to specify where the LV will
    be placed in a multi PV lvm setup.
    In case of lvmcache (maybe bcache too, in a certain
    way) it should be possible to tell lvm to place the
    cache on PV with SSD.

    You'll have to check the docs carefully, there are
    many "secrets"... :-)

    bye,

    --

    piergiorgio

  • From Peter Chant@21:1/5 to David Brown on Mon Dec 12 23:08:07 2016
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same machine. Usually you have either a server (and therefore no desktop processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?

  • From David Brown@21:1/5 to Peter Chant on Tue Dec 13 09:29:13 2016
    On 13/12/16 00:08, Peter Chant wrote:
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same
    machine. Usually you have either a server (and therefore no desktop
    processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Do you mean you are running an active webserver on the same
    system you are using for browsing, development, games, email, etc.?
    That is a poor setup, for efficiency, reliability, and security. Of
    course, economics can be a factor - but if you can afford to be playing
    around with SSDs and multiple disks, then you should also consider if
    you should have a separate machine for the server. A small Intel NUC
    with a single disk is likely to be good enough for your LAMP stack and mediatomb server, leaving your desktop free to be a desktop.


    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?


    Yes, adding memory will make a /serious/ difference when you are trying
    to work as a server and a desktop - /if/ you really are having
    performance issues with I/O. But to be honest, I don't think you /are/
    having I/O performance issues - I suspect you just think you are. If
    you have a lot of free memory (and 19% is quite a lot, unless you have
    just stopped a large process) then you are not actually doing a lot of
    I/O, because disk data is cached in ram whenever there is /any/ free ram.

    With more ram, writes go faster because they stay in memory for longer
    and don't get flushed to disk as often. Reads go faster because it is
    much more likely that the data is already in memory. You also have the
    option of putting /tmp on tmpfs to speed up processes that use a lot of temporary files.
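
    For example, an /etc/fstab entry along these lines (the size cap is
    only an illustration):
      tmpfs   /tmp   tmpfs   size=4G,mode=1777,nosuid,nodev   0 0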

    But again, I would strongly suggest you try to identify what /feels/
    slow, and consider how you use the machine. What are you doing in
    parallel? What sort of serving are you actually doing, and is it
    running in parallel with desktop usage? Why do you think your I/O is slow?

  • From David Brown@21:1/5 to Piergiorgio Sartor on Tue Dec 13 09:57:59 2016
    On 10/12/16 23:05, Piergiorgio Sartor wrote:
    On 2016-12-10 21:44, Peter Chant wrote:
    [...]
    Well, I've not taken any formal stats but a good part of the load is
    mariadb at the same time as apache / php and firefox. I'd assume random.

    This can fit SSD or HDD + cache on SSD.

    [...]
    I'm a bit confused here. My understanding is that RAID10 requires four
    disks, that they are striped in pairs and the pairs then mirror each
    other. So I can't do that if I have two drives.

    Look at the Linux MD RAID-10 documentation.
    You'll see any number, even odd, will do.

    Here some reference:

    https://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

    In case of RAID-10, with 2 disks, layout "far 2",
    the disks combine RAID-0 and RAID-1.

    So, the RAID will survive a failure of 1 disk,
    but the read performances are of RAID-0.

    In your case, I'd combine the two HDDs in one
    RAID-10 and the two SSDs in another RAID-10.
    Both with layout "far 2".

    There is no point in using "far" layout on SSDs. The benefit of the
    "far" layout is for harddisks - it means that most reads will come from
    the faster outer cylinders of the disk, and you have less head movement
    during reads (but more head movement, and therefore greater latency,
    during writes). For SSDs, the layout makes no difference to the timing
    and "near" is fine for RAID10.

    Note also that if you are using the md layer for the raid, then don't
    use raid on the btrfs layer as well.

    In general, I would recommend md raid10,far for systems that need high performance streaming of large files. For small file access, it has few benefits over md raid1. And if you have btrfs, then using raid1 in
    btrfs rather than md can be significantly more efficient.


    Part of the RAID-10 with SSD can be used as cache.

    [...]
    Hmm. bcache is simpler as a starting point.

    bcache is not simpler for anything. It is an extra complicating layer,
    and on a system like this it is completely unnecessary.


    Well, not really.
    If you already use LVM, then lvmcache is easier.

    Same goes for lvmcache in this case - it is overkill.

    Because you can add the cache to and remove it from the
    LVM volume on the fly.
    With bcache, you'll have to start with it from
    the beginning.

    bye,


  • From David Brown@21:1/5 to Piergiorgio Sartor on Tue Dec 13 10:17:46 2016
    On 11/12/16 14:10, Piergiorgio Sartor wrote:
    On 2016-12-11 12:54, Peter Chant wrote:
    [...]
    Thanks. I've done some reading and there is more to do plus some
    experimentation.

    Experimentation is good.
    It is the only way to get some idea of the
    different peculiarities.

    I understand that mdadm is used to create the raid arrays, it is not
    part of lvm itself?

    LVM (dmraid) and md share a lot of code, but I'm
    not sure about this RAID-10. Maybe it is only md.


    lvm can do raid1 and raid0, and can be layered to give a traditional
    4-drive raid10. But it cannot do md-style raid10, nor the other more
    advanced raid setups. It is also less efficient than md raid - but
    sometimes more convenient because it is much easier to have the raid
    settings change on the fly (such as if you try to make a raid0 stripe
    from physical volumes of different sizes - lvm will stripe as much as
    possible, and give you unstriped use of the rest of the space).
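
    For reference, minimal examples of lvm's own mirroring and striping
    (VG and LV names are placeholders):
      lvcreate --type raid1 -m 1 -L 100G -n mirrored vg_data
      lvcreate -i 2 -I 64 -L 100G -n striped vg_data   # 2-way stripe, 64k stripe size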

    [...]
    OK. Simple to set up with the SSDs as one is blank and the other has
    free space.

    Be careful, it is easy to destroy data.
    Maybe you can practice with loop devices.
    There are some howtos around.

    With RAID-10 far 2 am I correct in assuming that the available capacity
    for a two device array of 3TB disks would be 3TB (two copies of data)?

    Yep.

    Part of the RAID-10 with SSD can be used as cache.


    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will
    add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O problem, or have identified where it is.


    OK, you'll have to decide at which layer LUKS fits.
    There are pros and cons for each case.

    SSDs:
    Create empty partition on larger system SSD.
    Add smaller empty SSD.

    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2
    --layout=f2 /dev/sda4 /dev/sdb

    That looks like you are raiding a partition on one device with the
    entire second device. Are you sure that is what you want?

    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)
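
    The secure erase is typically done with hdparm; it wipes the whole
    device, the drive must not be in the "frozen" state, and /dev/sdX and
    the password are placeholders:
      hdparm --user-master u --security-set-pass Pass /dev/sdX
      hdparm --user-master u --security-erase Pass /dev/sdX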


    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously
    paranoid. It is understandable why one would want to encrypt a portable machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    HDD:
    Similar to the SSD case above except that the logical volume will be for
    the slow disks doing the bulk of the work.

    OK, I think.

    Create a cached device:
    Use lvmcache to create a device using the two logical volumes
    created above (bcache would also work).

    Seems good to me.

    Create a file system on top of the cached device. If using btrfs (what
    I do now) use single for data as the raid is occurring a couple of levels
    down.

    Well, it might be btrfs has already RAID-10.
    Again, code is shared between this and md too.

    btrfs does not have raid10, and it does not share significant raid1 or
    raid0 code with md. (It /does/ share code for calculating raid5 and
    raid6 parities, but that's another issue - and don't use btrfs raid5/6
    until the bugs are sorted out!).

    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is
    mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.
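
    For the btrfs raid1 option, creation is a one-liner (device names are
    placeholders; both metadata and data are mirrored):
      mkfs.btrfs -m raid1 -d raid1 /dev/sda4 /dev/sdb1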


    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.

    Agreed.

    Not for performances, but for practising possible
    combinations and layouts.

    Agreed.


    Then there is the story of the caching, which has
    different scope and performances between bcache
    and lvmcache.
    In your specific case, I cannot judge which is better,
    but lvmcache seems to me easier.

    It looks like I could do this without the LVM layer between the LUKS and
    cache parts. However, this does give me the flexibility to create
    logical volumes that are solely on HDD RAID10 or SSD RAID10, as well as
    cached ones. Or different cache options. I think I could have put another
    LVM on top of the cached item I was creating, but I think that was
    overdoing it.

    Probably it is.
    I would try to use not more than one component at a time.
    So, 1 md, 1 LUKS, 1 LVM, at maximum.
    If possible less.

    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    This type of "generic" setup has some advantages in case
    of hardware updates (easy to add / remove storage devices,
    by using pvmove).

    bye,


  • From Peter Chant@21:1/5 to David Brown on Wed Dec 14 20:03:04 2016
    On 12/13/2016 09:17 AM, David Brown wrote:

    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O problem, or have identified where it is.



    I've had periods where the disks have been solidly at 80-90% utilisation
    for many seconds yet the CPU has been lightly loaded.


    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2
    --layout=f2 /dev/sda4 /dev/sdb

    That looks like you are raiding a partition on one device with the
    entire second device. Are you sure that is what you want?


    Well, if raiding SSDs there is some logic to this horrible-looking
    asymmetric setup. I'm using an SSD for the OS and that has plenty of
    free space. I also have the older, smaller SSD it replaced. Although it
    looks messy, I could free up some space on the current system SSD to use
    a partition for RAID and use that in combination with the old, currently
    unused SSD. That saves shelling out for another SSD if I want to RAID a
    pair. Aesthetically it is horrid and obviously would impact the speed
    of OS access. But it is hardware I have, so the monetary cost is no
    issue. The time and hair-loss cost might not be so trivial.

    Would image the SSD in case of mess ups before resizing partitions.

    As for the whole of /dev/sdb - I've been using btrfs for a while and it
    is normal to give it whole disks, a simple slip of the finger.

    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)


    Have done that already.


    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously paranoid. It is understandable why one would want to encrypt a portable machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Well, in this day and age it seemed like a reasonable idea.


    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    So it is about as fast as it will get now?

    I've a spare hdd and ssd so I'll have a play when I get time. Need to
    think about a useful benchmark.




    btrfs does not have raid10, and it does not share significant raid1 or
    raid0 code with md. (It /does/ share code for calculating raid5 and
    raid6 parities, but that's another issue - and don't use btrfs raid5/6
    until the bugs are sorted out!).

    Oh yes it does. :-). But I don't see anything about far 2.


    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.

    Replacing a disk and the associated balance took a week.


    I'm at RAID1 with btrfs now. Yes, not RAIDing btrfs over another RAID
    as that makes little sense here.


    Of course, I could have got completely the wrong idea above and invented some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.

    Agreed.

    Spare hdd and ssd. Could use loop devices but perhaps real hw is useful,
    though not enough to RAID.


    Not for performances, but for practising possible

  • From Peter Chant@21:1/5 to David Brown on Wed Dec 14 23:16:40 2016
    On 12/13/2016 08:29 AM, David Brown wrote:
    On 13/12/16 00:08, Peter Chant wrote:
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same
    machine. Usually you have either a server (and therefore no desktop
    processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Do you mean you are running an active webserver on the same
    system you are using for browsing, development, games, email, etc.?
    That is a poor setup, for efficiency, reliability, and security. Of
    course, economics can be a factor - but if you can afford to be playing around with SSDs and multiple disks, then you should also consider if
    you should have a separate machine for the server. A small Intel NUC
    with a single disk is likely to be good enough for your LAMP stack and mediatomb server, leaving your desktop free to be a desktop.


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and
    partly convenience and rework would be a major pita.

    However, if I get slowdowns on this fairly elderly machine with a six
    core cpu and 8GB of ram, then I don't see a NUC, albeit with a newer
    generation CPU, being much faster, though I admit I've never got my
    hands on one. Plus there is not room for the two hdds plus the ssd OS disk.
    Using this machine as the server and the nuc as the desktop would make
    more sense.

    I did think about this in the past, or getting a nice laptop / docking
    station combination and a server setup.

    The system is fairly responsive right now but I am not hitting the HDDs
    right now using thunderbird as I type this. This is with duperemove
    hitting the HDDs hard and I'd not consider doing anything else that hit
    the HDDs. Incidentally application data is now 72% of physical memory
    and disk cache 24-25% with the remainder few % free.



    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?


    Yes, adding memory will make a /serious/ difference when you are trying
    to work as a server and a desktop - /if/ you really are having
    performance issues with I/O. But to be honest, I don't think you /are/ having I/O performance issues - I suspect you just think you are. If
    you have a lot of free memory (and 19% is quite a lot, unless you have
    just stopped a large process) then you are not actually doing a lot of
    I/O, because disk data is cached in ram whenever there is /any/ free ram.


    Given that there is little free now the numbers I quoted earlier might
    not have been representative. I did not see a noticeable difference
    between 4 & 8 GB therefore I'd not considered more ram. However, if it
    is really likely to make a big difference and with 16GB of DDR2 going
    for between £15 and £45 on ebay then some research is warranted.

    With more ram, writes go faster because they stay in memory for longer
    and don't get flushed to disk as often. Reads go faster because it is
    much more likely that the data is already in memory. You also have the option of putting /tmp on tmpfs to speed up processes that use a lot of temporary files.

    I've put /tmp on tmpfs before. I have /dev/shm on tmpfs at the moment
    as part of slackware's default config. Generally I've abandoned this
    when compiling packages filled up /tmp and I ran out of tmp space.
    Generally a failure to clean up /tmp, but some packages are large and
    have a lot of dependencies.


    But again, I would strongly suggest you try to identify what /feels/
    slow, and consider how you use the machine. What are you doing in
    parallel? What sort of serving are you actually doing, and is it
    running in parallel with desktop usage? Why do you think your I/O is slow?


    I used the term 'IO bound' as I've seen the HDDs hit 80-90% for long
    periods yet CPU usage has been relatively low. So to me IO was the
    limiting factor. Going out and spending lots of cash (not cache!) on
    the latest i7 + motherboard and memory therefore I assume would not
    improve the user experience whereas speeding up the existing IO would,
    if possible.

    The lamp load above is likely excessive but I have seen slowdowns before
    with this machine. Sometimes btrfs seems to build up a backlog of stuff
    to do (btrfs cleaner, transactions etc) for a while after doing
    something disk intensive. But I've noticed this less lately. Btrfs has
    not got a reputation for being slow although odd and specific cases do
    show up on the mailing list from time to time. I'm not planning on
    swapping file systems unless to another with subvolumes and probably
    snapshots as subvolumes have let me organise things in a much more
    logical and efficient manner since I have started using them.

    I have a nagging feeling that something just is not right. However, I
    need to benchmark. I also have a cheap two-port SATA III card.
    If there is something odd with the disk interface (can't see what) maybe
    that will shake it out. It should allow the SSD to function to its
    potential anyway, so it is not a bad idea.

    Unfortunately the slightly higher-range 4-port SATA III PCIe x2 cards
    seem limited right now; I'd have to go up quite a notch in price to
    eight-port / SAS cards, and I'm starting to throw reasonable sums of
    money at an elderly mobo / processor / ram combination with no assured
    outcome. However, cheap improvements, and especially improvements with
    existing kit, are definitely worth pursuing.

  • From David Brown@21:1/5 to Peter Chant on Thu Dec 15 13:15:32 2016
    On 15/12/16 00:16, Peter Chant wrote:
    On 12/13/2016 08:29 AM, David Brown wrote:
    On 13/12/16 00:08, Peter Chant wrote:
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same machine. Usually you have either a server (and therefore no desktop
    processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is it that is using the I/O ? Is the machine really too slow, and what do you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Do you mean you are running an active webserver on the same
    system you are using for browsing, development, games, email, etc.?
    That is a poor setup, for efficiency, reliability, and security. Of
    course, economics can be a factor - but if you can afford to be playing
    around with SSDs and multiple disks, then you should also consider if
    you should have a separate machine for the server. A small Intel NUC
    with a single disk is likely to be good enough for your LAMP stack and
    mediatomb server, leaving your desktop free to be a desktop.


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and partly convenience and rework would be a major pita.

    For Python, you can look at numpy and scipy for serious calculations -
    if you can work with your data as homogeneous arrays then numpy will do
    the calculations in fast C libraries rather than interpreted Python.
    Also look at pypy or psyco as ways to speed up Python code. You may
    also find that if you have heavy Python pages you are better using
    Twisted as a webserver rather than Apache so that you work entirely
    within the one Python process rather than starting and stopping
    processes all the time.


    However, if I get slowdowns on this fairly elderly machine with a six
    core cpu and 8GB of ram, then I don't see a NUC, albeit with a newer
    generation CPU, being much faster, though I admit I've never got my
    hands on one. Plus there is not room for the two hdds plus the ssd OS disk.

    A NUC will give you as good I/O disk throughput (or better - I think you
    only had SATA-2 on your machine?). The processor may or may not be
    better - there are far too many NUC variants to keep track of!

    But the point is to separate the different types of usage.

    Using this machine as the server and the nuc as the desktop would make
    more sense.

    Maybe that's the way to do it.


    I did think about this in the past, or getting a nice laptop / docking station combination and a server setup.

    The system is fairly responsive right now but I am not hitting the HDDs
    right now using thunderbird as I type this. This is with duperemove
    hitting the HDDs hard and I'd not consider doing anything else that hit
    the HDDs. Incidentally application data is now 72% of physical memory
    and disk cache 24-25% with the remainder few % free.



    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?


    Yes, adding memory will make a /serious/ difference when you are trying
    to work as a server and a desktop - /if/ you really are having
    performance issues with I/O. But to be honest, I don't think you /are/
    having I/O performance issues - I suspect you just think you are. If
    you have a lot of free memory (and 19% is quite a lot, unless you have
    just stopped a large process) then you are not actually doing a lot of
    I/O, because disk data is cached in ram whenever there is /any/ free ram.


    Given that there is little free now the numbers I quoted earlier might
    not have been representative. I did not see a noticeable difference
    between 4 & 8 GB therefore I'd not considered more ram. However, if it
    is really likely to make a big difference and with 16GB of DDR2 going
    for between £15 and £45 on ebay then some research is warranted.

    I have seen extra ram make an impressive difference to speed. Not long
    ago a fellow developer here thought he needed a new graphics card at
    about £500 because his current £300 one was too slow for the 3D
    rendering he was doing. But £30 more ram doubled the speed of the
    system, while a new graphics card would have made little difference.


    With more ram, writes go faster because they stay in memory for longer
    and don't get flushed to disk as often. Reads go faster because it is
    much more likely that the data is already in memory. You also have the
    option of putting /tmp on tmpfs to speed up processes that use a lot of
    temporary files.

    I've put /tmp on tmpfs before. I have /dev/shm on tmpfs at the moment
    as part of slackware's default config. Generally I've abandoned this
    when compiling packages filled up /tmp and I ran out of tmp space.
    Generally a failure to clean up /tmp, but some packages are large and
    have a lot of dependencies.

    /dev/shm is always on tmpfs (in modern systems, anyway). Processes use
    it specifically as a convenient way to have a block of memory shared
    between them - they create a file on the tmpfs, which remains in memory,
    then mmap it to access the memory directly.

    It is often more efficient to have /tmp on tmpfs and let it spill out
    into swap, than to have the /tmp directly on the disk.



    But again, I would strongly suggest you try to identify what /feels/
    slow, and consider how you use the machine. What are you doing in
    parallel? What sort of serving are you actually doing, and is it
    running in parallel with desktop usage? Why do you think your I/O is slow?

    I used the term 'IO bound' as I've seen the HDDs hit 80-90% for long
    periods yet CPU usage has been relatively low. So to me IO was the
    limiting factor. Going out and spending lots of cash (not cache!) on
    the latest i7 + motherboard and memory therefore I assume would not
    improve the user experience whereas speeding up the existing IO would,
    if possible.

    The lamp load above is likely excessive but I have seen slowdowns before
    with this machine. Sometimes btrfs seems to build up a backlog of stuff
    to do (btrfs cleaner, transactions etc) for a while after doing
    something disk intensive. But I've noticed this less lately. Btrfs has
    not got a reputation for being slow although odd and specific cases do
    show up on the mailing list from time to time. I'm not planning on
    swapping file systems unless to another with subvolumes and probably snapshots as subvolumes have let me organise things in a much more
    logical and efficient manner since I have started using them.

    Agreed. Btrfs is not perfect, but I find it the best choice for my
    usage. Cheap snapshots are really nice!


    I have a nagging feeling that something just is not right. However, I
    need to benchmark. I also have a cheap two-port SATA III card.
    If there is something odd with the disk interface (can't see what) maybe
    that will shake it out. It should allow the SSD to function to its
    potential anyway, so it is not a bad idea.

    hdparm can give you a simple test of the buffer bandwidth, and smartctl
    can be useful to list the features of the interface and device to check
    for obvious missing points.
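
    For instance (both are read-only and safe on a live system):
      hdparm -tT /dev/sda    # cached vs. buffered read timings
      smartctl -i /dev/sda   # model, firmware and negotiated SATA link speed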

    A newer cpu and motherboard may not seem useful from the viewpoint of
    processor power, but they will have better throughput on the I/O and
    faster native SATA.


    Unfortunately the slightly higher-range 4-port SATA III PCIe x2 cards
    seem limited right now; I'd have to go up quite a notch in price to
    eight-port / SAS cards, and I'm starting to throw reasonable sums of
    money at an elderly mobo / processor / ram combination with no assured
    outcome. However, cheap improvements, and especially improvements with
    existing kit, are definitely worth pursuing.


    I'd avoid SAS - but that is because I have been bitten by it. There is
    not much that SAS does better than SATA (queued trim commands is one
    thing), and the disks cost twice as much. I prefer just to get twice as
    many SATA disks for the same money. I had one server that had SAS
    because the salesman convinced me that the extra reliability of SAS
    drives over SATA drives was worth the cost. That machine broke when
    the SAS controller card died, leaving me with a useless server and a
    disk that I could not read because everything else was SATA. Sometimes
    you learn by doing!

  • From David Brown@21:1/5 to Peter Chant on Thu Dec 15 12:55:35 2016
    On 14/12/16 21:03, Peter Chant wrote:
    On 12/13/2016 09:17 AM, David Brown wrote:

    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O
    problem, or have identified where it is.



    I've had periods where the disks have been solidly at 80-90% utilisation
    for many seconds yet the CPU has been lightly loaded.

    What is the computer doing at the time?

    Is this actually slowing down something that you are waiting for?

    It is /normal/ for a computer to run at maximum in some aspects, for
    some time. If you are transferring a large file, you /want/ the disks
    to be as close to 100% as possible. If you are doing a raid1 scrub, you
    /want/ the disks to be close to 100%, perhaps for hours at a time if the
    disks are big (but at low I/O priority so that other tasks can also run).

    So far, you have just told me that your system is working.



    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2 --layout=f2 /dev/sda4 /dev/sdb

    That looks like you are raiding a partition on one device with the
    entire second device. Are you sure that is what you want?


    Well, if raiding SSDs there is some logic to this horrible-looking
    asymmetric setup. I'm using an SSD for the OS and that has plenty of
    free space. I also have the older, smaller SSD it replaced. Although it
    looks messy, I could free up some space on the current system SSD to use
    a partition for RAID and use that in combination with the old, currently
    unused SSD. That saves shelling out for another SSD if I want to RAID a
    pair. Aesthetically it is horrid and obviously would impact the speed
    of OS access. But it is hardware I have, so the monetary cost is no
    issue. The time and hair-loss cost might not be so trivial.

    OK - my first thought was that it was a mistake (possibly just a missing character from a cut-and-paste). Now I see it is intentional.

    I am not convinced this will be a good thing - in fact, I am confident
    that it is a /bad/ thing. An old small SSD can easily be a /lot/ slower
    than a new one - typically, old and small SSDs have poor garbage
    collection and little over-provisioning. This means they get very slow
    at writes when they are full - even slower, sometimes, than hard disks.
    When you create an md raid1 array like this, the first thing md will do
    is copy block-for-block from /dev/sda4 to /dev/sdb. It will write to
    the entire disk - the old SSD will think that /all/ its normal blocks
    are full of important data. As soon as you try to write something else
    to it, it must now try to do garbage collection in "panic" mode with
    minimal free space - you might find your write latencies measured in
    /seconds/. And if the SSD is old enough, then it won't be able to
    handle reads during the erases and garbage collection. If you are
    lucky, reads will come from the other disk. If you are unlucky, reads
    will be stalled too.

    So I would expect your system to be a good deal slower by doing this,
    compared to simply using the new SSD on its own.

    /If/ your old SSD is reasonably fast, you can first run a secure erase
    to tell it to drop all data. Then partition it, leaving a little extra
    space unused - this overprovisioning can help a lot. And use it with
    btrfs raid1, not md raid1, so that only the actual useful data is
    replicated.


    Would image the SSD in case of mess ups before resizing partitions.

    As for the whole of /dev/sdb - I've been using btrfs for a while and it
    is normal to give it whole disks, a simple slip of the finger.

    I always use partitions, but I usually want a couple of partitions for
    other things (like swap).


    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more
    flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)


    Have done that already.

    Including the unused space at the end?



    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid 10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2 device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously
    paranoid. It is understandable why one would want to encrypt a portable
    machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Well, in this day and age it seemed like a reasonable idea.

    It can be fun to play around with this sort of thing, but that doesn't
    mean it is a good idea if you are aiming for a useful system.

    One particular thing that can be a serious pain with such complex setups
    is if something goes wrong. If something breaks badly, you might have
    to put the disks in a different machine to recover the data, or boot the
    same machine from a live USB. If you have an lvm cache or bcache in
    writeback mode, you need both the SSD and the HD online, and you may
    need a system with the right kernel and utilities in order to get the
    disks working. Even then, it's easy to accidentally corrupt things
    along the way such as by writing to the HD while there are uncommitted
    changes on the SSD part. I have enough experience with complicated
    recoveries to know that when something goes wrong, you'll be glad you
    kept things simple - and that you documented everything :-)



    Create a logical volume on top of the above to use as the cache device.
    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    So it is about as fast as it will get now?

    I've a spare hdd and ssd so I'll have a play when I get time. Need to
    think about a useful benchmark.

    It's all about the type of load you are using. lvmcache and/or bcache
    can help a lot in some cases - but be no faster than a single HD in
    other cases. And for many of the things that /do/ run faster with the
    caches, you could just as easily run them entirely from the SSD
    significantly faster - or get even better results by adding more RAM.

    The key to getting /really/ optimal systems is, as you say, useful
    benchmarks. There is no benefit in looking at the timings of a test
    that writes lots of small blocks from lots of threads unless that really
    is what you are doing in practice. There is no benefit in running a
    benchmark after clearing your caches, because you don't clear your
    caches in practice - but if you run without clearing the caches, you are
    not testing your disk performance. The only "true" benchmark is to run
    your system with your typical real tasks and see how it performs.





    btrfs does not have raid10, and it does not share significant raid1 or
    raid0 code with md. (It /does/ share code for calculating raid5 and
    raid6 parities, but that's another issue - and don't use btrfs raid5/6
    until the bugs are sorted out!).

    Oh yes it does. :-). But I don't see anything about far 2.

    Btrfs "raid10" is (roughly) traditional raid1+0, where you have two
    mirrors which are striped. (I say roughly, because btrfs handles this
    at the level of extents, rather than raw disk blocks.) Linux md raid10
    is quite a bit more sophisticated than just layered raid1 and raid0 -
    which is why it works on any number of disks, not just multiples of 4.



    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is
    mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.
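    If you go the btrfs raid1 route, the conversion can be done in place by
    adding the second device and rebalancing - only allocated extents are
    copied, which is the point about avoiding a whole-disk resync (mount
    point and device names are placeholders):

        # add the second SSD to the existing, mounted btrfs filesystem
        btrfs device add /dev/sdY /mnt/data

        # convert data and metadata to raid1; only used extents get duplicated
        btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data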

    replacing a disk and the associated balance took a week.

    If you have a lot of data, it takes a while to copy it all over. And a re-balance actually does a good deal more work than just a plain copy.
    But at least it doesn't copy the unused space too.



    I'm at RAID1 with btrfs now. Yes, not RAIDing btrfs over another RAID
    as that makes little sense here.


    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.

    Agreed.

    Spare hdd and ssd. Could use loop devices but perhaps real hw is useful, though not enough to RAID.

    Loop devices work well in testing, especially for seeing how to add
    disks, replace disks, resize things, etc. Of course they are of little
    help in speed testing.

    I usually test using loop devices made in a tmpfs mount - but then,
    my machines normally have lots of ram.
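    A rough sketch of that kind of sandbox, with arbitrary sizes and paths:

        # a small ram-backed scratch area
        mkdir -p /mnt/scratch
        mount -t tmpfs -o size=2G tmpfs /mnt/scratch

        # sparse backing files attached to loop devices
        truncate -s 500M /mnt/scratch/disk1.img /mnt/scratch/disk2.img
        losetup --find --show /mnt/scratch/disk1.img   # prints e.g. /dev/loop0
        losetup --find --show /mnt/scratch/disk2.img

        # the /dev/loopN devices can now be used to practise md, LVM or btrfs
        mdadm --create /dev/md100 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1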



    Not for performances, but for practising possible


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter Chant@21:1/5 to David Brown on Thu Dec 15 22:50:48 2016
    On 12/15/2016 12:15 PM, David Brown wrote:


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and
    partly convenience and rework would be a major pita.

    For Python, you can look at numpy and scipy for serious calculations -
    if you can work with your data as homogeneous arrays then numpy will do
    the calculations in fast C libraries rather than interpreted Python.
    Also look at pypy or psyco as ways to speed up Python code. You may
    also find that if you have heavy Python pages you are better using
    Twisted as a webserver rather than Apache so that you work entirely
    within the one Python process rather than starting and stopping
    processes all the time.


    At this stage the rewrite is probably not worthwhile. I'm not sure this
    bit is necessarily the bottleneck anyway. Investigation required.



    Given that there is little free RAM now, the numbers I quoted earlier might
    not have been representative. I did not see a noticeable difference
    between 4 & 8 GB, therefore I'd not considered more RAM. However, if it
    is really likely to make a big difference, and with 16GB of DDR2 going
    for between £15 and £45 on eBay, then some research is warranted.

    I have seen extra ram make an impressive difference to speed. Not long
    ago a fellow developer here thought he needed a new graphics card at
    about £500 because his current £300 one was too slow for the 3D
    rendering he was doing. But £30 more ram doubled the speed of the
    system, while a new graphics card would have made little difference.


    The lack of noticeable difference put me off. But a little careful
    eBaying might bump me to 16GB for a reasonable price.


    It is often more efficient to have /tmp on tmpfs and let it spill out
    into swap, than to have the /tmp directly on the disk.
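    A single fstab line is enough for that; the size cap here is only an
    example and bounds how much of RAM plus swap /tmp may consume:

        # /etc/fstab - /tmp on tmpfs, spilling into swap under memory pressure
        tmpfs   /tmp   tmpfs   size=4G,mode=1777   0 0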


    That is good to know for the odd occasion. Hmm. I've just noticed a
    tiny swap hit with 7% free physical memory as I write this. First ever,
    that I have noticed!


    A newer cpu and motherboard may not seem useful from the viewpoint of processor power, but they will have better throughput on the I/O and
    faster native SATA.


    I've a cheap-as-chips PCIe SATA 3 controller waiting to go in. I'm
    curious as to whether it would make a difference to the HDDs; it should
    not. Anyway, it should help the SSD or SSDs.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Peter Chant on Fri Dec 16 09:39:40 2016
    On 15/12/16 23:50, Peter Chant wrote:
    On 12/15/2016 12:15 PM, David Brown wrote:


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and
    partly convenience and rework would be a major pita.

    For Python, you can look at numpy and scipy for serious calculations -
    if you can work with your data as homogeneous arrays then numpy will do
    the calculations in fast C libraries rather than interpreted Python.
    Also look at pypy or psyco as ways to speed up Python code. You may
    also find that if you have heavy Python pages you are better using
    Twisted as a webserver rather than Apache so that you work entirely
    within the one Python process rather than starting and stopping
    processes all the time.


    At this stage the rewrite is probably not worthwhile. I'm not sure this
    bit is necessarily the bottleneck anyway. Investigation required.

    If you are using 32-bit Python 2, psyco can be a great solution.
    Install it, and then add this to your Python script:

    import psyco
    psyco.full()

    That's all you need, and psyco will do JIT compilation of functions that
    are run often.

    As a project, psyco is now "dead" - PyPy is the successor for newer
    Python systems. But it can be a very easy way to speed up some kinds of
    code quite significantly.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to All on Fri Dec 16 09:34:18 2016
    On 16/12/16 00:18, Peter Chant wrote:
    <snip>

    I am not going to go through this answering point for point - it would
    be too specific and require too many questions and answers about exact
    details. I'll make a few more comments below - and I have read the bits
    that I have snipped, even if I didn't reply to them. If you want to
    take any of these points further, that's fine - but we can snip the
    other parts to keep the posts of manageable length.

    It is good that you are willing to experiment and try new arrangements
    here - getting an "optimal" system is not easy. And of course, playing
    around and learning about the options is fun! I hope I have been able
    to give you a few ideas about some possibilities you were not aware of,
    some pointers about things you need to consider, and some technical
    details to help you understand some of the options.

    And hopefully at least some of it has been interesting to other people
    in the group!

    If you are looking for more help or information about raid, the Linux
    raid mailing list is a good place. It has an emphasis on md raid, but
    covers other types of raid too.

    On 12/15/2016 11:55 AM, David Brown wrote:
    On 14/12/16 21:03, Peter Chant wrote:
    On 12/13/2016 09:17 AM, David Brown wrote:



    <snip>

    Well, the old SSD is not that old - it's a Samsung 840 Pro. Anyway, I have
    it and am not otherwise using it and the same goes for a 2TB HDD. So I
    can play with caching (but not necessarily RAID). So I can add them to
    the machine and have a go. However, this is much larger than the
    mariadb files and some others so I can try using it as a dedicated drive
    for some things and see if shifting some storage to it makes a
    difference. With more effort I can probably play with RAID and caching.


    Running directly from SSD is always faster and more efficient than any
    sort of caching system. So putting the files on the SSD is a good idea.

    Here are a few other pointers to consider and investigate:

    1. Mariadb/MySql supports a number of different table formats. Are the
    formats you are using the best for the job?

    2. Some formats need update-in-place files for efficiency. Btrfs uses copy-on-write, which can be slow and inefficient if the application
    expects to change parts of a file.

    3. RAID0 only helps speed if you have large files. For small files and
    random access, RAID0 slows things down - especially on hard disks.
    RAID1 can speed up multiple small reads in parallel, but slows down
    small writes a bit.

    4. You can choose where to store your database files, regardless of how
    the rest of your system is organised. One simple way is to use symbolic
    links for directories. So if your database files are in /var/mariadb,
    and you want them to be on a disk that is mounted /mnt/disk2, make
    /var/mariadb a symbolic link pointing to /mnt/disk2/mariadb (a rough
    sketch of this follows after the list).

    5. Experiment with the database files on a tmpfs. This will give you an
    idea of the maximum speeds possible, so that you know you are not aiming unrealistically high or dealing with something completely different as
    the bottleneck in the system.
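    A rough sketch of point 4, using the example paths above (stop the
    database first, and adjust the init script name to your distribution):

        # stop mariadb before touching its files (rc.mysqld on Slackware)
        /etc/rc.d/rc.mysqld stop

        # copy the data to the other disk, keep the original as a fallback,
        # and point a symlink at the new location
        cp -a /var/mariadb /mnt/disk2/mariadb
        mv /var/mariadb /var/mariadb.old
        ln -s /mnt/disk2/mariadb /var/mariadb

        /etc/rc.d/rc.mysqld start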

    I've also looked up mysql optimisation and xfs is suggested. So I can
    make a dedicated xfs partition and see what happens. With or without LVM...

    xfs makes sense on /big/ systems. It makes little sense on small
    servers. The usual disk setup for a high performance xfs system is to
    organise your disks in raid1 pairs, then make a linear stripe (/not/ a
    raid0) on top of that for xfs. The aim is to give low-latency parallel
    access from a large number of processes - think media bank, large file
    server, mail server, big database server, etc. If you don't have at
    least 4 or 6 identical drives, your server task is probably not big
    enough to benefit much from xfs.
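    As an illustration of that layout with four identical drives (device
    names are placeholders):

        # two raid1 pairs
        mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
        mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc /dev/sdd

        # a linear concatenation (not raid0) across the pairs
        mdadm --create /dev/md10 --level=linear --raid-devices=2 /dev/md1 /dev/md2

        # xfs spreads its allocation groups across the concat for parallel access
        mkfs.xfs /dev/md10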


    There seem to be two schools of thought on RAID1 here. Anyway, I can
    play as far as I like provided I don't waste time or money.

    You are going to be wasting time in this - but it is all experience and
    part of the learning process.

    <snip>



    /If/ your old SSD is reasonably fast, you can first run a secure erase
    to tell it to drop all data. Then partition it, leaving a little extra
    space unused - this overprovisioning can help a lot. And use it with
    btrfs raid1, not md raid1, so that only the actual useful data is
    replicated.
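    Roughly, that sequence might be (device names and the temporary password
    are placeholders, and the drive must not be in the "frozen" security
    state for hdparm to accept the erase):

        # ATA secure erase: set a temporary password, then issue the erase
        hdparm --user-master u --security-set-pass temppass /dev/sdX
        hdparm --user-master u --security-erase temppass /dev/sdX

        # partition it, leaving ~10% at the end unused as over-provisioning
        parted /dev/sdX -- mklabel gpt mkpart primary 1MiB 90%

        # btrfs raid1 across the old and new SSD partitions
        mkfs.btrfs -d raid1 -m raid1 /dev/sdX1 /dev/sdY1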


    Think I'll boot on a USB stick and unplug all other drives when trying that.


    <snip>


    The system is on a simple partition with btrfs in single. But boot is
    ext3 or 4. And there is a partition I consider the 'maintenance'
    partition with a full slackware install on ext3 or 4 that I hardly touch
    but is handy in case I break something. Also, you can't build a kernel
    from a rescue disk, but you can with this, which is nice.

    You can reasonably argue that it is a waste of space especially on an
    SSD which is expensive compared to a HDD but it is a nice comfort
    blanket and doubly so on the rare occasions it is needed.

    Agreed.

    <snip>

    If you have a lot of data, it takes a while to copy it all over. And a
    re-balance actually does a good deal more work than just a plain copy.
    But at least it doesn't copy the unused space too.


    Yes, there is the checksumming for a start. But that is a good thing.


    The checksumming is not very processor intensive. But it is not
    particularly useful either - so-called "bit rot" and "undetected read
    errors" are extraordinarily rare (they are very different from
    "unrecoverable read errors", for which unchecksummed raid1 is good
    enough). The 32 bits of checksum are peanuts compared to the levels of checksum that already exist on the disk. The crc /could/ be useful for detecting possible duplicate extents, but I don't think it is easily
    accessed for that at the moment.

    <snip>





    Loop devices work well in testing, especially for seeing how to add
    disks, replace disks, resize things, etc. Of course they are of little
    help in speed testing.

    I usually test using loop devices made in a tmpfs mount - but then,
    my machines normally have lots of ram.


    I'd never have guessed. :-)


    Anyway, food for thought here from Piergiorgio and yourself.




    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter Chant@21:1/5 to David Brown on Thu Dec 15 23:18:57 2016
    On 12/15/2016 11:55 AM, David Brown wrote:
    On 14/12/16 21:03, Peter Chant wrote:
    On 12/13/2016 09:17 AM, David Brown wrote:

    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will
    add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O
    problem, or that you have correctly identified where it is.



    I've had periods where the disks have been solidly at 80-90% utilisation
    for many seconds yet the CPU has been lightly loaded.

    What is the computer doing at the time?


    Can't remember specifically I'm afraid.


    Is this actually slowing down something that you are waiting for?


    It has annoyed me as I was doing other stuff.


    It is /normal/ for a computer to run at maximum in some aspects, for
    some time. If you are transferring a large file, you /want/ the disks
    to be as close to 100% as possible. If you are doing a raid1 scrub, you /want/ the disks to be close to 100%, perhaps for hours at a time if the disks are big (but at low I/O priority so that other tasks can also run).

    So far, you have just told me that your system is working.


    Well, yes. But when I've not been deduping etc., so it is not expected,
    it can get annoying.



    Well. If RAIDing SSDs there is some logic to this horrible-looking
    asymmetric setup. I'm using an SSD for the OS and that has plenty of
    free space. I've also the older, smaller SSD it replaced. Although it
    looks messy, I could free up some space on the current system SSD to use
    a partition for RAID and use that in combination with the old, currently
    unused SSD. That saves shelling out for another SSD if I want to RAID a
    pair. Aesthetically it is horrid and obviously would impact the speed
    of OS access. But it is hardware I have, so the monetary cost is no
    issue. The time and hair-loss cost might not be so trivial.

    OK - my first thought was that it was a mistake (possibly just a missing character from a cut-and-paste). Now I see it is intentional.


    It was a crude example to see if I had understood the basic concept, so
    I was not certain I was 100%, just mainly right.

    I am not convinced this will be a good thing - in fact, I am confident
    that it is a /bad/ thing. An old small SSD can easily be a /lot/ slower
    than a new one - typically, old and small SSDs have poor garbage
    collection and little over-provisioning. This means they get very slow
    at writes when they are full - even slower, sometimes, than hard disks.
    When you create an md raid1 array like this, the first thing md will do
    is copy block-for-block from /dev/sda4 to /dev/sdb. It will write to
    the entire disk - the old SSD will think that /all/ its normal blocks
    are full of important data. As soon as you try to write something else
    to it, it must now try to do garbage collection in "panic" mode with
    minimal free space - you might find your write latencies measured in /seconds/. And if the SSD is old enough, then it won't be able to
    handle reads during the erases and garbage collection. If you are
    lucky, reads will come from the other disk. If you are unlucky, reads
    will be stalled too.

    So I would expect your system to be a good deal slower by doing this, compared to simply using the new SSD on its own.

    Well, the old SSD is not that old - it's a Samsung 840 Pro. Anyway, I have
    it and am not otherwise using it and the same goes for a 2TB HDD. So I
    can play with caching (but not necessarily RAID). So I can add them to
    the machine and have a go. However, this is much larger than the
    mariadb files and some others so I can try using it as a dedicated drive
    for some things and see if shifting some storage to it makes a
    difference. With more effort I can probably play with RAID and caching.

    I've also looked up mysql optimisation and xfs is suggested. So I can
    make a dedicated xfs partition and see what happens. With or without LVM...

    There seem to be two schools of thought on RAID1 here. Anyway, I can
    play as far as I like provided I don't waste time or money.


    /If/ your old SSD is reasonably fast, you can first run a secure erase
    to tell it to drop all data. Then partition it, leaving a little extra
    space unused - this overprovisioning can help a lot. And use it with
    btrfs raid1, not md raid1, so that only the actual useful data is
    replicated.


    Think I'll boot on a USB stick and unplug all other drives when trying that.



    Would image the SSD in case of mess ups before resizing partitions.
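    For the image, a plain dd of the whole device to somewhere with enough
    room is sufficient (paths are placeholders):

        # raw image of the SSD before repartitioning; restore by swapping if= and of=
        dd if=/dev/sdX of=/mnt/backup/ssd.img bs=4M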

    As for the whole of /dev/sdb - I've been using btrfs for a while and it
    is normal to give it whole disks, a simple slip of the finger.

    I always use partitions, but I usually want a couple of partitions for
    other things (like swap).


    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more
    flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)


    Have done that already.

    Including the unused space at the end?

    Yes. (but not on the old drive so back to the secure erase step).




    Metadata 1.0, 1.1 and 1.2 are the new ones, use these,
    not the 0.90, which have less features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously
    paranoid. It is understandable why one would want to encrypt a portable
    machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Well, in this day and age it seemed like a reasonable idea.

    It can be fun to play around with this sort of thing, but that doesn't
    mean it is a good idea if you are aiming for a useful system.

    One particular thing that can be a serious pain with such complex setups
    is if something goes wrong. If something breaks badly, you might have
    to put the disks in a different machine to recover the data, or boot the
    same machine from a live USB. If you have an lvm cache or bcache in
    writeback mode, you need both the SSD and the HD online, and you may
    need a system with the right kernel and utilities in order to get the
    disks working. Even then, it's easy to accidentally corrupt things
    along the way such as by writing to the HD while there are uncommitted changes on the SSD part. I have enough experience with complicated recoveries to know that when something goes wrong, you'll be glad you
    kept things simple - and that you documented everything :-)


    The system is on a simple partition with btrfs in single. But boot is
    ext3 or 4. And there is a partition I consider the 'maintenance'
    partition with a full slackware install on ext3 or 4 that I hardly touch
    but is handy in case I break something. Also, you can't build a kernel
    from a rescue disk, but you can with this, which is nice.

    You can reasonably argue that it is a waste of space especially on an
    SSD which is expensive compared to a HDD but it is a nice comfort
    blanket and doubly so on the rare occasions it is needed.





    Create a logical volume on top of the above to use as the cache device.
    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    So it is about as fast as it will get now?

    I've a spare hdd and ssd so I'll have a play when I get time. Need to
    think about a useful benchmark.

    It's all about the type of load you are using. lvmcache and/or bcache
    can help a lot in some cases - but be no faster than a single HD in
    other cases. And for many of the things that /do/ run faster with the caches, you could just as easily run them entirely from the SSD
    significantly faster - or get even better results by adding more RAM.


    Well. I have the options. What I'm not going to do is buy a blister
    pack of expensive terabyte-range SSDs for everything.

    The key to getting /really/ optimal systems is, as you say, useful benchmarks. There is no benefit in looking at the timings of a test
    that writes lots of small blocks from lots of threads unless that really
    is what you are doing in practice. There is no benefit in running a benchmark after clearing your caches, because you don't clear your
    caches in practice - but if you run without clearing the caches, you are
    not testing your disk performance. The only "true" benchmark is to run
    your system with your typical real tasks and see how it performs.





    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is
    mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.

    replacing a disk and the associated balance took a week.

    If you have a lot of data, it takes a while to copy it all over. And a re-balance actually does a good deal more work than just a plain copy.
    But at least it doesn't copy the unused space too.


    Yes, there is the checksumming for a start. But that is a good thing.





    Loop devices work well in testing, especially for seeing how to add
    disks, replace disks, resize things, etc. Of course they are of little
    help in speed testing.

    I usually test using loop devices made in a tmpfs mount - but then,
    my machines normally have lots of ram.


    I'd never have guessed. :-)


    Anyway, food for thought here from Piergiorgio and yourself.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)