• Speed ups for a disk IO bound machine

    From Piergiorgio Sartor@21:1/5 to Peter Chant on Sun Dec 11 14:10:30 2016
    On 2016-12-11 12:54, Peter Chant wrote:
    [...]
    Thanks. I've done some reading and there is more to do plus some experimentation.

    Experimentation is good.
    It is the only way to get some idea of the
    different peculiarities.

    I understand that mdadm is used to create the raid arrays, it is not
    part of lvm itself?

    LVM (dmraid) and md share a lot of code, but I'm
    not sure about this RAID-10. Maybe it is only md.

    [...]
    OK. Simple to set up with the SSDs as one is blank and the other has
    free space.

    Be careful, it is easy to destroy data.
    Maybe you can practice with loop devices.
    There are some howtos around.
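
    For example, a throwaway test array can be built from a couple of
    sparse files; everything below (paths, sizes, device names) is only a
    placeholder:
      truncate -s 1G /tmp/test0.img /tmp/test1.img
      losetup -f --show /tmp/test0.img    # prints e.g. /dev/loop0
      losetup -f --show /tmp/test1.img    # prints e.g. /dev/loop1
      mdadm --create /dev/md/test --level=10 --layout=f2 \
            --raid-devices=2 /dev/loop0 /dev/loop1
      # ...experiment with LUKS / LVM / mkfs on /dev/md/test, then tear down:
      mdadm --stop /dev/md/test
      losetup -d /dev/loop0 /dev/loop1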

    With RAID-10 far 2 am I correct in assuming that the available capacity
    for a two device array of 3TB disks would be 3TB (two copies of data)?

    Yep.

    Part of the RAID-10 with SSD can be used as cache.


    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will
    add this extra layer to the mix. I have not noticed any significant performance difference with and without it.

    OK, you'll have to decide at which layer LUKS fits.
    There are pros and cons for each case.

    SSDs:
    Create empty partition on larger system SSD.
    Add smaller empty SSD.

    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2 --layout=f2 /dev/sda4 /dev/sdb

    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.
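
    Taking both points together, the create command would then look
    something like this (assuming the second SSD is given a single
    partition, /dev/sdb1, which is only a guess at your layout):
      mdadm --create /dev/md/ssd --level=10 --metadata=1.2 \
            --raid-devices=2 --layout=f2 /dev/sda4 /dev/sdb1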

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2 device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.
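
    In your order (RAID -> LUKS -> LVM) the stack would be assembled
    roughly like this; md/ssd, ssd_crypt and vg_ssd are placeholder names:
      cryptsetup luksFormat /dev/md/ssd
      cryptsetup open /dev/md/ssd ssd_crypt    # gives /dev/mapper/ssd_crypt
      pvcreate /dev/mapper/ssd_crypt
      vgcreate vg_ssd /dev/mapper/ssd_crypt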

    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    HDD:
    Similar to the SSD case above except that the logical volume will be for
    the slow disks doing the bulk of the work.

    OK, I think.

    Create a cached device:
    Use lvmcache to create a device using the two logical volumes
    created above (bcache would also work).

    Seems good to me.

    Create a file system on top of the cached device. If using btrfs (what
    I do now) use single for data as the raid is occurring a couple of levels down.

    Well, it might be btrfs has already RAID-10.
    Again, code is shared between this and md too.

    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.
    Not for performances, but for practising possible
    combinations and layouts.

    Then there is the story of the caching, which has
    different scope and performances between bcache
    and lvmcache.
    In your specific case, I cannot judge which is better,
    but lvmcache seems to me easier.
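
    For comparison, a bcache setup is roughly one command from
    bcache-tools; formatting the backing and cache device together also
    attaches them (device names are placeholders, and both devices need
    to be blank):
      make-bcache -B /dev/md/hdd -C /dev/md/ssd
      # the cached block device then appears as /dev/bcache0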

    It looks like I could do this without the LVM layer between the LUKS and
    cache parts. However, this does give me the flexibility to create
    logical volumes that are solely on HDD RAID10 or SSD RAID10, as well as
    cached ones. Or different cache options. I think I could have put another
    LVM on top of the cached item I was creating, but I think that was
    overdoing it.

    Probably it is.
    I would try to use not more than one component at a time.
    So, 1 md, 1 LUKS, 1 LVM, at maximum.
    If possible less.

    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    This type of "generic" setup has some advantages in case
    of hardware updates (easy to add / remove storage devices,
    by using pvmove).
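
    As a rough sketch of that layout, assuming the two md arrays are used
    directly as PVs (all names and sizes are placeholders):
      pvcreate /dev/md/hdd /dev/md/ssd
      vgcreate vg_data /dev/md/hdd /dev/md/ssd
      # bulk LV restricted to the HDD PV, cache pool restricted to the SSD PV
      lvcreate -L 2T -n bulk vg_data /dev/md/hdd
      lvcreate --type cache-pool -L 100G -n bulk_cache vg_data /dev/md/ssd
      lvconvert --type cache --cachepool vg_data/bulk_cache vg_data/bulk
      # later, "pvmove /dev/md/hdd" can migrate data off that PV before
      # removing the disks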

    bye,

    --

    piergiorgio

  • From Peter Chant@21:1/5 to Roger Blake on Sun Dec 11 22:07:59 2016
    On 12/11/2016 01:34 AM, Roger Blake wrote:
    On 2016-12-10, Peter Chant <pete@petezilla.co.uk> wrote:
    Is there a reason why you gave that one as an example, chipset,
    manufacturer or was it simply an inexpensive board with useful specs?

    Just an example of a low-cost SATA III board. There are plenty around.
    I used one similar to this for my SSD before upgrading to a board that
    had built-in SATA III.


    Thanks. Looking at the moment it seems there is not much middle ground,
    either two ports much cheaper or something much more heavyweight with
    matching price tag. But I have two SSDs only and the two port boards
    are cheap enough.


    It sounds like your bottleneck is the mechanical drives. To speed things up significantly you'll need faster drives, or at least a faster type of RAID array.


  • From Peter Chant@21:1/5 to Piergiorgio Sartor on Sun Dec 11 22:46:43 2016
    On 12/11/2016 01:10 PM, Piergiorgio Sartor wrote:

    Be careful, it is easy to destroy data.
    Maybe you can practice with loop devices.
    There are some howtos around.


    Thanks. I'm reasonably careful. However, I'd probably image the device
    before trying gparted on it.

    With RAID-10 far 2 am I correct in assuming that the available capacity
    for a two device array of 3TB disks would be 3TB (two copies of data)?

    Yep.

    So I have the same space as I have now with btrfs, but with striping and
    no loss of redundancy. Cool.




    SSDs:
    Create empty partition on larger system SSD.
    Add smaller empty SSD.

    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2
    --layout=f2 /dev/sda4 /dev/sdb

    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    I've not looked into that. Perhaps the default is sensible if I don't
    pick one. Must do reading.


    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    I've been using btrfs for a couple of years now, where it is quite
    normal not to partition disks. So maybe partition.


    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.



    I thought on top of the raid, as otherwise there are four lots of LUKS
    with passwords to manage (careful use of keyfiles would undoubtedly
    help). Logically there ought to be less overhead doing it twice rather
    than four times, but that is a guess.

    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    HDD:
    Similar to the SSD case above except that the logical volume will be for
    the slow disks doing the bulk of the work.

    OK, I think.

    Create a cached device:
    Use lvmcache to create a device using the two logical volumes
    created above (bcache would also work).

    Seems good to me.

    Create a file system on top of the cached device. If using btrfs (what
    I do now) use single for data as the raid is occurring a couple of levels
    down.

    Well, it might be btrfs has already RAID-10.
    Again, code is shared between this and md too.


    Btrfs does indeed support RAID10. However, it is fairly new and does
    not appear to support far 2. So mdadm is the way to achieve that.

    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.


    Thanks for your help.


    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.
    Not for performances, but for practising possible
    combinations and layouts.


    And for ensuring I can get a sensible workable system. :-)

    Then there is the story of the caching, which has
    different scope and performances between bcache
    and lvmcache.
    In your specific case, I cannot judge which is better,
    but lvmcache seems to me easier.


    Really I'm not sure. But it seems worthwhile giving a cache system a go.

    It looks like I could do this without the LVM layer between the LUKS and
    cache parts. However, this does give me the flexibility to create
    logical volumes that are solely on HDD RAID10 or SSD RAID10, as well as
    cached ones. Or different cache options. I think I could have put another
    LVM on top of the cached item I was creating, but I think that was
    overdoing it.

    Probably it is.
    I would try to use not more than one component at a time.
    So, 1 md, 1 LUKS, 1 LVM, at maximum.
    If possible less.

    I'd not expect it to work if I set it up in one go.

    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    If I understand correctly: One mdadm raid10 far 2 from SSDs, one from
    HDDs and then add them BOTH to one single VG, and then build the cached
    file system from that one volume group? I did not think it worked like
    that, as you'd have to control which disks the cache was on.


    This type of "generic" setup has some advantages in case
    of hardware updates (easy to add / remove storage devices,
    by using pvmove).

    bye,


  • From David Brown@21:1/5 to Peter Chant on Mon Dec 12 09:11:10 2016
    On 07/12/16 23:41, Peter Chant wrote:
    I'm wondering if there are any cheap / easy speed ups for an IO bound machine?

    ASUS M4A78 Pro motherboard
    Phenom II x6 1090T processor (fastest or second fastest that will
    go in this mobo)
    8GB RAM

    480GB SSD for system files, btrfs, single.
    2x 3TB WD red HDD as btrfs RAID1 for data.
    Also barely used DVD burner.

    I use the on-board graphics.

    Slackware 14.1 (in process of 14.2 upgrade) with 4.8.5 kernel.

    Using atop etc it seems that the machine is IO bound. Also notice that
    if apache / php / mariadb, which are running on the same machine, are churning away and unresponsive I can browse the internet still with
    little impact which makes me think it is a local IO thing.


    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same
    machine. Usually you have either a server (and therefore no desktop processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?

    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?

  • From Piergiorgio Sartor@21:1/5 to Peter Chant on Mon Dec 12 19:22:00 2016
    On 2016-12-11 23:46, Peter Chant wrote:
    [...]
    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    If I understand correctly: One mdadm raid10 far 2 from SSDs, one from
    HDDs and then add them BOTH to one single VG, and then build the cached
    file system from that one volume group? I did not think it worked like
    that, as you'd have to control which disks the cache was on.

    It seems possible to specify where the LV will
    be placed in a multi PV lvm setup.
    In case of lvmcache (maybe bcache too, in a certain
    way) it should be possible to tell lvm to place the
    cache on PV with SSD.

    You'll have to check the docs carefully, there are
    many "secrets"... :-)

    bye,

    --

    piergiorgio

  • From Peter Chant@21:1/5 to David Brown on Mon Dec 12 23:08:07 2016
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same machine. Usually you have either a server (and therefore no desktop processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?

  • From David Brown@21:1/5 to Peter Chant on Tue Dec 13 09:29:13 2016
    On 13/12/16 00:08, Peter Chant wrote:
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same
    machine. Usually you have either a server (and therefore no desktop
    processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Do you mean you are running an active webserver on the same
    system you are using for browsing, development, games, email, etc.?
    That is a poor setup, for efficiency, reliability, and security. Of
    course, economics can be a factor - but if you can afford to be playing
    around with SSDs and multiple disks, then you should also consider if
    you should have a separate machine for the server. A small Intel NUC
    with a single disk is likely to be good enough for your LAMP stack and mediatomb server, leaving your desktop free to be a desktop.


    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?


    Yes, adding memory will make a /serious/ difference when you are trying
    to work as a server and a desktop - /if/ you really are having
    performance issues with I/O. But to be honest, I don't think you /are/
    having I/O performance issues - I suspect you just think you are. If
    you have a lot of free memory (and 19% is quite a lot, unless you have
    just stopped a large process) then you are not actually doing a lot of
    I/O, because disk data is cached in ram whenever there is /any/ free ram.

    With more ram, writes go faster because they stay in memory for longer
    and don't get flushed to disk as often. Reads go faster because it is
    much more likely that the data is already in memory. You also have the
    option of putting /tmp on tmpfs to speed up processes that use a lot of temporary files.
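
    For example, an /etc/fstab entry along these lines (the size cap is
    only an illustration):
      tmpfs   /tmp   tmpfs   size=4G,mode=1777,nosuid,nodev   0 0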

    But again, I would strongly suggest you try to identify what /feels/
    slow, and consider how you use the machine. What are you doing in
    parallel? What sort of serving are you actually doing, and is it
    running in parallel with desktop usage? Why do you think your I/O is slow?

  • From David Brown@21:1/5 to Piergiorgio Sartor on Tue Dec 13 09:57:59 2016
    On 10/12/16 23:05, Piergiorgio Sartor wrote:
    On 2016-12-10 21:44, Peter Chant wrote:
    [...]
    Well, I've not taken any formal stats but a good part of the load is
    mariadb at the same time as apache / php and firefox. I'd assume random.

    This can fit SSD or HDD + cache on SSD.

    [...]
    I'm a bit confused here. My understanding is that RAID10 requires four
    disks, that they are striped in pairs and the pairs then mirror each
    other. So I can't do that if I have two drives.

    Look at the Linux MD RAID-10 documentation.
    You'll see any number, even odd, will do.

    Here some reference:

    https://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

    In case of RAID-10, with 2 disks, layout "far 2",
    the disks combine RAID-0 and RAID-1.

    So, the RAID will survive a failure of 1 disk,
    but the read performances are of RAID-0.

    In your case, I'd combine the two HDDs in one
    RAID-10 and the two SSDs in another RAID-10.
    Both with layout "far 2".

    There is no point in using "far" layout on SSDs. The benefit of the
    "far" layout is for harddisks - it means that most reads will come from
    the faster outer cylinders of the disk, and you have less head movement
    during reads (but more head movement, and therefore greater latency,
    during writes). For SSDs, the layout makes no difference to the timing
    and "near" is fine for RAID10.

    Note also that if you are using the md layer for the raid, then don't
    use raid on the btrfs layer as well.

    In general, I would recommend md raid10,far for systems that need high performance streaming of large files. For small file access, it has few benefits over md raid1. And if you have btrfs, then using raid1 in
    btrfs rather than md can be significantly more efficient.


    Part of the RAID-10 with SSD can be used as cache.

    [...]
    Hmm. bcache is simpler as a starting point.

    bcache is not simpler for anything. It is an extra complicating layer,
    and on a system like this it is completely unnecessary.


    Well, not really.
    If you already use LVM, then lvmcache is easier.

    Same goes for lvmcache in this case - it is overkill.

    Because you can add the cache to and remove it from the
    LVM volume on the fly.
    With bcache, you'll have to start with it from
    the beginning.

    bye,


  • From David Brown@21:1/5 to Piergiorgio Sartor on Tue Dec 13 10:17:46 2016
    On 11/12/16 14:10, Piergiorgio Sartor wrote:
    On 2016-12-11 12:54, Peter Chant wrote:
    [...]
    Thanks. I've done some reading and there is more to do plus some
    experimentation.

    Experimentation is good.
    It is the only way to get some idea of the
    different peculiarities.

    I understand that mdadm is used to create the raid arrays, it is not
    part of lvm itself?

    LVM (dmraid) and md share a lot of code, but I'm
    not sure about this RAID-10. Maybe it is only md.


    lvm can do raid1 and raid0, and can be layered to give a traditional
    4-drive raid10. But it cannot do md-style raid10, nor the other more
    advanced raid setups. It is also less efficient than md raid - but
    sometimes more convenient because it is much easier to have the raid
    settings change on the fly (such as if you try to make a raid0 stripe
    from physical volumes of different sizes - lvm will stripe as much as
    possible, and give you unstriped use of the rest of the space).
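
    For reference, minimal examples of lvm's own mirroring and striping
    (VG and LV names are placeholders):
      lvcreate --type raid1 -m 1 -L 100G -n mirrored vg_data
      lvcreate -i 2 -I 64 -L 100G -n striped vg_data   # 2-way stripe, 64k stripe size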

    [...]
    OK. Simple to set up with the SSDs as one is blank and the other has
    free space.

    Be careful, it is easy to destroy data.
    Maybe you can practice with loop devices.
    There are some howtos around.

    With RAID-10 far 2 am I correct in assuming that the available capacity
    for a two device array of 3TB disks would be 3TB (two copies of data)?

    Yep.

    Part of the RAID-10 with SSD can be used as cache.


    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will
    add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O problem, or have identified where it is.


    OK, you'll have to decide at which layer LUKS fits.
    There are pros and cons for each case.

    SSDs:
    Create empty partition on larger system SSD.
    Add smaller empty SSD.

    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2
    --layout=f2 /dev/sda4 /dev/sdb

    That looks like you are raiding a partition on one device with the
    entire second device. Are you sure that is what you want?

    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)
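
    The secure erase is typically done with hdparm; it wipes the whole
    device, the drive must not be in the "frozen" state, and /dev/sdX and
    the password are placeholders:
      hdparm --user-master u --security-set-pass Pass /dev/sdX
      hdparm --user-master u --security-erase Pass /dev/sdX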


    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously
    paranoid. It is understandable why one would want to encrypt a portable machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    HDD:
    Similar to the SSD case above except that the logical volume will be for
    the slow disks doing the bulk of the work.

    OK, I think.

    Create a cached device:
    Use lvmcache to create a device using the two logical volumes
    created above (bcache would also work).

    Seems good to me.

    Create a file system on top of the cached device. If using btrfs (what
    I do now) use single for data as the raid is occurring a couple of levels
    down.

    Well, it might be btrfs has already RAID-10.
    Again, code is shared between this and md too.

    btrfs does not have raid10, and it does not share significant raid1 or
    raid0 code with md. (It /does/ share code for calculating raid5 and
    raid6 parities, but that's another issue - and don't use btrfs raid5/6
    until the bugs are sorted out!).

    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is
    mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.
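
    For the btrfs raid1 option, creation is a one-liner (device names are
    placeholders; both metadata and data are mirrored):
      mkfs.btrfs -m raid1 -d raid1 /dev/sda4 /dev/sdb1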


    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.

    Agreed.

    Not for performances, but for practising possible
    combinations and layouts.

    Agreed.


    Then there is the story of the caching, which has
    different scope and performances between bcache
    and lvmcache.
    In your specific case, I cannot judge which is better,
    but lvmcache seems to me easier.

    It looks like I could do this without the LVM layer between the LUKS and
    cache parts. However, this does give me the flexibility to create
    logical volumes that are solely on HDD RAID10 or SSD RAID10, as well as
    cached ones. Or different cache options. I think I could have put another
    LVM on top of the cached item I was creating, but I think that was
    overdoing it.

    Probably it is.
    I would try to use not more than one component at a time.
    So, 1 md, 1 LUKS, 1 LVM, at maximum.
    If possible less.

    It would be also possible to create two PVs out of the two
    RAID-10 and a single VG with both.
    Then the LV can be fitted in one or the other PV.
    LUKS can be at RAID level or on top of LV.

    This type of "generic" setup has some advantages in case
    of hardware updates (easy to add / remove storage devices,
    by using pvmove).

    bye,


  • From Peter Chant@21:1/5 to David Brown on Wed Dec 14 20:03:04 2016
    On 12/13/2016 09:17 AM, David Brown wrote:

    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O problem, or have identified where it is.



    I've had periods where the disks have been solidly at 80-90% utilisation
    for many seconds yet the CPU has been lightly loaded.


    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2
    --layout=f2 /dev/sda4 /dev/sdb

    That looks like you are raiding a partition on one device with the
    entire second device. Are you sure that is what you want?


    Well, if raiding SSDs there is some logic to this horrible-looking
    asymmetric setup. I'm using an SSD for the OS and that has plenty of
    free space. I also have the older, smaller SSD it replaced. Although it
    looks messy, I could free up some space on the current system SSD to use
    a partition for RAID and use that in combination with the old, currently
    unused SSD. That saves shelling out for another SSD if I want to RAID a
    pair. Aesthetically it is horrid and obviously would impact the speed
    of OS access. But it is hardware I have, so the monetary cost is no
    issue. The time and hair-loss cost might not be so trivial.

    Would image the SSD in case of mess ups before resizing partitions.

    As for the whole of /dev/sdb - I've been using btrfs for a while and it
    is normal to give it whole disks, a simple slip of the finger.

    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)


    Have done that already.


    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously paranoid. It is understandable why one would want to encrypt a portable machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Well, in this day and age it seemed like a reasonable idea.


    Create a logical volume on top of the above to use as the cache device.

    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    So it is about as fast as it will get now?

    I've a spare hdd and ssd so I'll have a play when I get time. Need to
    think about a useful benchmark.




    btrfs does not have raid10, and it does not share significant raid1 or
    raid0 code with md. (It /does/ share code for calculating raid5 and
    raid6 parities, but that's another issue - and don't use btrfs raid5/6
    until the bugs are sorted out!).

    Oh yes it does. :-). But I don't see anything about far 2.


    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.

    Replacing a disk and the associated balance took a week.


    I'm at RAID1 with btrfs now. Yes, not RAIDing btrfs over another RAID
    as that makes little sense here.


    Of course, I could have got completely the wrong idea above and invented some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.

    Agreed.

    Spare hdd and ssd. Could use loop devices but perhaps real hw is useful,
    though not enough to RAID.


    Not for performances, but for practising possible

  • From Peter Chant@21:1/5 to David Brown on Wed Dec 14 23:16:40 2016
    On 12/13/2016 08:29 AM, David Brown wrote:
    On 13/12/16 00:08, Peter Chant wrote:
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same
    machine. Usually you have either a server (and therefore no desktop
    processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is
    it that is using the I/O ? Is the machine really too slow, and what do
    you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Do you mean you are running an active webserver on the same
    system you are using for browsing, development, games, email, etc.?
    That is a poor setup, for efficiency, reliability, and security. Of
    course, economics can be a factor - but if you can afford to be playing around with SSDs and multiple disks, then you should also consider if
    you should have a separate machine for the server. A small Intel NUC
    with a single disk is likely to be good enough for your LAMP stack and mediatomb server, leaving your desktop free to be a desktop.


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and
    partly convenience and rework would be a major pita.

    However, if I get slowdowns on this fairly elderly machine with a six
    core cpu and 8GB of ram, then I don't see a NUC, albeit with a newer
    generation CPU, being much faster, though I admit I've never got my
    hands on one. Plus there is not room for the two hdds plus the ssd OS disk.
    Using this machine as the server and the nuc as the desktop would make
    more sense.

    I did think about this in the past, or getting a nice laptop / docking
    station combination and a server setup.

    The system is fairly responsive right now but I am not hitting the HDDs
    right now using thunderbird as I type this. This is with duperemove
    hitting the HDDs hard and I'd not consider doing anything else that hit
    the HDDs. Incidentally application data is now 72% of physical memory
    and disk cache 24-25% with the remainder few % free.



    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance
    is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?


    Yes, adding memory will make a /serious/ difference when you are trying
    to work as a server and a desktop - /if/ you really are having
    performance issues with I/O. But to be honest, I don't think you /are/ having I/O performance issues - I suspect you just think you are. If
    you have a lot of free memory (and 19% is quite a lot, unless you have
    just stopped a large process) then you are not actually doing a lot of
    I/O, because disk data is cached in ram whenever there is /any/ free ram.


    Given that there is little free now the numbers I quoted earlier might
    not have been representative. I did not see a noticeable difference
    between 4 & 8 GB therefore I'd not considered more ram. However, if it
    is really likely to make a big difference and with 16GB of DDR2 going
    for between £15 and £45 on ebay then some research is warranted.

    With more ram, writes go faster because they stay in memory for longer
    and don't get flushed to disk as often. Reads go faster because it is
    much more likely that the data is already in memory. You also have the option of putting /tmp on tmpfs to speed up processes that use a lot of temporary files.

    I've put /tmp on tmpfs before. I have /dev/shm on tmpfs at the moment
    as part of slackware's default config. Generally I've abandoned this
    when compiling packages filled up /tmp and I ran out of tmp space.
    Generally a failure to clean up /tmp, but some packages are large and
    have a lot of dependencies.


    But again, I would strongly suggest you try to identify what /feels/
    slow, and consider how you use the machine. What are you doing in
    parallel? What sort of serving are you actually doing, and is it
    running in parallel with desktop usage? Why do you think your I/O is slow?


    I used the term 'IO bound' as I've seen the HDDs hit 80-90% for long
    periods yet CPU usage has been relatively low. So to me IO was the
    limiting factor. Going out and spending lots of cash (not cache!) on
    the latest i7 + motherboard and memory therefore I assume would not
    improve the user experience whereas speeding up the existing IO would,
    if possible.

    The lamp load above is likely excessive but I have seen slowdowns before
    with this machine. Sometimes btrfs seems to build up a backlog of stuff
    to do (btrfs cleaner, transactions etc) for a while after doing
    something disk intensive. But I've noticed this less lately. Btrfs has
    not got a reputation for being slow although odd and specific cases do
    show up on the mailing list from time to time. I'm not planning on
    swapping file systems unless to another with subvolumes and probably
    snapshots as subvolumes have let me organise things in a much more
    logical and efficient manner since I have started using them.

    I have a nagging feeling that something just is not right. However, I
    need to benchmark. I also have a cheap two-port SATA III card.
    If there is something odd with the disk interface (can't see what) maybe
    that will shake it out. It should allow the SSD to function to its
    potential anyway, so it is not a bad idea.

    Unfortunately the slightly higher-range 4-port SATA III PCIe x2 cards
    seem limited right now; I'd have to go up quite a notch in price to
    eight-port / SAS cards, and I'm starting to throw reasonable sums of
    money at an elderly mobo / processor / ram combination with no assured
    outcome. However, cheap improvements, and especially improvements with
    existing kit, are definitely worth pursuing.

  • From David Brown@21:1/5 to Peter Chant on Thu Dec 15 13:15:32 2016
    On 15/12/16 00:16, Peter Chant wrote:
    On 12/13/2016 08:29 AM, David Brown wrote:
    On 13/12/16 00:08, Peter Chant wrote:
    On 12/12/2016 08:11 AM, David Brown wrote:

    My first thought is to ask what the machine is being used for. It is
    not normal to have apache/php/mariadb and firefox/browsing on the same machine. Usually you have either a server (and therefore no desktop
    processes), or a desktop (and therefore no server processes of
    particular relevance). So what are you doing with the system? What is it that is using the I/O ? Is the machine really too slow, and what do you /feel/ is slow on it?


    Home desktop / server. I have mariadb on it and the easiest way for a
    front end for some stuff I am doing seemed to be a lamp stack.

    Do you mean you are running an active webserver on the same
    system you are using for browsing, development, games, email, etc.?
    That is a poor setup, for efficiency, reliability, and security. Of
    course, economics can be a factor - but if you can afford to be playing
    around with SSDs and multiple disks, then you should also consider if
    you should have a separate machine for the server. A small Intel NUC
    with a single disk is likely to be good enough for your LAMP stack and
    mediatomb server, leaving your desktop free to be a desktop.


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and partly convenience and rework would be a major pita.

    For Python, you can look at numpy and scipy for serious calculations -
    if you can work with your data as homogeneous arrays then numpy will do
    the calculations in fast C libraries rather than interpreted Python.
    Also look at pypy or psyco as ways to speed up Python code. You may
    also find that if you have heavy Python pages you are better using
    Twisted as a webserver rather than Apache so that you work entirely
    within the one Python process rather than starting and stopping
    processes all the time.


    However, if I get slowdowns on this fairly elderly machine with a six
    core cpu and 8GB of ram, then I don't see a NUC, albeit with a newer
    generation CPU, being much faster, though I admit I've never got my
    hands on one. Plus there is not room for the two hdds plus the ssd OS disk.

    A NUC will give you as good I/O disk throughput (or better - I think you
    only had SATA-2 on your machine?). The processor may or may not be
    better - there are far too many NUC variants to keep track of!

    But the point is to separate the different types of usage.

    Using this machine as the server and the nuc as the desktop would make
    more sense.

    Maybe that's the way to do it.


    I did think about this in the past, or getting a nice laptop / docking station combination and a server setup.

    The system is fairly responsive right now but I am not hitting the HDDs
    right now using thunderbird as I type this. This is with duperemove
    hitting the HDDs hard and I'd not consider doing anything else that hit
    the HDDs. Incidentally application data is now 72% of physical memory
    and disk cache 24-25% with the remainder few % free.



    Also have mediatomb on it and serve files to mpd on a raspberry pi.
    Mediatomb seems to churn the disk sometimes. Not installed right now.


    My second thought is that often the best way to improve I/O performance is to add more ram. Is that a possibility with your hardware?


    I've got 8GB and about 40% acts as a disk cache. I could try hunting
    down 16GB on ebay but I paid full price a year or so ago for the 8GB.
    Looking at kinfocentre right now I have 19% of my memory free, 35% in
    disk cache and 44% application data. So would adding more ram do
    anything but add to free memory?


    Yes, adding memory will make a /serious/ difference when you are trying
    to work as a server and a desktop - /if/ you really are having
    performance issues with I/O. But to be honest, I don't think you /are/
    having I/O performance issues - I suspect you just think you are. If
    you have a lot of free memory (and 19% is quite a lot, unless you have
    just stopped a large process) then you are not actually doing a lot of
    I/O, because disk data is cached in ram whenever there is /any/ free ram.


    Given that there is little free now the numbers I quoted earlier might
    not have been representative. I did not see a noticeable difference
    between 4 & 8 GB therefore I'd not considered more ram. However, if it
    is really likely to make a big difference and with 16GB of DDR2 going
    for between £15 and £45 on ebay then some research is warranted.

    I have seen extra ram make an impressive difference to speed. Not long
    ago a fellow developer here thought he needed a new graphics card at
    about £500 because his current £300 one was too slow for the 3D
    rendering he was doing. But £30 more ram doubled the speed of the
    system, while a new graphics card would have made little difference.


    With more ram, writes go faster because they stay in memory for longer
    and don't get flushed to disk as often. Reads go faster because it is
    much more likely that the data is already in memory. You also have the
    option of putting /tmp on tmpfs to speed up processes that use a lot of
    temporary files.

    I've put /tmp on tmpfs before. I have /dev/shm on tmpfs at the moment
    as part of slackware's default config. Generally I've abandoned this
    when compiling packages filled up /tmp and I ran out of tmp space.
    Generally a failure to clean up /tmp, but some packages are large and
    have a lot of dependencies.

    /dev/shm is always on tmpfs (in modern systems, anyway). Processes use
    it specifically as a convenient way to have a block of memory shared
    between them - they create a file on the tmpfs, which remains in memory,
    then mmap it to access the memory directly.

    It is often more efficient to have /tmp on tmpfs and let it spill out
    into swap, than to have the /tmp directly on the disk.



    But again, I would strongly suggest you try to identify what /feels/
    slow, and consider how you use the machine. What are you doing in
    parallel? What sort of serving are you actually doing, and is it
    running in parallel with desktop usage? Why do you think your I/O is slow?

    I used the term 'IO bound' as I've seen the HDDs hit 80-90% for long
    periods yet CPU usage has been relatively low. So to me IO was the
    limiting factor. Going out and spending lots of cash (not cache!) on
    the latest i7 + motherboard and memory therefore I assume would not
    improve the user experience whereas speeding up the existing IO would,
    if possible.

    The lamp load above is likely excessive but I have seen slowdowns before
    with this machine. Sometimes btrfs seems to build up a backlog of stuff
    to do (btrfs cleaner, transactions etc) for a while after doing
    something disk intensive. But I've noticed this less lately. Btrfs has
    not got a reputation for being slow although odd and specific cases do
    show up on the mailing list from time to time. I'm not planning on
    swapping file systems unless to another with subvolumes and probably snapshots as subvolumes have let me organise things in a much more
    logical and efficient manner since I have started using them.

    Agreed. Btrfs is not perfect, but I find it the best choice for my
    usage. Cheap snapshots are really nice!


    I have a nagging feeling that something just is not right. However, I
    need to benchmark. I also have a cheap two-port SATA III card.
    If there is something odd with the disk interface (can't see what) maybe
    that will shake it out. It should allow the SSD to function to its
    potential anyway, so it is not a bad idea.

    hdparm can give you a simple test of the buffer bandwidth, and smartctl
    can be useful to list the features of the interface and device to check
    for obvious missing points.
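
    For instance (both are read-only and safe on a live system):
      hdparm -tT /dev/sda    # cached vs. buffered read timings
      smartctl -i /dev/sda   # model, firmware and negotiated SATA link speed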

    A newer cpu and motherboard may not seem useful from the viewpoint of
    processor power, but they will have better throughput on the I/O and
    faster native SATA.


    Unfortunately the slightly higher-range 4-port SATA III PCIe x2 cards
    seem limited right now; I'd have to go up quite a notch in price to
    eight-port / SAS cards, and I'm starting to throw reasonable sums of
    money at an elderly mobo / processor / ram combination with no assured
    outcome. However, cheap improvements, and especially improvements with
    existing kit, are definitely worth pursuing.


    I'd avoid SAS - but that is because I have been bitten by it. There is
    not much that SAS does better than SATA (queued trim commands is one
    thing), and the disks cost twice as much. I prefer just to get twice as
    many SATA disks for the same money. I had one server that had SAS
    because the salesman convinced me that the extra reliability of SAS
    drives over SATA drives was worth the cost. That machine broke when
    the SAS controller card died, leaving me with a useless server and a
    disk that I could not read because everything else was SATA. Sometimes
    you learn by doing!

  • From David Brown@21:1/5 to Peter Chant on Thu Dec 15 12:55:35 2016
    On 14/12/16 21:03, Peter Chant wrote:
    On 12/13/2016 09:17 AM, David Brown wrote:

    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O
    problem, or have identified where it is.



    I've had periods where the disks have been solidly at 80-90% utilisation
    for many seconds yet the CPU has been lightly loaded.

    What is the computer doing at the time?

    Is this actually slowing down something that you are waiting for?

    It is /normal/ for a computer to run at maximum in some aspects, for
    some time. If you are transferring a large file, you /want/ the disks
    to be as close to 100% as possible. If you are doing a raid1 scrub, you
    /want/ the disks to be close to 100%, perhaps for hours at a time if the
    disks are big (but at low I/O priority so that other tasks can also run).

    So far, you have just told me that your system is working.



    Create raid 10 far 2 raid array with mdadm for example (need to check
    what metadata means!):
    mdadm --create /dev/md-ssd --level=10 --metadata=0.90 --raid-devices=2 --layout=f2 /dev/sda4 /dev/sdb

    That looks like you are raiding a partition on one device with the
    entire second device. Are you sure that is what you want?


    Well, if raiding SSDs there is some logic to this horrible-looking
    asymmetric setup. I'm using an SSD for the OS and that has plenty of
    free space. I also have the older, smaller SSD it replaced. Although it
    looks messy, I could free up some space on the current system SSD to use
    a partition for RAID and use that in combination with the old, currently
    unused SSD. That saves shelling out for another SSD if I want to RAID a
    pair. Aesthetically it is horrid and obviously would impact the speed
    of OS access. But it is hardware I have, so the monetary cost is no
    issue. The time and hair-loss cost might not be so trivial.

    OK - my first thought was that it was a mistake (possibly just a missing character from a cut-and-paste). Now I see it is intentional.

    I am not convinced this will be a good thing - in fact, I am confident
    that it is a /bad/ thing. An old small SSD can easily be a /lot/ slower
    than a new one - typically, old and small SSDs have poor garbage
    collection and little over-provisioning. This means they get very slow
    at writes when they are full - even slower, sometimes, than hard disks.
    When you create an md raid1 array like this, the first thing md will do
    is copy block-for-block from /dev/sda4 to /dev/sdb. It will write to
    the entire disk - the old SSD will think that /all/ its normal blocks
    are full of important data. As soon as you try to write something else
    to it, it must now try to do garbage collection in "panic" mode with
    minimal free space - you might find your write latencies measured in
    /seconds/. And if the SSD is old enough, then it won't be able to
    handle reads during the erases and garbage collection. If you are
    lucky, reads will come from the other disk. If you are unlucky, reads
    will be stalled too.

    So I would expect your system to be a good deal slower by doing this,
    compared to simply using the new SSD on its own.

    /If/ your old SSD is reasonably fast, you can first run a secure erase
    to tell it to drop all data. Then partition it, leaving a little extra
    space unused - this overprovisioning can help a lot. And use it with
    btrfs raid1, not md raid1, so that only the actual useful data is
    replicated.


    Would image the SSD in case of mess ups before resizing partitions.

    As for the whole of /dev/sdb - I've been using btrfs for a while and it
    is normal to give it whole disks, a simple slip of the finger.

    I always use partitions, but I usually want a couple of partitions for
    other things (like swap).


    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more
    flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)


    Have done that already.

    Including the unused space at the end?



    Metadata 1.0, 1.1 and 1.2 are the newer formats; use one of these,
    not 0.90, which has fewer features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid 10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2 device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously
    paranoid. It is understandable why one would want to encrypt a portable
    machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Well, in this day and age it seemed like a reasonable idea.

    It can be fun to play around with this sort of thing, but that doesn't
    mean it is a good idea if you are aiming for a useful system.

    One particular thing that can be a serious pain with such complex setups
    is if something goes wrong. If something breaks badly, you might have
    to put the disks in a different machine to recover the data, or boot the
    same machine from a live USB. If you have an lvm cache or bcache in
    writeback mode, you need both the SSD and the HD online, and you may
    need a system with the right kernel and utilities in order to get the
    disks working. Even then, it's easy to accidentally corrupt things
    along the way such as by writing to the HD while there are uncommitted
    changes on the SSD part. I have enough experience with complicated
    recoveries to know that when something goes wrong, you'll be glad you
    kept things simple - and that you documented everything :-)



    Create a logical volume on top of the above to use as the cache device.
    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    So it is about as fast as it will get now?

    I've a spare hdd and ssd so I'll have a play when I get time. Need to
    think about a useful benchmark.

    It's all about the type of load you are using. lvmcache and/or bcache
    can help a lot in some cases - but be no faster than a single HD in
    other cases. And for many of the things that /do/ run faster with the
    caches, you could just as easily run them entirely from the SSD
    significantly faster - or get even better results by adding more RAM.

    The key to getting /really/ optimal systems is, as you say, useful
    benchmarks. There is no benefit in looking at the timings of a test
    that writes lots of small blocks from lots of threads unless that really
    is what you are doing in practice. There is no benefit in running a
    benchmark after clearing your caches, because you don't clear your
    caches in practice - but if you run without clearing the caches, you are
    not testing your disk performance. The only "true" benchmark is to run
    your system with your typical real tasks and see how it performs.





    btrfs does not have raid10, and it does not share significant raid1 or
    raid0 code with md. (It /does/ share code for calculating raid5 and
    raid6 parities, but that's another issue - and don't use btrfs raid5/6
    until the bugs are sorted out!).

    Oh yes it does. :-). But I don't see anything about far 2.

    Btrfs "raid10" is (roughly) traditional raid1+0, where you have two
    mirrors which are striped. (I say roughly, because btrfs handles this
    at the level of extents, rather than raw disk blocks.) Linux md raid10
    is quite a bit more sophisticated than just layered raid1 and raid0 -
    which is why it works on any number of disks, not just multiples of 4.



    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is
    mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.
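    If you go the btrfs raid1 route, the conversion can be done in place by
    adding the second device and rebalancing - only allocated extents are
    copied, which is the point about avoiding a whole-disk resync (mount
    point and device names are placeholders):

        # add the second SSD to the existing, mounted btrfs filesystem
        btrfs device add /dev/sdY /mnt/data

        # convert data and metadata to raid1; only used extents get duplicated
        btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data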

    replacing a disk and the associated balance took a week.

    If you have a lot of data, it takes a while to copy it all over. And a re-balance actually does a good deal more work than just a plain copy.
    But at least it doesn't copy the unused space too.



    I'm at RAID1 with btrfs now. Yes, not RAIDing btrfs over another RAID
    as that makes little sense here.


    Of course, I could have got completely the wrong idea above and invented
    some horrid monster!

    If I understood it right, it sounds OK to me.

    I would, in any case, strongly suggest to experiment,
    maybe, as mentioned above, with loop devices.

    Agreed.

    Spare hdd and ssd. Could use loop devices but perhaps real hw is useful, though not enough to RAID.

    Loop devices work well in testing, especially for seeing how to add
    disks, replace disks, resize things, etc. Of course they are of little
    help in speed testing.

    I usually test using loop devices made in a tmpfs mount - but then,
    my machines normally have lots of ram.
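    A rough sketch of that kind of sandbox, with arbitrary sizes and paths:

        # a small ram-backed scratch area
        mkdir -p /mnt/scratch
        mount -t tmpfs -o size=2G tmpfs /mnt/scratch

        # sparse backing files attached to loop devices
        truncate -s 500M /mnt/scratch/disk1.img /mnt/scratch/disk2.img
        losetup --find --show /mnt/scratch/disk1.img   # prints e.g. /dev/loop0
        losetup --find --show /mnt/scratch/disk2.img

        # the /dev/loopN devices can now be used to practise md, LVM or btrfs
        mdadm --create /dev/md100 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1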



    Not for performances, but for practising possible


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter Chant@21:1/5 to David Brown on Thu Dec 15 22:50:48 2016
    On 12/15/2016 12:15 PM, David Brown wrote:


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and
    partly convenience and rework would be a major pita.

    For Python, you can look at numpy and scipy for serious calculations -
    if you can work with your data as homogeneous arrays then numpy will do
    the calculations in fast C libraries rather than interpreted Python.
    Also look at pypy or psyco as ways to speed up Python code. You may
    also find that if you have heavy Python pages you are better using
    Twisted as a webserver rather than Apache so that you work entirely
    within the one Python process rather than starting and stopping
    processes all the time.


    At this stage the rewrite is probably not worthwhile. I'm not sure this
    bit is necessarily the bottleneck anyway. Investigation required.



    Given that there is little free RAM now, the numbers I quoted earlier might
    not have been representative. I did not see a noticeable difference
    between 4 & 8 GB, therefore I'd not considered more RAM. However, if it
    is really likely to make a big difference, and with 16GB of DDR2 going
    for between £15 and £45 on eBay, then some research is warranted.

    I have seen extra ram make an impressive difference to speed. Not long
    ago a fellow developer here thought he needed a new graphics card at
    about £500 because his current £300 one was too slow for the 3D
    rendering he was doing. But £30 more ram doubled the speed of the
    system, while a new graphics card would have made little difference.


    The lack of noticeable difference put me off. But a little careful
    eBaying might bump me to 16GB for a reasonable price.


    It is often more efficient to have /tmp on tmpfs and let it spill out
    into swap, than to have the /tmp directly on the disk.
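    A single fstab line is enough for that; the size cap here is only an
    example and bounds how much of RAM plus swap /tmp may consume:

        # /etc/fstab - /tmp on tmpfs, spilling into swap under memory pressure
        tmpfs   /tmp   tmpfs   size=4G,mode=1777   0 0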


    That is good to know for the odd occasion. Hmm. I've just noticed a
    tiny swap hit with 7% free physical memory as I write this. First ever,
    that I have noticed!


    A newer cpu and motherboard may not seem useful from the viewpoint of processor power, but they will have better throughput on the I/O and
    faster native SATA.


    I've a cheap-as-chips PCIe SATA 3 controller waiting to go in. I'm
    curious as to whether it would make a difference to the HDDs; it should
    not. Anyway, it should help the SSD or SSDs.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Peter Chant on Fri Dec 16 09:39:40 2016
    On 15/12/16 23:50, Peter Chant wrote:
    On 12/15/2016 12:15 PM, David Brown wrote:


    I think my lamp stack is somewhat atypical. I'm storing numerical data
    in it and doing some calcs on that. Might be good for storage but calcs
    in python and php are not optimal. However, this is partly historic and
    partly convenience and rework would be a major pita.

    For Python, you can look at numpy and scipy for serious calculations -
    if you can work with your data as homogeneous arrays then numpy will do
    the calculations in fast C libraries rather than interpreted Python.
    Also look at pypy or psyco as ways to speed up Python code. You may
    also find that if you have heavy Python pages you are better using
    Twisted as a webserver rather than Apache so that you work entirely
    within the one Python process rather than starting and stopping
    processes all the time.


    At this stage the rewrite is probably not worthwhile. I'm not sure this
    bit is necessarily the bottleneck anyway. Investigation required.

    If you are using 32-bit Python 2, psyco can be a great solution.
    Install it, and then add this to your Python script:

    import psyco
    psyco.full()

    That's all you need, and psyco will do JIT compilation of functions that
    are run often.

    As a project, psyco is now "dead" - PyPy is the successor for newer
    Python systems. But it can be a very easy way to speed up some kinds of
    code quite significantly.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to All on Fri Dec 16 09:34:18 2016
    On 16/12/16 00:18, Peter Chant wrote:
    <snip>

    I am not going to go through this answering point for point - it would
    be too specific and require too many questions and answers about exact
    details. I'll make a few more comments below - and I have read the bits
    that I have snipped, even if I didn't reply to them. If you want to
    take any of these points further, that's fine - but we can snip the
    other parts to keep the posts of manageable length.

    It is good that you are willing to experiment and try new arrangements
    here - getting an "optimal" system is not easy. And of course, playing
    around and learning about the options is fun! I hope I have been able
    to give you a few ideas about some possibilities you were not aware of,
    some pointers about things you need to consider, and some technical
    details to help you understand some of the options.

    And hopefully at least some of it has been interesting to other people
    in the group!

    If you are looking for more help or information about raid, the Linux
    raid mailing list is a good place. It has an emphasis on md raid, but
    covers other types of raid too.

    On 12/15/2016 11:55 AM, David Brown wrote:
    On 14/12/16 21:03, Peter Chant wrote:
    On 12/13/2016 09:17 AM, David Brown wrote:



    <snip>

    Well, the old SSD is not that old - it's a Samsung 840 Pro. Anyway, I have
    it and am not otherwise using it and the same goes for a 2TB HDD. So I
    can play with caching (but not necessarily RAID). So I can add them to
    the machine and have a go. However, this is much larger than the
    mariadb files and some others so I can try using it as a dedicated drive
    for some things and see if shifting some storage to it makes a
    difference. With more effort I can probably play with RAID and caching.


    Running directly from SSD is always faster and more efficient than any
    sort of caching system. So putting the files on the SSD is a good idea.

    Here are a few other pointers to consider and investigate:

    1. Mariadb/MySql supports a number of different table formats. Are the
    formats you are using the best for the job?

    2. Some formats need update-in-place files for efficiency. Btrfs uses copy-on-write, which can be slow and inefficient if the application
    expects to change parts of a file.

    3. RAID0 only helps speed if you have large files. For small files and
    random access, RAID0 slows things down - especially on hard disks.
    RAID1 can speed up multiple small reads in parallel, but slows down
    small writes a bit.

    4. You can choose where to store your database files, regardless of how
    the rest of your system is organised. One simple way is to use symbolic
    links for directories. So if your database files are in /var/mariadb,
    and you want them to be on a disk that is mounted /mnt/disk2, make
    /var/mariadb a symbolic link pointing to /mnt/disk2/mariadb (a rough
    sketch of this follows after the list).

    5. Experiment with the database files on a tmpfs. This will give you an
    idea of the maximum speeds possible, so that you know you are not aiming unrealistically high or dealing with something completely different as
    the bottleneck in the system.
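    A rough sketch of point 4, using the example paths above (stop the
    database first, and adjust the init script name to your distribution):

        # stop mariadb before touching its files (rc.mysqld on Slackware)
        /etc/rc.d/rc.mysqld stop

        # copy the data to the other disk, keep the original as a fallback,
        # and point a symlink at the new location
        cp -a /var/mariadb /mnt/disk2/mariadb
        mv /var/mariadb /var/mariadb.old
        ln -s /mnt/disk2/mariadb /var/mariadb

        /etc/rc.d/rc.mysqld start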

    I've also looked up mysql optimisation and xfs is suggested. So I can
    make a dedicated xfs partition and see what happens. With or without LVM...

    xfs makes sense on /big/ systems. It makes little sense on small
    servers. The usual disk setup for a high performance xfs system is to
    organise your disks in raid1 pairs, then make a linear stripe (/not/ a
    raid0) on top of that for xfs. The aim is to give low-latency parallel
    access from a large number of processes - think media bank, large file
    server, mail server, big database server, etc. If you don't have at
    least 4 or 6 identical drives, your server task is probably not big
    enough to benefit much from xfs.
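    As an illustration of that layout with four identical drives (device
    names are placeholders):

        # two raid1 pairs
        mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
        mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc /dev/sdd

        # a linear concatenation (not raid0) across the pairs
        mdadm --create /dev/md10 --level=linear --raid-devices=2 /dev/md1 /dev/md2

        # xfs spreads its allocation groups across the concat for parallel access
        mkfs.xfs /dev/md10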


    There seem to be two schools of thought on RAID1 here. Anyway, I can
    play as far as I like provided I don't waste time or money.

    You are going to be wasting time in this - but it is all experience and
    part of the learning process.

    <snip>



    /If/ your old SSD is reasonably fast, you can first run a secure erase
    to tell it to drop all data. Then partition it, leaving a little extra
    space unused - this overprovisioning can help a lot. And use it with
    btrfs raid1, not md raid1, so that only the actual useful data is
    replicated.
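    Roughly, that sequence might be (device names and the temporary password
    are placeholders, and the drive must not be in the "frozen" security
    state for hdparm to accept the erase):

        # ATA secure erase: set a temporary password, then issue the erase
        hdparm --user-master u --security-set-pass temppass /dev/sdX
        hdparm --user-master u --security-erase temppass /dev/sdX

        # partition it, leaving ~10% at the end unused as over-provisioning
        parted /dev/sdX -- mklabel gpt mkpart primary 1MiB 90%

        # btrfs raid1 across the old and new SSD partitions
        mkfs.btrfs -d raid1 -m raid1 /dev/sdX1 /dev/sdY1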


    Think I'll boot on a USB stick and unplug all other drives when trying that.


    <snip>


    The system is on a simple partition with btrfs in single. But boot is
    ext3 or 4. And there is a partition I consider the 'maintenance'
    partition with a full slackware install on ext3 or 4 that I hardly touch
    but is handy in case I break something. Also, you can't build a kernel
    from a rescue disk, but you can with this, which is nice.

    You can reasonably argue that it is a waste of space especially on an
    SSD which is expensive compared to a HDD but it is a nice comfort
    blanket and doubly so on the rare occasions it is needed.

    Agreed.

    <snip>

    If you have a lot of data, it takes a while to copy it all over. And a
    re-balance actually does a good deal more work than just a plain copy.
    But at least it doesn't copy the unused space too.


    Yes, there is the checksumming for a start. But that is a good thing.


    The checksumming is not very processor intensive. But it is not
    particularly useful either - so-called "bit rot" and "undetected read
    errors" are extraordinarily rare (they are very different from
    "unrecoverable read errors", for which unchecksummed raid1 is good
    enough). The 32 bits of checksum are peanuts compared to the levels of checksum that already exist on the disk. The crc /could/ be useful for detecting possible duplicate extents, but I don't think it is easily
    accessed for that at the moment.

    <snip>





    Loop devices work well in testing, especially for seeing how to add
    disks, replace disks, resize things, etc. Of course they are of little
    help in speed testing.

    I usually test using loop devices made in a tmpfs mount - but then,
    my machines normally have lots of ram.


    I'd never have guessed. :-)


    Anyway, food for thought here from Piergiorgio and yourself.




    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter Chant@21:1/5 to David Brown on Thu Dec 15 23:18:57 2016
    On 12/15/2016 11:55 AM, David Brown wrote:
    On 14/12/16 21:03, Peter Chant wrote:
    On 12/13/2016 09:17 AM, David Brown wrote:

    OK, so having done some reading up but not carried out any testing I
    think the following is a possible setup. Note I am using LUKS so I will
    add this extra layer to the mix. I have not noticed any significant
    performance difference with and without it.

    This makes me even more doubtful that you (the OP) really have an I/O
    problem, or that you have correctly identified where it is.



    I've had periods where the disks have been solidly at 80-90% utilisation
    for many seconds yet the CPU has been lightly loaded.

    What is the computer doing at the time?


    Can't remember specifically I'm afraid.


    Is this actually slowing down something that you are waiting for?


    It has annoyed me as I was doing other stuff.


    It is /normal/ for a computer to run at maximum in some aspects, for
    some time. If you are transferring a large file, you /want/ the disks
    to be as close to 100% as possible. If you are doing a raid1 scrub, you /want/ the disks to be close to 100%, perhaps for hours at a time if the disks are big (but at low I/O priority so that other tasks can also run).

    So far, you have just told me that your system is working.


    Well, yes. But when I've not been deduping etc., so it is not expected,
    it can get annoying.



    Well. If RAIDing SSDs there is some logic to this horrible-looking
    asymmetric setup. I'm using an SSD for the OS and that has plenty of
    free space. I've also the older, smaller SSD it replaced. Although it
    looks messy, I could free up some space on the current system SSD to use
    a partition for RAID and use that in combination with the old, currently
    unused SSD. That saves shelling out for another SSD if I want to RAID a
    pair. Aesthetically it is horrid and obviously would impact the speed
    of OS access. But it is hardware I have, so the monetary cost is no
    issue. The time and hair-loss cost might not be so trivial.

    OK - my first thought was that it was a mistake (possibly just a missing character from a cut-and-paste). Now I see it is intentional.


    It was a crude example to see if I had understood the basic concept, so
    I was not certain I was 100%, just mainly right.

    I am not convinced this will be a good thing - in fact, I am confident
    that it is a /bad/ thing. An old small SSD can easily be a /lot/ slower
    than a new one - typically, old and small SSDs have poor garbage
    collection and little over-provisioning. This means they get very slow
    at writes when they are full - even slower, sometimes, than hard disks.
    When you create an md raid1 array like this, the first thing md will do
    is copy block-for-block from /dev/sda4 to /dev/sdb. It will write to
    the entire disk - the old SSD will think that /all/ its normal blocks
    are full of important data. As soon as you try to write something else
    to it, it must now try to do garbage collection in "panic" mode with
    minimal free space - you might find your write latencies measured in /seconds/. And if the SSD is old enough, then it won't be able to
    handle reads during the erases and garbage collection. If you are
    lucky, reads will come from the other disk. If you are unlucky, reads
    will be stalled too.

    So I would expect your system to be a good deal slower by doing this, compared to simply using the new SSD on its own.

    Well, the old SSD is not that old - it's a Samsung 840 Pro. Anyway, I have
    it and am not otherwise using it and the same goes for a 2TB HDD. So I
    can play with caching (but not necessarily RAID). So I can add them to
    the machine and have a go. However, this is much larger than the
    mariadb files and some others so I can try using it as a dedicated drive
    for some things and see if shifting some storage to it makes a
    difference. With more effort I can probably play with RAID and caching.

    I've also looked up mysql optimisation and xfs is suggested. So I can
    make a dedicated xfs partition and see what happens. With or without LVM...

    There seem to be two schools of thought on RAID1 here. Anyway, I can
    play as far as I like provided I don't waste time or money.


    /If/ your old SSD is reasonably fast, you can first run a secure erase
    to tell it to drop all data. Then partition it, leaving a little extra
    space unused - this overprovisioning can help a lot. And use it with
    btrfs raid1, not md raid1, so that only the actual useful data is
    replicated.


    Think I'll boot on a USB stick and unplug all other drives when trying that.



    Would image the SSD in case of mess ups before resizing partitions.
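    For the image, a plain dd of the whole device to somewhere with enough
    room is sufficient (paths are placeholders):

        # raw image of the SSD before repartitioning; restore by swapping if= and of=
        dd if=/dev/sdX of=/mnt/backup/ssd.img bs=4M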

    As for the whole of /dev/sdb - I've been using btrfs for a while and it
    is normal to give it whole disks, a simple slip of the finger.

    I always use partitions, but I usually want a couple of partitions for
    other things (like swap).


    I don't think anything has been said about the sizes and partitioning of
    the SSD. For smaller or cheaper SSDs, it is worth leaving a small
    amount of unpartitioned space at the end of the disk to give it more
    flexibility in garbage collection. (Do a secure erase before
    partitioning if it is not a new clean SSD.)


    Have done that already.

    Including the unused space at the end?

    Yes. (but not on the old drive so back to the secure erase step).




    Metadata 1.0, 1.1 and 1.2 are the new ones, use these,
    not the 0.90, which have less features.

    Maybe, not really sure, but partitioning /dev/sdb might
    be better.

    Use crypt setup to create an encrypted block device on top of this raid
    10 far 2 device.

    Use LVM to create a volume group on top of the encrypted raid 10 far 2
    device.

    Or the other way around.
    I'm not sure which is better, maybe your proposal.

    Neither is better, IMHO, unless you have some reason to be seriously
    paranoid. It is understandable why one would want to encrypt a portable
    machine that you use a lot while travelling, but a home desktop? Think
    about whether encryption here is really a useful thing - adding layers
    of complexity does not make anything faster, and it makes it a whole lot
    more difficult if something goes wrong or if you want to recover your
    files from a different system.


    Well, in this day and age it seemed like a reasonable idea.

    It can be fun to play around with this sort of thing, but that doesn't
    mean it is a good idea if you are aiming for a useful system.

    One particular thing that can be a serious pain with such complex setups
    is if something goes wrong. If something breaks badly, you might have
    to put the disks in a different machine to recover the data, or boot the
    same machine from a live USB. If you have an lvm cache or bcache in
    writeback mode, you need both the SSD and the HD online, and you may
    need a system with the right kernel and utilities in order to get the
    disks working. Even then, it's easy to accidentally corrupt things
    along the way such as by writing to the HD while there are uncommitted changes on the SSD part. I have enough experience with complicated recoveries to know that when something goes wrong, you'll be glad you
    kept things simple - and that you documented everything :-)


    The system is on a simple partition with btrfs in single. But boot is
    ext3 or 4. And there is a partition I consider the 'maintenance'
    partition with a full slackware install on ext3 or 4 that I hardly touch
    but is handy in case I break something. Also, you can't build a kernel
    from a rescue disk, but you can with this, which is nice.

    You can reasonably argue that it is a waste of space especially on an
    SSD which is expensive compared to a HDD but it is a nice comfort
    blanket and doubly so on the rare occasions it is needed.





    Create a logical volume on top of the above to use as the cache device.
    Yep, if you use lvmcache, maybe bcache can do as well.

    Again, that's unnecessary complexity for a system like this. The
    benefits of lvmcache and bcache are debatable even for loads that match
    them.


    So it is about as fast as it will get now?

    I've a spare hdd and ssd so I'll have a play when I get time. Need to
    think about a useful benchmark.

    It's all about the type of load you are using. lvmcache and/or bcache
    can help a lot in some cases - but be no faster than a single HD in
    other cases. And for many of the things that /do/ run faster with the caches, you could just as easily run them entirely from the SSD
    significantly faster - or get even better results by adding more RAM.


    Well. I have the options. What I'm not going to do is buy a blister
    pack of expensive terabyte-range SSDs for everything.

    The key to getting /really/ optimal systems is, as you say, useful benchmarks. There is no benefit in looking at the timings of a test
    that writes lots of small blocks from lots of threads unless that really
    is what you are doing in practice. There is no benefit in running a benchmark after clearing your caches, because you don't clear your
    caches in practice - but if you run without clearing the caches, you are
    not testing your disk performance. The only "true" benchmark is to run
    your system with your typical real tasks and see how it performs.





    You only want the raid1 at one level. Your choice is raid1 on btrfs for
    best performance and efficiency (since only the actual useful data is
    mirrored, rather than the entire raw disk), or raid10,far on the md
    layer (for greater large file streaming read speed). This can be a big
    issue with SSDs - with btrfs raid1 you avoid initially copying over an
    entire diskful of data from one device to the other.

    replacing a disk and the associated balance took a week.

    If you have a lot of data, it takes a while to copy it all over. And a re-balance actually does a good deal more work than just a plain copy.
    But at least it doesn't copy the unused space too.


    Yes, there is the checksumming for a start. But that is a good thing.





    Loop devices work well in testing, especially for seeing how to add
    disks, replace disks, resize things, etc. Of course they are of little
    help in speed testing.

    I usually test using loop devices made in a tmpfs mount - but then,
    my machines normally have lots of ram.


    I'd never have guessed. :-)


    Anyway, food for thought here from Piergiorgio and yourself.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)