• Traffic shaping on Debian

    From Aleksey@21:1/5 to All on Fri May 27 14:50:02 2016
    Hi guys!

    I have a Debian box acting as a router and need a tool to perform
    traffic shaping based on source/destination IPs, interfaces, etc. I
    have tried the default tc; however, it uses a lot of resources: 600
    Mbps without shaping flows through at about 3% CPU load, while the
    same 600 Mbps with shaping (tc using htb on the egress interface)
    consumes something like 40% CPU.

    Could someone recommend a tool that does this kind of shaping with
    minimal resource consumption? I've searched the web and found a
    module named nf-hishape, but I couldn't find any reasonable number of
    articles about it, nor any manuals, so I guess it's not very popular
    (if it's still alive at all).

    Any help would be appreciated.

    Thanks in advance.

    --
    With kind regards,
    Aleksey

  • From Mihamina RAKOTOMANDIMBY@21:1/5 to Aleksey on Fri May 27 15:00:01 2016
    On 05/27/2016 02:40 PM, Aleksey wrote:
    Hi guys!

    I have a Debian box acting as a router and need a tool to perform
    traffic shaping based on source/destination IPs, interfaces, etc. I
    have tried the default tc; however, it uses a lot of resources: 600
    Mbps without shaping flows through at about 3% CPU load, while the
    same 600 Mbps with shaping (tc using htb on the egress interface)
    consumes something like 40% CPU.



    Would you share your configuration in a few snippets?
    I mean, the script you use to invoke tc and so on...

  • From Dmitry Sinina@21:1/5 to Aleksey on Fri May 27 15:20:02 2016
    On 05/27/2016 02:40 PM, Aleksey wrote:
    Hi guys!

    I have a Debian box acting as a router and need a tool to perform
    traffic shaping based on source/destination IPs, interfaces, etc. I
    have tried the default tc; however, it uses a lot of resources: 600
    Mbps without shaping flows through at about 3% CPU load, while the
    same 600 Mbps with shaping (tc using htb on the egress interface)
    consumes something like 40% CPU.

    Could someone recommend a tool that does this kind of shaping with
    minimal resource consumption? I've searched the web and found a
    module named nf-hishape, but I couldn't find any reasonable number of
    articles about it, nor any manuals, so I guess it's not very popular
    (if it's still alive at all).

    Any help would be appreciated.

    Thanks in advance.

    Hi.

    It seems you are using a flat list of filters. How many filters do
    you have?
    Did you try hash tables for traffic classification?
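
    For example, a rough sketch of u32 hashing on the last octet of the
    destination address, in the style of the LARTC hashing-filters
    recipe (the device, handles, addresses and flowids below are
    placeholders, not taken from an actual setup):

    tc filter add dev eth1 parent 1:0 prio 5 protocol ip u32
    tc filter add dev eth1 parent 1:0 prio 5 handle 2: protocol ip u32 divisor 256
    tc filter add dev eth1 parent 1:0 prio 5 protocol ip u32 ht 2:a4: match ip dst 10.0.0.164 flowid 1:10
    tc filter add dev eth1 parent 1:0 prio 5 protocol ip u32 ht 800:: match ip dst 10.0.0.0/8 hashkey mask 0x000000ff at 16 link 2:

    With that layout the kernel hashes each packet straight into one
    bucket (0xa4 = 164 here) instead of walking the whole filter list.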

  • From lxP@21:1/5 to Aleksey on Fri May 27 19:30:02 2016
    Hi,

    On 2016-05-27 15:50, Aleksey wrote:
    tc class add dev eth1 parent 1:1 classid 1:30 htb rate 1mbps ceil 1000mbps

    I have never measured the CPU usage, but I have also noticed that htb
    ceil does not perform as expected for me. I could never get the full
    ceil bandwidth through, even when there is no other traffic.
    I would suggest you try a higher htb rate (e.g. 988mbit) and rerun
    your experiments.
    I started to avoid htb ceil in general and switched to fair-queuing
    qdiscs like fq_codel, drr, sfq and so on whenever possible. However,
    that might not directly fit your needs.
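
    Concretely, something along these lines (an untested sketch that just
    reuses the handles from your configuration):

    tc class change dev eth1 parent 1:1 classid 1:30 htb rate 988mbit ceil 988mbit
    tc qdisc replace dev eth1 parent 1:30 handle 30: fq_codel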

    Best regards,
    lxP

  • From Aleksey@21:1/5 to Dmitry Sinina on Fri May 27 17:00:02 2016
    On 2016-05-27 14:48, Dmitry Sinina wrote:
    On 05/27/2016 02:40 PM, Aleksey wrote:
    Hi guys!

    I have a Debian box acting as a router and need a tool to perform
    traffic shaping based on source/destination IPs, interfaces, etc. I
    have tried the default tc; however, it uses a lot of resources: 600
    Mbps without shaping flows through at about 3% CPU load, while the
    same 600 Mbps with shaping (tc using htb on the egress interface)
    consumes something like 40% CPU.

    Could someone recommend a tool that does this kind of shaping with
    minimal resource consumption? I've searched the web and found a
    module named nf-hishape, but I couldn't find any reasonable number of
    articles about it, nor any manuals, so I guess it's not very popular
    (if it's still alive at all).

    Any help would be appreciated.

    Thanks in advance.

    Hi.

    It seems you are using a flat list of filters. How many filters do
    you have?
    Did you try hash tables for traffic classification?

    Hi.

    Practically, I haven't done any configuration on my production router
    yet - I have only run tests in a lab environment. The configuration
    was pretty simple:

    tc qdisc add dev eth1 root handle 1: htb default 30
    tc class add dev eth1 parent 1: classid 1:1 htb rate 1000mbps ceil 1000mbps
    tc class add dev eth1 parent 1:1 classid 1:10 htb rate 3mbps ceil 5mbps
    tc class add dev eth1 parent 1:1 classid 1:20 htb rate 5mbps ceil 7mbps
    tc class add dev eth1 parent 1:1 classid 1:30 htb rate 1mbps ceil 1000mbps
    tc qdisc add dev eth1 parent 1:10 handle 10:0 sfq perturb 10
    tc qdisc add dev eth1 parent 1:20 handle 20:0 sfq perturb 10
    tc qdisc add dev eth1 parent 1:30 handle 30:0 sfq perturb 10
    tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dport 443 0xffff flowid 1:20
    tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dport 80 0xffff flowid 1:10

    After applying it I tried to push some traffic through this lab box
    using iperf. When testing on ports 80/443 (limited to low bandwidth),
    the CPU load was fine; however, when I pushed unrestricted traffic
    (the 1000 mbps limit) I noticed high CPU usage. I tried setting up
    filters based on fwmark, but the result was the same. I'm using
    Debian 7 with the 3.16 kernel installed from wheezy-backports, in
    case that matters.
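
    Roughly, the fwmark-based variant amounts to something like this (the
    marks and rules shown here are illustrative, not the exact ones
    used):

    iptables -t mangle -A POSTROUTING -o eth1 -p tcp --dport 80 -j MARK --set-mark 10
    iptables -t mangle -A POSTROUTING -o eth1 -p tcp --dport 443 -j MARK --set-mark 20
    tc filter add dev eth1 parent 1:0 protocol ip prio 1 handle 10 fw flowid 1:10
    tc filter add dev eth1 parent 1:0 protocol ip prio 1 handle 20 fw flowid 1:20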

    If some additional info (firewall config, etc) is needed, please ask.

    --
    With kind regards,
    Aleksey

  • From lxP@21:1/5 to Ruben Wisniewski on Sat May 28 14:00:02 2016
    For a few years now I have been searching for a good solution for
    services of different priorities, so I am really interested in how
    you would do that.

    On 2016-05-28 09:49, Ruben Wisniewski wrote:
    fq_codel should be used in any case. If you have more than one
    service, use an fq_codel per service and shape each of them with a
    hard limit. Above all the service queues, add a root fq_codel and
    shape it to 92% of the total available bandwidth.

    Reducing a service below the physical bandwidth it needs is mostly
    unintended, or the result of misunderstanding the difference between
    the average and the peak bandwidth of network applications, and it
    has cost me a huge amount of time at work.

    A good starting point:
    https://wiki.gentoo.org/wiki/Traffic_shaping


    Best regards Ruben

    I didn't fully understand which qdisc hierarchy you are suggesting.
    It sounds to me as if you are adding an fq_codel below an fq_codel
    queue, which is impossible, isn't it?

    htb (hard limit)
    - ?
    -- fq_codel (service A)
    -- fq_codel (service B)
    -- fq_codel (service C)

    You could remove the question mark entirely and put the fq_codel
    queues directly below htb with a fixed hard limit.
    However, in most cases you don't want a hard limit, but just want to
    specify a rough priority between the services. If the link is
    unsaturated, each service should be able to use arbitrary bandwidth;
    however, as the link approaches saturation, the services should
    approach specified percentages of the total bandwidth.
    The obvious solution would be to use "ceil", but as mentioned before
    it doesn't perform well.
    You could put drr in place of the question mark, which performs well,
    but it would treat every service with equal priority (see the rough
    sketch below).
    You could put prio in place of the question mark, but if service A
    uses all the bandwidth, services B and C will starve.
    Does anyone have a solution to that problem?
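
    For concreteness, the drr variant would look roughly like this (an
    untested sketch: the device, rate and class ids are placeholders, and
    the filters that steer each service into 2:1/2:2/2:3 are omitted):

    tc qdisc add dev eth0 root handle 1: htb default 1
    tc class add dev eth0 parent 1: classid 1:1 htb rate 920mbit
    tc qdisc add dev eth0 parent 1:1 handle 2: drr
    tc class add dev eth0 parent 2: classid 2:1 drr
    tc class add dev eth0 parent 2: classid 2:2 drr
    tc class add dev eth0 parent 2: classid 2:3 drr
    tc qdisc add dev eth0 parent 2:1 fq_codel
    tc qdisc add dev eth0 parent 2:2 fq_codel
    tc qdisc add dev eth0 parent 2:3 fq_codel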

    Best regards,
    lxP

  • From Martin Kraus@21:1/5 to Aleksey on Sat May 28 18:30:02 2016
    On Fri, May 27, 2016 at 04:50:55PM +0300, Aleksey wrote:
    Practically, I haven't done any configuration on my production router yet - I have only run tests in a lab environment. The configuration was pretty simple:

    tc qdisc add dev eth1 root handle 1: htb default 30
    tc class add dev eth1 parent 1: classid 1:1 htb rate 1000mbps ceil 1000mbps
    tc class add dev eth1 parent 1:1 classid 1:10 htb rate 3mbps ceil 5mbps
    tc class add dev eth1 parent 1:1 classid 1:20 htb rate 5mbps ceil 7mbps
    tc class add dev eth1 parent 1:1 classid 1:30 htb rate 1mbps ceil 1000mbps
    tc qdisc add dev eth1 parent 1:10 handle 10:0 sfq perturb 10
    tc qdisc add dev eth1 parent 1:20 handle 20:0 sfq perturb 10
    tc qdisc add dev eth1 parent 1:30 handle 30:0 sfq perturb 10
    tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dport 443 0xffff flowid 1:20
    tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dport 80 0xffff flowid 1:10

    I'd assume the problem is that when you bind htb directly to the root
    of a device, you basically lose the multiqueue capability of the
    ethernet card, because all packets must pass through a single queue
    before they are dispatched to the card's multiple hardware queues.
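
    A quick way to see both the hardware queues and which qdisc replaced
    the default mq root (illustrative commands, eth1 assumed):

    ls /sys/class/net/eth1/queues/
    tc -s qdisc show dev eth1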
    mk

  • From Aleksey@21:1/5 to Martin Kraus on Mon May 30 14:00:02 2016
    On 2016-05-28 18:16, Martin Kraus wrote:
    On Fri, May 27, 2016 at 04:50:55PM +0300, Aleksey wrote:
    Practically, I haven't done any configuration on my production router
    yet - I have only run tests in a lab environment. The configuration
    was pretty simple:

    tc qdisc add dev eth1 root handle 1: htb default 30
    tc class add dev eth1 parent 1: classid 1:1 htb rate 1000mbps ceil 1000mbps
    tc class add dev eth1 parent 1:1 classid 1:10 htb rate 3mbps ceil 5mbps
    tc class add dev eth1 parent 1:1 classid 1:20 htb rate 5mbps ceil 7mbps
    tc class add dev eth1 parent 1:1 classid 1:30 htb rate 1mbps ceil 1000mbps
    tc qdisc add dev eth1 parent 1:10 handle 10:0 sfq perturb 10
    tc qdisc add dev eth1 parent 1:20 handle 20:0 sfq perturb 10
    tc qdisc add dev eth1 parent 1:30 handle 30:0 sfq perturb 10
    tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dport 443 0xffff flowid 1:20
    tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dport 80 0xffff flowid 1:10

    I'd assume the problem is that when you bind htb directly to the root
    of a device, you basically lose the multiqueue capability of the
    ethernet card, because all packets must pass through a single queue
    before they are dispatched to the card's multiple hardware queues.
    mk

    Hi.

    I have also noticed that all the load is on one CPU core; it is not
    distributed across the available cores. How can this be avoided?


    To lxP:

    I'll rerun the tests as you suggested and report back with the results.

    --
    With kind regards,
    Aleksey

  • From Martin Kraus@21:1/5 to Aleksey on Mon May 30 18:40:02 2016
    On Mon, May 30, 2016 at 01:55:51PM +0300, Aleksey wrote:
    I have also noticed that all the load is on one CPU core; it is not distributed across the available cores. How can this be avoided?

    There is a qdisc called mq which creates a class for each hardware
    queue on the attached ethernet card. You can bind other qdiscs (such
    as htb) to each of these classes, but this will not allow you to
    shape a single class of traffic that goes out over all the hardware
    queues.
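
    Roughly, one child qdisc gets attached per hardware-queue class
    (1:1, 1:2, ...); the interface and the choice of fq_codel here are
    just placeholders:

    tc qdisc add dev eth1 root handle 1: mq
    tc qdisc add dev eth1 parent 1:1 handle 10: fq_codel
    tc qdisc add dev eth1 parent 1:2 handle 20: fq_codel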

    It might be possible to have multiple htb qdiscs and use filters to
    send each type of traffic to a selected hardware queue. This has
    other adverse effects (such as not being able to borrow unused
    bandwidth among the hw queues), and there may still be lock
    contention among the cores for each such queue, so it might not even
    work better.

    If you are at 1 Gbit speed, the CPU can probably handle it, so there
    is no need to do any of this. If you have a 10 Gbit+ connection, then
    this probably isn't the right place to do shaping anyway; it should
    be done closer to the source.

    It depends on what you're trying to accomplish.

    regards
    Martin

  • From Raúl Alexis Betanco@21:1/5 to All on Wed Jun 1 12:30:02 2016
    On 2016-06-01, Aleksey wrote:

    So, yes, I have 10G uplinks. The main goal is to be able to shape
    traffic from certain hosts towards the destinations that are
    reachable through the local internet exchange and towards all other
    destinations (the world). The local IX is connected to one interface
    of my Debian box and worldwide traffic flows through the other one.
    The simplest way to achieve this, in my opinion, was to apply egress
    qdiscs on these interfaces and attach the filters and classes there
    as well, so it would effectively shape the traffic the way I need.
    The problem with shaping closer to the source is that I wouldn't be
    able to classify the traffic on the switches - it's not just one or a
    couple of destinations, it's something like 30k destinations
    reachable through the local IX.

    Perhaps you could point me to a better option.

    P.S. to lxP - increasing the rate on the default htb class didn't
    help; CPU usage may have dropped a couple of percent (not sure,
    really), but it is definitely not significant.

    --
    With kind regards,
    Aleksey

    If you are trying to shape 10G links with a Debian box then, apart
    from routing, you will have to do a lot of tuning on the system.

    Search for 'High performance Linux routing' on Google; you will find
    plenty of articles explaining the caveats you will face.

    When going over 1G links it's better to use dedicated hardware for
    routing and shaping, IMHO.
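
    Just to illustrate the kind of knobs those articles cover (the
    interface name and values below are placeholders, not
    recommendations):

    ethtool -L eth1 combined 8
    ethtool -G eth1 rx 4096 tx 4096
    sysctl -w net.core.netdev_max_backlog=250000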

    Best regards

  • From Aleksey@21:1/5 to Martin Kraus on Wed Jun 1 10:30:01 2016
    On 2016-05-30 18:34, Martin Kraus wrote:
    On Mon, May 30, 2016 at 01:55:51PM +0300, Aleksey wrote:
    I have also noticed that all the load is on one CPU core; it is not
    distributed across the available cores. How can this be avoided?

    There is a qdisc called mq which creates a class for each hardware
    queue on the attached ethernet card. You can bind other qdiscs (such
    as htb) to each of these classes, but this will not allow you to
    shape a single class of traffic that goes out over all the hardware
    queues.

    It might be possible to have multiple htb qdiscs and use filters to
    send each type of traffic to a selected hardware queue. This has
    other adverse effects (such as not being able to borrow unused
    bandwidth among the hw queues), and there may still be lock
    contention among the cores for each such queue, so it might not even
    work better.

    If you are at 1 Gbit speed, the CPU can probably handle it, so there
    is no need to do any of this. If you have a 10 Gbit+ connection, then
    this probably isn't the right place to do shaping anyway; it should
    be done closer to the source.

    It depends on what you're trying to accomplish.

    regards
    Martin

    So, yes, I have 10G uplinks. The main goal is to be able to shape
    traffic from certain hosts towards the destinations that are
    reachable through the local internet exchange and towards all other
    destinations (the world). The local IX is connected to one interface
    of my Debian box and worldwide traffic flows through the other one.
    The simplest way to achieve this, in my opinion, was to apply egress
    qdiscs on these interfaces and attach the filters and classes there
    as well, so it would effectively shape the traffic the way I need.
    The problem with shaping closer to the source is that I wouldn't be
    able to classify the traffic on the switches - it's not just one or a
    couple of destinations, it's something like 30k destinations
    reachable through the local IX.
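
    Schematically, that amounts to something like the following on the
    IX-facing interface, with a similar hierarchy on the world-facing
    one (the interface name, rates and addresses here are placeholders
    only):

    tc qdisc add dev eth_ix root handle 1: htb default 20
    tc class add dev eth_ix parent 1: classid 1:10 htb rate 2gbit ceil 2gbit
    tc class add dev eth_ix parent 1: classid 1:20 htb rate 5gbit ceil 10gbit
    tc filter add dev eth_ix parent 1:0 protocol ip prio 1 u32 match ip src 198.51.100.10/32 flowid 1:10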

    Perhaps you could point me to a better option.

    P.S. to lxP - increasing the rate on the default htb class didn't
    help; CPU usage may have dropped a couple of percent (not sure,
    really), but it is definitely not significant.

    --
    With kind regards,
    Aleksey
