• hardware encryption,Re: hardware encryption

    From Ryutaroh Matsumoto@21:1/5 to All on Thu Jun 3 21:50:01 2021
    From: Ryutaroh Matsumoto <ryutaroh@ict.e.titech.ac.jp>
    Date: Fri, 04 Jun 2021 04:18:25 +0900 (JST)
    Note that openssl version is much older but it is bundled with Debian Bullseye.

    I installed openssl ver. 3 from Debian experimental,
    and observed much slower speed than ver. 1.1.1 in Debian Bullseye,
    on the same hardware and kernel, as below. Interesting...
    I wonder if I should file a bug to Debian BTS...

    # openssl speed aes-128-cbc
    ...
    version: 3.0.0-alpha16
    built on: built on: Thu May 6 19:54:38 2021 UTC
    options:bn(64,64)
    compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-UqeSFN/openssl-3.0.0~~alpha16=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
    CPUINFO: OPENSSL_armcap=0x83
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128-cbc 37858.56k 40995.79k 41736.44k 42339.69k 41984.00k 42350.33k

    # openssl speed -evp aes-128-cbc
    ...
    version: 3.0.0-alpha16
    built on: built on: Thu May 6 19:54:38 2021 UTC
    options:bn(64,64)
    compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-UqeSFN/openssl-3.0.0~~alpha16=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
    CPUINFO: OPENSSL_armcap=0x83
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    AES-128-CBC 38057.99k 41038.28k 41973.03k 41930.50k 42233.35k 42308.27k
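
    The CPUINFO line in the output above reports OPENSSL_armcap=0x83, which is a
    bitmask of the CPU capabilities OpenSSL detected. A minimal sketch for
    decoding it follows. The bit-to-name table is an assumption based on
    OpenSSL's crypto/arm_arch.h and should be checked against the header of the
    OpenSSL version actually in use; decode_armcap is a helper name made up
    for this illustration.

```python
# Assumed bit layout, based on OpenSSL's crypto/arm_arch.h. Verify it against
# the header of the OpenSSL version you are actually running.
ARMCAP_BITS = {
    0: "ARMV7_NEON",
    1: "ARMV7_TICK",
    2: "ARMV8_AES",
    3: "ARMV8_SHA1",
    4: "ARMV8_SHA256",
    5: "ARMV8_PMULL",
    6: "ARMV8_SHA512",
    7: "ARMV8_CPUID",
}


def decode_armcap(value):
    """Return the names of the capability bits set in an OPENSSL_armcap value."""
    return [name for bit, name in sorted(ARMCAP_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # 0x83 is the value shown in the benchmark output above.
    print(decode_armcap(0x83))
```

    Under that assumed layout, 0x83 has no AES, SHA or PMULL bit set, which
    would be consistent with the flat, low throughput in the table above.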

    Best regards, Ryutaroh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ryutaroh Matsumoto@21:1/5 to All on Thu Jun 3 21:20:01 2021
    Your Rock64 is significantly faster than my RPi4B. I wonder how such a big difference appears.

    From: Diederik de Haas <didi.debian@cknow.org>
    Date: Thu, 03 Jun 2021 19:34:19 +0200

    $ openssl speed aes-128-cbc
    ...
    version: 3.0.0-alpha16
    built on: built on: Thu May 6 19:54:38 2021 UTC
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k

    $ openssl speed -evp aes-128-cbc
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    AES-128-CBC 95904.58k 297023.53k 611697.15k 855083.69k 966412.97k 956033.71k

    On my RPi4B I have:

    # openssl speed aes-128-cbc
    ...
    OpenSSL 1.1.1k 25 Mar 2021
    built on: Thu Mar 25 20:49:34 2021 UTC
    options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
    compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-YhzaKF/openssl-1.1.1k=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128 cbc 73719.58k 78001.25k 79918.46k 79520.45k 78646.02k 79442.42k
    # openssl speed -evp aes-128-cbc
    ...
    OpenSSL 1.1.1k 25 Mar 2021
    built on: Thu Mar 25 20:49:34 2021 UTC
    options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
    compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-YhzaKF/openssl-1.1.1k=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128-cbc 37975.41k 40705.82k 41937.97k 42066.56k 42265.07k 42382.97k


    Note that the openssl version is much older, but it is the one bundled with
    Debian Bullseye. The kernel is upstream 5.10.39, built with almost the same
    configuration options as the Debian RT kernel. The CPU frequency is fixed
    at 1.5GHz by "cpupower frequency-set -g performance".

    Best regards, Ryutaroh

  • From Diederik de Haas@21:1/5 to All on Thu Jun 3 23:15:33 2021
    To: ryutaroh@ict.e.titech.ac.jp (Ryutaroh Matsumoto)
    Copy: noloader@gmail.com

    On donderdag 3 juni 2021 21:18:25 CEST Ryutaroh Matsumoto wrote:
    Your Rock64 is significantly faster than my RPi4B. I wonder how such a big difference appears.

    IIRC, the place where I read about the 10x crypto speed improvement with
    the ARM Crypto Extensions is also where I read that Broadcom does NOT
    have a license for them.

    From: Diederik de Haas <didi.debian@cknow.org>
    Date: Thu, 03 Jun 2021 19:34:19 +0200

    $ openssl speed aes-128-cbc
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k

    $ openssl speed -evp aes-128-cbc
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    AES-128-CBC 95904.58k 297023.53k 611697.15k 855083.69k 966412.97k 956033.71k

    On my RPi4B I have:
    # openssl speed aes-128-cbc
    OpenSSL 1.1.1k 25 Mar 2021
    built on: Thu Mar 25 20:49:34 2021 UTC
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128 cbc 73719.58k 78001.25k 79918.46k 79520.45k 78646.02k 79442.42k

    # openssl speed -evp aes-128-cbc
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128-cbc 37975.41k 40705.82k 41937.97k 42066.56k 42265.07k 42382.97k

    What I find the most odd is that with '-evp' your scores are much lower than without.
    The lack of ARM CE could explain why the Rock64's scores are better than one would otherwise expect, even though the speedup is (still) lower than I expected.

    Note that openssl version is much older

    It appears that OpenSSL made a jump from 1.1.1X to 3.0.x.
    The most likely reason I had 3.0.0-alpha16 is a YOLO action by me whereby
    I upgraded almost everything to experimental (KDE from exp was intentional).

    Kernel version is upstream 5.10.39 with almost the same kernel
    compilation options with Debian RT kernel.

    You may want to verify whether the options that were enabled bc of bug #976635
    are enabled in your kernel as well.

    CPU frequency is fixed to 1.5GHz by "cpupower frequency-set -g performance".

    I still have to investigate what's possible on the Rock64, but my main problem
    is heat: bc I have no cooling, that causes problems.

    Best regards, Ryutaroh

    Cheers,
    Diederik

  • From Diederik de Haas@21:1/5 to All on Thu Jun 3 23:19:29 2021
    To: noloader@gmail.com
    To: ryutaroh@ict.e.titech.ac.jp (Ryutaroh Matsumoto)

    On donderdag 3 juni 2021 21:40:04 CEST Ryutaroh Matsumoto wrote:
    Note that openssl version is much older but it is bundled with Debian Bullseye.
    I installed openssl ver. 3 from Debian experimental,
    and observed much slower speed than ver. 1.1.1 in Debian Bullseye,
    on the same hardware and kernel, as below. Interesting...
    I wonder if I should file a bug to Debian BTS...

    Interesting.
    I should probably test with version 1.1.1 too (but likely not tonight).

    Upstream seems a better place, but it's ofc useful to also track it in
    Debian's BTS.

    Cheers,
    Diederik

  • From Jeffrey Walton@21:1/5 to All on Fri Jun 4 14:10:01 2021
    https://openwrt.org/docs/techref/hardware/cryptographic.hardware.accelerators#finding_out_what_s_available_in_the_kernel
    is the only page I found wrt /proc/crypto and I do indeed have
    several 'skcipher' and 'shash' nodes with prio >= 300.
    That article also speaks about /dev/crypto, but I don't have that.

    Yeah, kernel crypto is not well documented (in my opinion).

    So there's a reasonable chance I indeed do have HW accelerated crypto,
    but it doesn't seem to be near '10x' speed improvements.

    Yeah, you won't see that kind of speedup across all algorithms.

    On Aarch64, you will see the following speedups (give or take) over a
    quality C implementation:

    * AES - 6x
    * SHA1 - 3.5x
    * SHA2 - 9.5x
    * PMULL - 12x

    SHA3 is available on ARMv8.2. Apple M1s ship with it. I don't have
    benchmark numbers for it (yet).
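
    Whether a given AArch64 Linux system advertises the instructions behind the
    speedups listed above can be read from the "Features" line of /proc/cpuinfo.
    A minimal sketch, assuming the usual kernel flag names aes, pmull, sha1 and
    sha2; crypto_features is a hypothetical helper name:

```python
# Flag names as they usually appear in the "Features" line on AArch64 Linux.
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")


def crypto_features(cpuinfo_text):
    """Return which of CRYPTO_FLAGS appear in the first 'Features' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags = set(value.split())
            return [flag for flag in CRYPTO_FLAGS if flag in flags]
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(crypto_features(fh.read()))
    except OSError as exc:
        print("could not read /proc/cpuinfo:", exc)
```

    An empty list would suggest that the Crypto Extensions are not exposed to
    userspace on that board, so the multipliers above would not apply there.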

    Thermal issues may also play a role. I noticed that running a test
    after letting the device idle for a while, so it can cool off (?), did
    result in higher scores.

    You should probably use an active cooling solution, like a fan.

    Then, before you run benchmarks, move the CPU from standby mode to
    performance mode. See, for example, https://github.com/weidai11/cryptopp/blob/master/TestScripts/governor.sh.

    Standby mode is a kind of slow start. When in standby mode the first
    couple of algorithms you benchmark will be off. By about the third
    algorithm, the cpu is no longer in standby mode.

    By using performance mode you avoid the slow start that throws off
    your benchmarks.
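
    Before a benchmark run, the current governor of each core can be checked
    through the cpufreq files in sysfs. A minimal sketch, assuming the standard
    Linux location /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor; the
    function names are made up for this illustration, and the sketch only
    reads the setting, it does not change it:

```python
from pathlib import Path

# Standard Linux cpufreq sysfs location. The root is a parameter so the
# function can also be pointed at a test directory.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_governors(root=SYSFS_CPU):
    """Map each cpuN directory under root to its current scaling governor."""
    governors = {}
    for gov_file in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu = gov_file.parent.parent.name
        governors[cpu] = gov_file.read_text().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    govs = read_governors()
    print(govs)
    print("ready for benchmarking:", all_performance(govs))
```

    If any core reports something other than "performance" (for example
    "ondemand" or "schedutil"), the first measurements may be taken while the
    clock is still ramping up.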

    Jeff

  • From Diederik de Haas@21:1/5 to All on Fri Jun 4 13:39:18 2021
    To: ryutaroh@ict.e.titech.ac.jp (Ryutaroh Matsumoto)
    Copy: noloader@gmail.com

    [Note that I've combined the output of several posts for comparison]
    [As a result I've also made several edits for consistency/readability]

    On donderdag 3 juni 2021 21:40:04 CEST Ryutaroh Matsumoto wrote:
    From: Ryutaroh Matsumoto <ryutaroh@ict.e.titech.ac.jp>
    Note that openssl version is much older ... with Debian Bullseye.
    I installed openssl ver. 3 from Debian experimental,
    and observed much slower speed than ver. 1.1.1 in Debian Bullseye,
    on the same hardware and kernel, as below. Interesting...

    # openssl speed aes-128-cbc
    OpenSSL 1.1.1k 25 Mar 2021
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128 cbc 73719.58k 78001.25k 79918.46k 79520.45k 78646.02k 79442.42k
    for easy comparison, I'm adding *your* 3.0.0-alpha16 scores directly below
    aes-128-cbc 37858.56k 40995.79k 41736.44k 42339.69k 41984.00k 42350.33k

    # openssl speed -evp aes-128-cbc
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128-cbc 37975.41k 40705.82k 41937.97k 42066.56k 42265.07k 42382.97k
    for easy comparison, I'm adding *your* 3.0.0-alpha16 scores directly below
    AES-128-CBC 38057.99k 41038.28k 41973.03k 41930.50k 42233.35k 42308.27k

    $ openssl speed aes-128-cbc
    OpenSSL 1.1.1k 25 Mar 2021
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128 cbc 44008.81k 51444.78k 53902.17k 54553.60k 54730.75k 54717.10k
    for easy comparison, I'm adding my 3.0.0-alpha16 scores directly below
    aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k

    $ openssl speed -evp aes-128-cbc
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
    aes-128-cbc 84829.88k 269672.83k 575085.57k 836608.00k 963663.19k 974023.34k
    for easy comparison, I'm adding my 3.0.0-alpha16 scores directly below
    AES-128-CBC 95904.58k 297023.53k 611697.15k 855083.69k 966412.97k 956033.71k


    This is indeed *quite* interesting!
    In your case, the test with OpenSSL 1.1.1k without '-evp' stands (far) out
    from your other scores in a positive way.
    In my case, it is the same combo that stands out in a negative way.
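
    To put numbers on that comparison, the result rows can be parsed and divided
    column by column. A minimal sketch; parse_speed_values and speed_ratios are
    helper names made up for this illustration, and the rows are copied from
    the non-evp results quoted above:

```python
import re

# A throughput value in `openssl speed` output looks like "73719.58k".
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)k(?=\s|$)")


def parse_speed_values(line):
    """Extract the per-block-size throughput values (in 1000s of bytes/s)."""
    return [float(v) for v in VALUE_RE.findall(line)]


def speed_ratios(new_line, old_line):
    """Return new/old throughput ratios, one per block-size column."""
    new, old = parse_speed_values(new_line), parse_speed_values(old_line)
    if len(new) != len(old):
        raise ValueError("rows have different numbers of columns")
    return [n / o for n, o in zip(new, old)]


if __name__ == "__main__":
    rpi4b_111k = "aes-128 cbc 73719.58k 78001.25k 79918.46k 79520.45k 78646.02k 79442.42k"
    rpi4b_300a = "aes-128-cbc 37858.56k 40995.79k 41736.44k 42339.69k 41984.00k 42350.33k"
    rock64_111k = "aes-128 cbc 44008.81k 51444.78k 53902.17k 54553.60k 54730.75k 54717.10k"
    rock64_300a = "aes-128-cbc 84716.70k 269243.61k 584986.37k 830015.83k 944873.47k 953417.73k"
    for name, new, old in (("RPi4B", rpi4b_300a, rpi4b_111k),
                           ("Rock64", rock64_300a, rock64_111k)):
        ratios = ", ".join(f"{r:.2f}" for r in speed_ratios(new, old))
        print(f"{name} 3.0.0-alpha16 / 1.1.1k: {ratios}")
```

    For these rows the RPi4B ratio stays at roughly 0.5 for every block size,
    while the Rock64 ratio grows from just under 2 at 16 bytes to roughly 17 at
    16384 bytes.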


    https://openwrt.org/docs/techref/hardware/cryptographic.hardware.accelerators#finding_out_what_s_available_in_the_kernel
    is the only page I found wrt /proc/crypto and I do indeed have
    several 'skcipher' and 'shash' nodes with prio >= 300.
    That article also speaks about /dev/crypto, but I don't have that.

    So there's a reasonable chance I indeed do have HW accelerated crypto,
    but it doesn't seem to be near '10x' speed improvements.
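
    The check described above, looking for 'skcipher' and 'shash' entries with a
    priority of 300 or more, can be scripted. A minimal sketch, assuming the
    usual /proc/crypto layout of "key : value" lines with one blank line between
    algorithms; both function names are made up for this illustration:

```python
def parse_proc_crypto(text):
    """Parse /proc/crypto-style text into a list of dicts, one per algorithm."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def high_priority(entries, min_priority=300, types=("skcipher", "shash")):
    """Keep entries of the given types whose priority is at least min_priority."""
    return [
        e for e in entries
        if e.get("type") in types and int(e.get("priority", 0)) >= min_priority
    ]


if __name__ == "__main__":
    try:
        with open("/proc/crypto") as fh:
            for e in high_priority(parse_proc_crypto(fh.read())):
                print(e.get("name"), e.get("driver"), e.get("priority"))
    except OSError as exc:
        print("could not read /proc/crypto:", exc)
```

    A high priority only says which implementation the kernel prefers. The
    driver name printed next to it is usually the better hint as to whether
    that implementation uses CPU instructions, a separate engine, or plain C.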

    Thermal issues may also play a role. I noticed that running a test
    after letting the device idle for a while, so it can cool off (?), did
    result in higher scores. The Rock64 tends to get (quite) hot pretty quickly
    and that _may_ mean it throttles back quickly. I haven't done any
    measuring, so I may be completely off on this.
    I'm pretty sure the RPi (4) has had more man-power devoted to this issue.
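
    One way to do that measuring is to read the temperature before and after a
    benchmark run. A minimal sketch, assuming the board exposes its sensors
    under /sys/class/thermal/thermal_zone*/temp in millidegrees Celsius, as the
    Linux thermal sysfs interface normally does; read_temps_celsius is a
    hypothetical helper name:

```python
from pathlib import Path


def read_temps_celsius(root="/sys/class/thermal"):
    """Return {zone name: temperature in degrees Celsius} for readable zones."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            temps[zone.name] = int(raw) / 1000.0  # sysfs reports millidegrees
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return temps


if __name__ == "__main__":
    print(read_temps_celsius())
```

    Comparing the readings from a cold start with those after several
    consecutive runs would show whether the lower scores coincide with a hot
    SoC.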

    Cheers,
    Diederik

  • From Diederik de Haas@21:1/5 to All on Fri Jun 4 22:05:38 2021
    To: noloader@gmail.com

    On vrijdag 4 juni 2021 14:05:28 CEST Jeffrey Walton wrote:
    Yeah, kernel crypto is not well documented (in my opinion).

    It's a complicated subject with various nuances, and if you're not "in the know" (and I am not), it's very hard to f.e. judge whether having "ARM CE" is just nice marketing or actually substantial.
    It could be that the rock64 achieves '6x', but without knowing/understanding the baseline, that's mostly still meaningless.
    F.e. I had expected a greater difference with the RPi4 bc it's Broadcom, but if the baseline is so much lower, then what I get is still 'good' (but only relatively).

    So there's a reasonable chance I indeed do have HW accelerated crypto,
    but it doesn't seem to be near '10x' speed improvements.

    Yeah, you won't see that kind of speedup across all algorithms.

    On Aarch64, you will see the following speedups (give or take) over a
    quality C implementation:

    * AES - 6x
    * SHA1 - 3.5x
    * SHA2 - 9.5x
    * PMULL - 12x

    Good to know, thanks.

    Thermal issues may also play a role. I noticed that running a test
    after letting the device idle for a while, so it can cool off (?), did
    result in higher scores.

    You should probably use an active cooling solution, like a fan.

    I asked around (on irc:Pine64:/#rock64) and there was one person that used active cooling, but that was custom made, which is beyond my skill set.
    There is an aluminum case available (in Pine64's store) whereby the whole case is practically a heat sink and that seems to work well, so I'll go for that.

    move the CPU from standby mode to performance mode.

    That's still a research area for me: whether, and if so what, could or needs to be done to keep the device from crashing (due to thermals) while still getting the most out of it.
    That could take quite a while though. If needed, it would probably become part of a dedicated 'rock64' thread.

    Cheers,
    Diederik




