• CPU time for transcendental functions

    From Robinn@21:1/5 to All on Fri Dec 15 09:59:56 2023
    I got some old neural network code (written about 30 years ago).
    It has several activation functions, which only change 2 lines, like so:

          if (activation(1:2).eq.'SI' .or. activation(1:2).eq.'LO') then
             output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
             slope(i,j) = output(i,j) * (1.0 - output(i,j))  ! sigmoid
          elseif (activation(1:2).eq.'TA') then
             output(i,j) = TANH(output(i,j))                 ! TANH
             slope(i,j) = 1.0 - output(i,j)*output(i,j)      ! TANH
          elseif (activation(1:2).eq.'AR') then
             y = output(i,j)
             output(i,j) = ATAN(y)                           ! arctan
             slope(i,j) = 1.0/(1.0+y*y)                      ! arctan
          elseif (activation(1:5).eq.'SOFTP') then
             y = EXP(output(i,j))
             output(i,j) = LOG(1.0+y)                        ! softplus
             slope(i,j) = 1.0/(1.0+1.0/y)                    ! softplus
          elseif (activation(1:5).eq.'SOFTS') then
             y = output(i,j)
             output(i,j) = y/(ABS(y)+1.0)                    ! softsign
             slope(i,j) = 1.0/(1.0+ABS(y))**2                ! softsign

    Now when running it, the tanh option is slowest, as expected.
    But the sigmoid (using exp) is faster than softsign, which only needs
    abs and simple arithmetic. How can this be? Even if exp is using a
    table lookup and spline interpolation, I would think that is slower.
    Softsign would have an extra divide, but I can't see that tipping the
    scales.
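
    As a side note on the "extra divide": the softsign branch above can be
    written with a single divide by factoring out the common reciprocal.
    A minimal sketch (the temporary t is introduced here purely for
    illustration; it is not in the original code):

          elseif (activation(1:5).eq.'SOFTS') then
             y = output(i,j)
             t = 1.0/(1.0+ABS(y))                            ! shared reciprocal
             output(i,j) = y*t                               ! softsign: y/(1+|y|)
             slope(i,j) = t*t                                ! slope: 1/(1+|y|)**2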

  • From Steven G. Kargl@21:1/5 to Robinn on Fri Dec 15 04:22:13 2023
    On Fri, 15 Dec 2023 09:59:56 +0800, Robinn wrote:

    I got some old neural network code (written about 30 years ago).
    It has several activation functions, which only change 2 lines, like so:

    [code snipped]

    Now when running it, the tanh option is slowest, as expected.
    But the sigmoid (using exp) is faster than softsign, which only needs
    abs and simple arithmetic. How can this be? Even if exp is using a
    table lookup and spline interpolation, I would think that is slower.
    Softsign would have an extra divide, but I can't see that tipping the
    scales.

    There is insufficient information to provide much help. First, what
    compiler and operating system? Second, how did you do the timing?
    Third, is there a minimum working example that others can profile?
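
    For reference, a minimal, self-contained harness of the kind being asked
    for might look like the sketch below. The array size, repeat count, input
    range, and the use of CPU_TIME are illustrative assumptions, not taken
    from the original program:

          program time_activations
             implicit none
             integer, parameter :: n = 1000000, nrep = 100
             real, allocatable :: x(:), out(:), slp(:)
             real :: t0, t1
             integer :: k

             allocate (x(n), out(n), slp(n))
             call random_number(x)
             x = 8.0*x - 4.0                   ! spread inputs over [-4,4)

             call cpu_time(t0)
             do k = 1, nrep
                x = -x                         ! keep the loop body non-invariant
                out = 1.0/(1.0+exp(-x))        ! sigmoid
                slp = out*(1.0-out)
             end do
             call cpu_time(t1)
             ! printing a checksum keeps the work from being optimized away
             print *, 'sigmoid : ', t1-t0, ' s   checksum ', sum(out)+sum(slp)

             call cpu_time(t0)
             do k = 1, nrep
                x = -x
                out = x/(abs(x)+1.0)           ! softsign
                slp = 1.0/(1.0+abs(x))**2
             end do
             call cpu_time(t1)
             print *, 'softsign: ', t1-t0, ' s   checksum ', sum(out)+sum(slp)
          end program time_activations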

    --
    steve

  • From Giorgio Pastore@21:1/5 to All on Fri Dec 22 15:37:52 2023
    On 15/12/23 05:22, Steven G. Kargl wrote:
    On Fri, 15 Dec 2023 09:59:56 +0800, Robinn wrote:

    I got some old neural network code (written about 30 years ago).
    It has several activation functions, which only change 2 lines, like so:

    [code snipped]

    Now when running it, the tanh option is slowest, as expected.
    But the sigmoid (using exp) is faster than softsign, which only needs
    abs and simple arithmetic. How can this be? Even if exp is using a
    table lookup and spline interpolation, I would think that is slower.
    Softsign would have an extra divide, but I can't see that tipping the
    scales.

    There is insufficient information to provide much help. First, what
    compiler and operating system? Second, how did you do the timing?
    Third, is there a minimum working example that others can profile?


    Fourth, what were the actual timing numbers?

    Giorgio

  • From Thomas Jahns@21:1/5 to Robinn on Tue Jan 30 09:40:22 2024
    On 2023-12-15 02:59, Robinn wrote:
    I got some old neural network code (written about 30 years ago).
    It has several activation functions, which only change 2 lines, like so:

          if (activation(1:2).eq.'SI' .or. activation(1:2).eq.'LO') then
             output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
             slope(i,j) = output(i,j) * (1.0 - output(i,j)) ! sigmoid
          elseif (activation(1:2).eq.'TA') then
             output(i,j) = TANH(output(i,j))                 ! TANH
             slope(i,j) = 1.0 - output(i,j)*output(i,j)     ! TANH
          elseif (activation(1:2).eq.'AR') then
             y = output(i,j)
             output(i,j) = ATAN(y)                           ! arctan
             slope(i,j) = 1.0/(1.0 +y*y)                  ! arctan
          elseif (activation(1:5).eq.'SOFTP') then
             y = EXP(output(i,j))
             output(i,j) = LOG(1.0+y)                        ! softplus
             slope(i,j) = 1.0/(1.0+1.0/y)               ! softplus
          elseif (activation(1:5).eq.'SOFTS') then
             y = output(i,j)
             output(i,j) = y/(ABS(y)+1.0)                    ! softsign
             slope(i,j) = 1.0/(1.0+ABS(y))**2             ! softsign

    Now when running it, the tanh option is slowest, as expected.
    But the sigmoid (using exp) is faster than softsign, which only needs
    abs and simple arithmetic. How can this be? Even if exp is using a table lookup
    and spline interpolation, I would think that is slower.
    Softsign would have an extra divide, but I can't see that tipping the scales.


    You are perhaps not aware that the string comparisons in your conditionals (for which most compilers call the strncmp function) are quite expensive on today's CPUs. I would recommend using an INTEGER constant to make the switch.
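
    For illustration, a sketch of such an integer dispatch might look like the
    fragment below (the ACT_* constants and the variable act are made-up names;
    the arrays, loop indices, and y are assumed to come from the original
    routine): decode the string once before the element loops, then branch on a
    plain integer inside them.

          integer, parameter :: ACT_SIGMOID = 1, ACT_TANH = 2, ACT_ATAN = 3, &
                                ACT_SOFTPLUS = 4, ACT_SOFTSIGN = 5
          integer :: act

          ! once per call, before the element loops: a single string decode
          select case (activation(1:2))
          case ('SI', 'LO')
             act = ACT_SIGMOID
          case ('TA')
             act = ACT_TANH
          case ('AR')
             act = ACT_ATAN
          case default
             if (activation(1:5) .eq. 'SOFTP') then
                act = ACT_SOFTPLUS
             else
                act = ACT_SOFTSIGN
             end if
          end select

          ! inside the loops: a cheap integer comparison instead of strncmp
          select case (act)
          case (ACT_SIGMOID)
             output(i,j) = 1.0/(1.0+EXP(-output(i,j)))       ! sigmoid
             slope(i,j) = output(i,j)*(1.0-output(i,j))
          case (ACT_SOFTSIGN)
             y = output(i,j)
             output(i,j) = y/(ABS(y)+1.0)                    ! softsign
             slope(i,j) = 1.0/(1.0+ABS(y))**2
          ! ... remaining cases as in the original elseif chain
          end select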

    Thomas
