• Links to external Q/A pages

  • From spectrum@21:1/5 to All on Sat Sep 18 00:19:21 2021
    Hello,

    I've come across this question on StackOverflow, which asks how
    Fortran compilers efficiently handle dummy array arguments that
    may have unit or non-unit strides (e.g., assumed-shape arrays)
    without an exponential growth in the number of generated code
    variants needed to get good performance.

    "How do Fortran compilers handle stride=1 arrays?" https://stackoverflow.com/questions/69218925/how-do-fortran-compilers-handle-stride-1-arrays

    (the main text of the above question)
    When arrays have stride equal to 1 (a very common case), it seems that code compiled
    for the generic case (any nonzero stride) will often be suboptimal.
    On the other hand, creating multiple versions of compiled functions and
    subroutines when some of the arguments are stride=1 scales exponentially
    with the number of arguments that are arrays.
    What is the approach taken by Fortran compilers such as GFortran and Intel Fortran here?

    # The comments on the Q/A page mention the CONTIGUOUS attribute, but
    I guess the point of the question is not how to ensure the contiguity
    of a dummy array (from the user side), but how compilers generate code
    for dummy arrays that may or may not be contiguous.

    # I had a similar question before, but never asked it on the net, so just curious :)
    I will forward any suggested info to the Q/A page above, if necessary.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to spectrum on Sat Sep 18 01:00:44 2021
    On Saturday, September 18, 2021 at 12:19:23 AM UTC-7, spectrum wrote:

    > I've come across this question on StackOverflow, which asks how
    > Fortran compilers efficiently handle dummy array arguments that
    > may have unit or non-unit strides (e.g., assumed-shape arrays)
    > without an exponential growth in the number of generated code
    > variants needed to get good performance.
    >
    > "How do Fortran compilers handle stride=1 arrays?" https://stackoverflow.com/questions/69218925/how-do-fortran-compilers-handle-stride-1-arrays

    I suspect one of the reasons assumed shape took so long to come
    to Fortran is the efficiency of assumed size.

    The usual implementation of assumed shape is a descriptor holding the
    stride for each subscript: one multiplies each subscript by its
    stride, adds up the results, and uses the sum as an offset into the
    array. (There is a slight complication for 0 vs. 1 origin.)
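
    The offset calculation described above can be sketched in a few lines
    of Python. This is only a model, not any compiler's actual descriptor
    layout: the function name and the dictionary fields are made up for
    illustration, and offsets are counted in elements rather than bytes.

```python
def element_offset(subscripts, lower_bounds, strides):
    """Offset (in elements) of a(subscripts) from the descriptor's base.

    strides[k] is the distance in elements between consecutive values of
    subscript k; lower_bounds[k] takes care of the 0 vs. 1 origin issue.
    """
    return sum((i - lb) * s
               for i, lb, s in zip(subscripts, lower_bounds, strides))


# A contiguous 3x4 column-major array with 1-based subscripts:
# stride 1 along the first dimension, 3 along the second.
contiguous = {"lower_bounds": (1, 1), "strides": (1, 3)}

# The section a(1:3:2, :) seen as a 2x4 assumed-shape dummy: same
# storage, but the first stride is now 2.
section = {"lower_bounds": (1, 1), "strides": (2, 3)}

print(element_offset((2, 3), **contiguous))  # 1*1 + 2*3 = 7
print(element_offset((2, 3), **section))     # 1*2 + 2*3 = 8
```

    With stride 1 in the first dimension, the multiplication for that
    subscript disappears, which is the saving the question is about.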

    If you are looping through sequential elements of an assumed size array,
    or a stride 1 assumed shape array, you can optimize the subscript calculation.

    The question suggests that the called routine should special-case the
    stride 1 case when doing the call. It seems to me not so hard to test
    the stride in the called routine, at the appropriate time. I can
    neither confirm nor deny that any compilers do that.
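
    As a rough illustration of testing the stride inside the called
    routine, here is a Python sketch. It assumes a rank-1 dummy described
    by a hypothetical (offset, stride, n) triple over flat storage; it
    says nothing about whether any real compiler generates code this way.

```python
def sum_elements(storage, offset, stride, n):
    """Sum n elements of a rank-1 dummy described by (offset, stride, n)."""
    if stride == 1:
        # Fast path: elements are adjacent, so walk the storage directly.
        return sum(storage[offset:offset + n])
    # General path: one multiply-add per subscript calculation.
    total = 0.0
    for i in range(n):
        total += storage[offset + i * stride]
    return total


storage = [float(v) for v in range(10)]   # 0.0, 1.0, ..., 9.0
print(sum_elements(storage, 0, 1, 10))    # whole array: 45.0
print(sum_elements(storage, 1, 2, 5))     # elements 1, 3, 5, 7, 9: 25.0
```

    The test is done once, outside the loop, so its cost is negligible
    compared with the loop itself for all but very short arrays.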

    Many processors now can pipeline instruction processing, and especially
    overlap fixed point and floating point calculation. (That worked at least
    back to the 8087 in microprocessors, and to machines like the IBM 360/91
    in the mainframe days.)

    Otherwise, statically allocated arrays and assumed-size dummy arrays
    are very efficient. The convenience of assumed shape is considered
    worth its small cost in efficiency.

  • From Thomas Koenig@21:1/5 to spectrum on Sat Sep 18 09:17:53 2021
    spectrum <septcolor7@gmail.com> wrote:
    > Hello,
    >
    > I've come across this question on StackOverflow, which asks how
    > Fortran compilers efficiently handle dummy array arguments that
    > may have unit or non-unit strides (e.g., assumed-shape arrays)
    > without an exponential growth in the number of generated code
    > variants needed to get good performance.
    >
    > "How do Fortran compilers handle stride=1 arrays?" https://stackoverflow.com/questions/69218925/how-do-fortran-compilers-handle-stride-1-arrays
    >
    > (the main text of the above question)
    > When arrays have stride equal to 1 (a very common case), it seems that code compiled
    > for the generic case (any nonzero stride) will often be suboptimal.

    Yes.

    > On the other hand, creating multiple versions of compiled functions and
    > subroutines when some of the arguments are stride=1 scales exponentially

    Which is why gfortran, at least, does not do so.

    > with the number of arguments that are arrays.
    > What is the approach taken by Fortran compilers such as GFortran and Intel Fortran here?
    >
    > # The comments on the Q/A page mention the CONTIGUOUS attribute, but
    > I guess the point of the question is not how to ensure the contiguity
    > of a dummy array (from the user side), but how compilers generate code
    > for dummy arrays that may or may not be contiguous.

    Depends.

    If the dummy argument is CONTIGUOUS, then gfortran will insert
    code to conditionally do copy-in/copy-out of the argument.
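
    A conditional copy-in/copy-out can be modelled roughly as follows.
    This Python sketch is not gfortran's generated code: it assumes the
    contiguity check sits at the call site, and the function names and
    the (offset, stride, n) description of the argument are hypothetical.

```python
def scale_contiguous(buf, offset, n, factor):
    """Callee with a 'CONTIGUOUS' dummy: it may assume stride 1."""
    for i in range(n):
        buf[offset + i] *= factor


def call_scale(storage, offset, stride, n, factor):
    """Caller side: pass the actual argument directly when it is
    contiguous, otherwise pack it into a temporary and unpack afterwards."""
    if stride == 1:
        scale_contiguous(storage, offset, n, factor)
        return
    temp = [storage[offset + i * stride] for i in range(n)]   # copy-in
    scale_contiguous(temp, 0, n, factor)
    for i in range(n):                                        # copy-out
        storage[offset + i * stride] = temp[i]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
call_scale(data, 0, 2, 3, 10.0)   # scales elements 0, 2, 4 via a temporary
print(data)                       # [10.0, 2.0, 30.0, 4.0, 50.0, 6.0]
```

    The callee stays simple and fast; the price is the extra copies
    whenever the actual argument turns out to be strided.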

    > # I had a similar question before, but never asked it on the net, so just curious :)
    > I will forward any suggested info to the Q/A page above, if necessary.

    There would also be the option of generating two loops, one for stride=1
    and one for stride /= 1. There is a PR for gfortran about this, but
    so far it has not attracted a patch.

    If the compiler knows both the caller and the callee, it can
    inspect the caller and generate code for stride=1 if this is
    indeed the case. This can be done if both are in the same file,
    or when link-time optimization is used.

    An ALLOCATABLE dummy argument is also known to be contiguous,
    but the actual argument then cannot be an expression or
    an array slice.

    HTH.

  • From gah4@21:1/5 to Thomas Koenig on Sat Sep 18 08:03:44 2021
    On Saturday, September 18, 2021 at 2:17:56 AM UTC-7, Thomas Koenig wrote:
    > spectrum <septc...@gmail.com> wrote:

    (snip)
    >> "How do Fortran compilers handle stride=1 arrays?" https://stackoverflow.com/questions/69218925/how-do-fortran-compilers-handle-stride-1-arrays

    (snip)

    > If the dummy argument is CONTIGUOUS, then gfortran will insert
    > code to conditionally do copy-in/copy-out of the argument.
    >> # I had a similar question before, but never asked it on the net, so just curious :)
    >> I will forward any suggested info to the Q/A page above, if necessary.
    > There would also be the option of generating two loops, one for stride=1
    > and one for stride /= 1. There is a PR for gfortran about this, but
    > so far it has not attracted a patch.

    It gets more interesting if there is more than one argument.
    Maybe one loop if all have stride 1, and one loop if they don't.

    Otherwise, as noted in the question, if you have three arguments
    used in the loop, you could expand to 8 loops.
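
    The trade-off between one combined check and full expansion can be
    sketched in Python. Each argument here is a hypothetical
    (storage, offset, stride) triple, and the return value only reports
    which version ran; this illustrates the idea and is not taken from
    any compiler.

```python
def add_arrays(x, y, z, n):
    """z = x + y, where each argument is a (storage, offset, stride) triple."""
    (xs, xo, xst), (ys, yo, yst), (zs, zo, zst) = x, y, z
    if xst == 1 and yst == 1 and zst == 1:
        # Single fast version, used only when every argument is contiguous.
        zs[zo:zo + n] = [a + b for a, b in zip(xs[xo:xo + n], ys[yo:yo + n])]
        return "fast"
    # Single generic version for all seven remaining stride combinations.
    for i in range(n):
        zs[zo + i * zst] = xs[xo + i * xst] + ys[yo + i * yst]
    return "generic"


def full_specialization_count(num_array_args):
    """Loop versions needed if every unit/non-unit combination got its own."""
    return 2 ** num_array_args


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [0.0] * 4
print(add_arrays((a, 0, 1), (b, 0, 1), (c, 0, 1), 4), c)
# fast [11.0, 22.0, 33.0, 44.0]
print(add_arrays((a, 0, 2), (b, 1, 2), (c, 0, 1), 2), c)
# generic [21.0, 43.0, 33.0, 44.0]
print(full_specialization_count(3))  # 8
```

    Two versions cover the common all-contiguous case without the code
    size growing with the number of array arguments.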

    Seems you could also extract the stride(s) and use them inside
    the loop. That is easier if you have lots of registers available.
    I suspect that works as well as two loops.
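
    Extracting the strides and using them inside the loop might look
    like the Python sketch below: each stride is read once, and each
    running offset is advanced by an addition instead of being
    recomputed as offset + i * stride. The names and the
    (storage, offset, stride) triples are again only illustrative.

```python
def add_arrays_running_offsets(x, y, z, n):
    """z = x + y with each stride loaded once and offsets advanced by addition."""
    (xs, xo, xst), (ys, yo, yst), (zs, zo, zst) = x, y, z
    ix, iy, iz = xo, yo, zo          # running offsets, one "register" each
    for _ in range(n):
        zs[iz] = xs[ix] + ys[iy]
        ix += xst                    # an add replaces offset + i * stride
        iy += yst
        iz += zst


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
c = [0.0, 0.0]
add_arrays_running_offsets((a, 0, 2), (b, 1, 2), (c, 0, 1), 2)
print(c)  # [21.0, 43.0]
```

    One loop then serves every stride combination, at the cost of
    keeping a stride and an offset live for each array argument.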
