• Question on legality / implementation of Coarray construct

    From Thomas Koenig@21:1/5 to All on Thu Nov 18 22:05:41 2021
    I have a question regarding the legality of the following code.

    Is the code below OK? More specifically, how would an MPI-based
    implementation do this? The call to foo will result in a local
    allocation, but how this could be broadcast to all other images
    is not clear to me.

    module x
      implicit none
      type mytype
        integer :: x
      end type mytype

      type foobar
        type(mytype), allocatable :: xyz
      end type

      type(foobar) :: arr[*]

    contains
      subroutine bar()
        if (mod(this_image(), 2) .eq. 1) then
          call foo(arr%xyz)
        else
          sync all
          print *, arr[this_image() - 1]%xyz%x
        end if
      end subroutine

      subroutine foo(a)
        type (mytype), intent(inout), allocatable :: a
        allocate (a)
        a%x = this_image()
        sync all
      end subroutine foo
    end module x

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rbader@21:1/5 to Thomas Koenig on Fri Nov 19 07:52:17 2021
    Thomas Koenig wrote on Thursday, 18 November 2021 at 23:05:44 UTC+1:
    I have a question regarding the legality of the following code.

    Is the code below OK? More specifically, how would an MPI-based implementation do this? The call to foo will result in a local
    allocation, but how this could be broadcast to all other images
    is not clear to me.

    [code snipped]

    I believe this is conforming code (possibly modulo syntactic details; I haven't tried to compile it).
    Note that coarrays of a type with ALLOCATABLE or POINTER components are permitted, although in general
    communication performance for these components will incur additional latencies and other performance issues due to the nonsymmetric
    nature of such objects.
    I cannot specifically say how an MPI-based implementation would need to do this, but suspect that for this usage active-target
    RMA (one-sided MPI_Get on arr[...]%xyz%x with MPI_Win_fence as synchronization) would provide the needed properties.
    I do not understand your reference to a "broadcast" though. Your code has ~n/2 Gets on all even images to the respective neighbour.

    Regards
    Reinhold

  • From Steve Lionel@21:1/5 to Thomas Koenig on Fri Nov 19 11:46:55 2021
    On 11/18/2021 5:05 PM, Thomas Koenig wrote:
    I have a question regarding the legality of the following code.

    Is the code below OK? More specifically, how would an MPI-based implementation do this? The call to foo will result in a local
    allocation, but how this could be broadcast to all other images
    is not clear to me.

    That's because it isn't broadcast. When coarray arr becomes established,
    each image has its own copy of component xyz (unallocated), and it is
    the responsibility of each image to allocate that component (and
    synchronize) before it is referenced, either in that image or another image.

    --
    Steve Lionel
    ISO/IEC JTC1/SC22/WG5 (Fortran) Convenor
    Retired Intel Fortran developer/support
    Email: firstname at firstnamelastname dot com
    Twitter: @DoctorFortran
    LinkedIn: https://www.linkedin.com/in/stevelionel
    Blog: https://stevelionel.com/drfortran
    WG5: https://wg5-fortran.org

  • From Thomas Koenig@21:1/5 to rbader on Fri Nov 19 17:10:11 2021
    rbader <Bader@lrz.de> wrote:
    Thomas Koenig wrote on Thursday, 18 November 2021 at 23:05:44 UTC+1:
    I have a question regarding the legality of the following code.

    Is the code below OK? More specifically, how would an MPI-based
    implementation do this? The call to foo will result in a local
    allocation, but how this could be broadcast to all other images
    is not clear to me.

    [code snipped]

    I believe this is conforming code (possibly modulo syntactic
    details; I haven't tried to compile it).

    It does the expected thing with NAG, at least, and NAG being
    picky, I suspect this is OK.

    Note that coarrays
    of a type with ALLOCATABLE or POINTER components are permitted,
    although in general communication performance for these components
    will incur additional latencies and other performance issues due
    to the nonsymmetric nature of such objects.

    OK.

    I cannot specifically say how an MPI based implementation would
    need to do this, but suspect that for this usage active-target
    RMA (one-sided MPI_Get on arr[...]%xyz%x with MPI_Win_fence as synchronization) would provide the needed properties. I do not
    understand your reference to a "broadcast" though. Your code has
    ~n/2 Gets on all even images to the respective neighbour.

    Sorry, wrong terminology about broadcast. Also, the question
    regarding MPI was not really what I was after, some explanation
    is in order.

    The reason why I was asking is a design issue that we are currently
    facing designing the shared memory coarray implementation for gfortran.

    To avoid all sorts of problems, the basic design allocates a large
    shared memory segment and then calls fork(). All coarrays are then
    placed in that segment, as are all locally allocated allocatables
    and pointers (as in the code above).
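    A minimal C sketch of this design, assuming Linux-style mmap() with MAP_SHARED | MAP_ANONYMOUS. The segment layout, seg_alloc(), NUM_IMAGES and demo_per_image_allocation() are hypothetical illustrations, not gfortran's actual runtime. Because the segment is mapped before fork(), it sits at the same address in every image, so raw pointers into it stay valid everywhere. Each image allocates its own xyz component from a shared bump allocator (nothing is broadcast), and waiting for the children stands in for SYNC ALL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_IMAGES 4               /* hypothetical image count */
#define HEAP_BYTES (64 * 1024)     /* hypothetical size of the shared heap */

typedef struct { int x; } mytype;        /* mirrors the Fortran type mytype */
typedef struct { mytype *xyz; } foobar;  /* allocatable component as a pointer */

/* Hypothetical segment layout: allocator state, the coarray, then the heap. */
typedef struct {
    atomic_size_t next;                          /* bump offset into heap */
    foobar arr[NUM_IMAGES];                      /* arr[k-1] is image k's copy */
    max_align_t heap[HEAP_BYTES / sizeof(max_align_t)];
} segment_t;

/* Bump allocator over the shared heap; safe to call from several processes. */
static void *seg_alloc(segment_t *seg, size_t size)
{
    size_t align = sizeof(max_align_t);
    size = (size + align - 1) / align * align;
    size_t off = atomic_fetch_add(&seg->next, size);
    if (off + size > sizeof seg->heap)
        return NULL;
    return (unsigned char *)seg->heap + off;
}

/* Each forked image allocates its own xyz; returns 0 on success and copies
   every image's xyz%x into values[]. */
int demo_per_image_allocation(int values[NUM_IMAGES])
{
    segment_t *seg = mmap(NULL, sizeof *seg, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED)
        return -1;
    atomic_init(&seg->next, 0);   /* the mapping is zero-filled; be explicit */

    int rc = 0, started = 0;
    for (int img = 1; img <= NUM_IMAGES; img++) {
        pid_t pid = fork();
        if (pid < 0) { rc = -1; break; }
        if (pid == 0) {                               /* child = image img */
            mytype *a = seg_alloc(seg, sizeof *a);    /* allocate (a) */
            if (a == NULL)
                _exit(1);
            a->x = img;                               /* a%x = this_image() */
            seg->arr[img - 1].xyz = a;                /* only this image's slot */
            _exit(0);
        }
        started++;
    }
    /* Waiting for every child stands in for SYNC ALL in this sketch. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    }
    /* Mapped before fork(), so the stored pointers are valid in the parent too. */
    for (int img = 1; rc == 0 && img <= NUM_IMAGES; img++)
        values[img - 1] = seg->arr[img - 1].xyz->x;
    munmap(seg, sizeof *seg);
    return rc;
}
```

    The bump allocator never frees memory and is chosen only for brevity; a real runtime would need a proper allocator inside the segment and a SYNC ALL that does not rely on process exit.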

    If code that knows nothing about coarrays allocates or frees
    memory, it will (up to now) simply call malloc(), which will
    not work: after fork(), malloc()'d memory is private to each
    process, so other images cannot see it. So we are discussing
    how to solve this, preferably without an ABI change (or
    replacing malloc() from under the library).
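    A small C sketch of why plain malloc() breaks under this design, assuming POSIX fork() and MAP_ANONYMOUS (compare_heap_and_shared() is a hypothetical name). A child process stores 42 both through a malloc()'d pointer and through a pointer into a MAP_SHARED mapping. After fork() the heap is copy-on-write and private to each process, so the parent should still see 0 in the malloc()'d int but 42 in the shared one.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 on success and reports what the parent sees after the child's stores. */
int compare_heap_and_shared(int *seen_in_heap, int *seen_in_shared)
{
    int *heap_val = malloc(sizeof *heap_val);   /* what coarray-unaware code does */
    if (heap_val == NULL)
        return -1;
    int *shared_val = mmap(NULL, sizeof *shared_val, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared_val == MAP_FAILED) {
        free(heap_val);
        return -1;
    }
    *heap_val = 0;
    *shared_val = 0;

    int rc = 0;
    pid_t pid = fork();
    if (pid == 0) {          /* child: plays the role of another image */
        *heap_val = 42;      /* copy-on-write: only the child's page changes */
        *shared_val = 42;    /* shared page: the parent sees this store */
        _exit(0);
    }
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        rc = -1;

    *seen_in_heap = *heap_val;       /* still 0 in the parent */
    *seen_in_shared = *shared_val;   /* 42, written by the child */
    munmap(shared_val, sizeof *shared_val);
    free(heap_val);
    return rc;
}
```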

    Hmm...

  • From FortranFan@21:1/5 to Thomas Koenig on Fri Nov 19 13:31:23 2021
    On Friday, November 19, 2021 at 12:10:14 PM UTC-5, Thomas Koenig wrote:

    ..
    The reason why I was asking is a design issue that we are currently
    facing designing the shared memory coarray implementation for gfortran. ..

    To @Thomas Koenig and all the GCC/gfortran volunteers involved with this,

    Will it be possible for you to announce this effort toward a shared memory coarray implementation on the Fortran Discourse site?
    https://fortran-lang.discourse.group/

    There are many users and fans of gfortran who use the Fortran Discourse site, and some of them appear to be generally unaware of the efforts on gfortran.

    By broadcasting your effort more widely, you will likely notice greater energy, encouragement and appreciation toward gfortran. You may even attract more volunteers from the user base who go on to contribute to the gfortran development efforts.

    Thanks,

  • From rbader@21:1/5 to Thomas Koenig on Fri Nov 19 13:37:23 2021
    Thomas Koenig wrote on Friday, 19 November 2021 at 18:10:14 UTC+1:

    The reason why I was asking is a design issue that we are currently
    facing designing the shared memory coarray implementation for gfortran.

    To avoid all sorts of problems, the basic design allocates a large
    shared memory segment and then calls fork(). All coarrays are then
    placed in that segment, as are all locally allocated allocatables
    and pointers (as in the code above).

    If code that knows nothing about coarrays allocates or frees
    memory, it will (up to now) simply call malloc(), which will
    not work: after fork(), malloc()'d memory is private to each
    process, so other images cannot see it. So we are discussing
    how to solve this, preferably without an ABI change (or
    replacing malloc() from under the library).

    This indeed sounds tricky. In particular, with POINTER components there is no way
    for the target (which might be a static variable or dynamically allocated memory) to know
    whether it will later be used in a coarray context.
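    The POINTER case can be sketched in C the same way, assuming POSIX fork() and MAP_ANONYMOUS; the names are hypothetical and a static int stands in for a target outside the shared segment. The pointer value itself is stored in shared memory and reaches the other process, but fork() gave each process its own copy of the static variable at the same address, so dereferencing it should yield the reader's copy (0), not the value the writer stored (7).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct { int *ip; } a_t;   /* analogue of type(a_t) with a POINTER component */

static int target_var;             /* static target outside the shared segment */

/* The child associates slot->ip with its own static variable ("a%ip => i");
   the parent then dereferences the published pointer. Returns 0 on success. */
int read_through_published_pointer(int *seen)
{
    a_t *slot = mmap(NULL, sizeof *slot, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED)
        return -1;
    target_var = 0;
    slot->ip = NULL;

    int rc = 0;
    pid_t pid = fork();
    if (pid == 0) {
        target_var = 7;            /* modifies only the child's private copy */
        slot->ip = &target_var;    /* the address itself does reach the parent */
        _exit(0);
    }
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) != pid || slot->ip == NULL)
        rc = -1;
    else
        *seen = *slot->ip;         /* same address, but the parent's copy */
    munmap(slot, sizeof *slot);
    return rc;
}
```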

  • From Thomas Koenig@21:1/5 to rbader on Fri Nov 19 23:39:57 2021
    rbader <Bader@lrz.de> wrote:
    Thomas Koenig wrote on Friday, 19 November 2021 at 18:10:14 UTC+1:

    If code that knows nothing about coarrays allocates or frees
    memory, it will (up to now) simply call malloc(), which will
    not work: after fork(), malloc()'d memory is private to each
    process, so other images cannot see it. So we are discussing
    how to solve this, preferably without an ABI change (or
    replacing malloc() from under the library).

    This indeed sounds tricky. In particular, with POINTER components there is no way
    for the target (which might be a static variable or dynamically allocated memory) to know
    whether it will later be used in a coarray context.

    That is indeed a very tricky case, especially if
    the pointer is to a local variable.

    So, this should work, then:

    module x
      implicit none
      type a_t
        integer, pointer :: ip
      end type a_t
    contains
      subroutine foo(a)
        type (a_t) :: a[*]
        integer, target :: i
        i = this_image()
        if (mod(i,2) == 1) then
          a%ip => i
          sync all
        else
          sync all
          print *,a[i-1]%ip
        end if
        sync all
      end subroutine foo
    end module x

    program y
      use x
      type (a_t) :: a[*]
      call foo(a)
    end program y

    (the sync all before the end of the subroutine to make
    sure that foo is still executing and a%ip is still pointing
    to something valid).

    This is indeed a tough nut to crack.

  • From FortranFan@21:1/5 to Thomas Koenig on Fri Nov 19 16:27:25 2021
    On Friday, November 19, 2021 at 6:39:59 PM UTC-5, Thomas Koenig wrote:

    ..
    (the sync all before the end of the subroutine to make
    sure that foo is still executing and a%ip is still pointing
    to something valid). ..

    Intuitively, one would think the following would suffice, though I am not sure yet what the standard says (I will need to check):
    ..
    if (mod(i,2) == 1) then
      a%ip => i
    else
      print *,a[i-1]%ip
    end if
    sync all
    ..

  • From rbader@21:1/5 to FortranFan on Fri Nov 19 22:57:56 2021
    FortranFan wrote on Saturday, 20 November 2021 at 01:27:26 UTC+1:
    On Friday, November 19, 2021 at 6:39:59 PM UTC-5, Thomas Koenig wrote:

    ..
    (the sync all before the end of the subroutine to make
    sure that foo is still executing and a%ip is still pointing
    to something valid). ..

    Intuitively, one would think the following would suffice, though I am not sure yet what the standard says (I will need to check):
    ..
    if (mod(i,2) == 1) then
      a%ip => i
    else
      print *,a[i-1]%ip
    end if
    sync all
    ..
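    This variant is invalid due to a race condition: nothing orders the pointer
    assignment on the odd image before the reference on the even image, because
    the only SYNC ALL comes after both.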
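    A C sketch of the corrected pattern, assuming a shared-memory runtime in which SYNC ALL is a barrier over all images. The centralized counter barrier with a generation number, NIMG and the function names are assumptions for illustration, not how gfortran or any other compiler actually implements SYNC ALL. Odd images publish a value, every image passes the barrier, even images read from their odd neighbour, and a second barrier keeps the writers around until the reads are done, mirroring the two SYNC ALLs in Thomas's example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NIMG 4   /* hypothetical number of images */

typedef struct {
    atomic_int count;        /* images that reached the current barrier */
    atomic_int generation;   /* incremented each time the barrier opens */
    int value[NIMG];         /* data published by each image */
    int seen[NIMG];          /* what each even image read from its neighbour */
} shared_t;

/* SYNC ALL analogue: centralized counter barrier with a generation number. */
static void sync_all(shared_t *s)
{
    int gen = atomic_load(&s->generation);
    if (atomic_fetch_add(&s->count, 1) == NIMG - 1) {
        atomic_store(&s->count, 0);            /* last arrival resets the count */
        atomic_fetch_add(&s->generation, 1);   /* ...and releases everyone */
    } else {
        while (atomic_load(&s->generation) == gen)
            sched_yield();
    }
}

/* Same structure as the corrected Fortran example, with me = this_image(). */
static void image_main(shared_t *s, int me)
{
    if (me % 2 == 1) {
        s->value[me - 1] = 10 * me;           /* odd image publishes */
        sync_all(s);
    } else {
        sync_all(s);                          /* orders the read after the write */
        s->seen[me - 1] = s->value[me - 2];   /* read from image me-1 */
    }
    sync_all(s);   /* like the final SYNC ALL: writers wait for the readers */
}

/* Forks NIMG images; returns 0 on success and copies seen[] out. */
int run_images(int seen_out[NIMG])
{
    shared_t *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;
    atomic_init(&s->count, 0);
    atomic_init(&s->generation, 0);
    for (int i = 0; i < NIMG; i++) {
        s->value[i] = 0;
        s->seen[i] = -1;
    }

    pid_t pids[NIMG];
    int started = 0, rc = 0;
    for (int me = 1; me <= NIMG; me++) {
        pid_t pid = fork();
        if (pid < 0) {   /* barrier can never open: stop the images already started */
            for (int j = 0; j < started; j++)
                kill(pids[j], SIGKILL);
            rc = -1;
            break;
        }
        if (pid == 0) {
            image_main(s, me);
            _exit(0);
        }
        pids[started++] = pid;
    }
    for (int j = 0; j < started; j++) {
        int status;
        if (waitpid(pids[j], &status, 0) != pids[j] || !WIFEXITED(status))
            rc = -1;
    }
    for (int i = 0; rc == 0 && i < NIMG; i++)
        seen_out[i] = s->seen[i];
    munmap(s, sizeof *s);
    return rc;
}
```

    The sequentially consistent atomics in sync_all() order each odd image's write before its neighbour's read; removing the first barrier would reintroduce the race described above.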

  • From Thomas Koenig@21:1/5 to FortranFan on Sun Nov 21 14:13:38 2021
    FortranFan <parekhvs@gmail.com> wrote:
    On Friday, November 19, 2021 at 12:10:14 PM UTC-5, Thomas Koenig wrote:

    ..
    The reason why I was asking is a design issue that we are currently
    facing designing the shared memory coarray implementation for gfortran. ..

    To @Thomas Koenig and all the GCC/gfortran volunteers involved with this,

    Will it be possible for you to announce this effort toward a shared memory coarray implementation on the Fortran Discourse site?
    https://fortran-lang.discourse.group/

    I will not post in a forum which prescribes "welcoming language".
    I feel unwelcome there.

  • From Jos Bergervoet@21:1/5 to Thomas Koenig on Sun Nov 21 15:47:19 2021
    On 21/11/21 3:13 PM, Thomas Koenig wrote:
    I will not post in a forum which prescribes "welcoming language".
    I feel unwelcome there.

    But Thomas, in a Fortran "discourse.group" they expect you to disagree,
    that's the whole point! So you are particularly welcome now.
    <https://youtu.be/xpAvcGcEc0k?t=82>

    --
    Jos
