• Direct access files

    From Beliavsky@21:1/5 to All on Sat Mar 12 18:37:02 2022
    I wrote a small program https://github.com/Beliavsky/FortranTip/blob/main/xxdirect_access.f90 to understand how direct access files work; it is also copied below. Intel Fortran gives the output

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    could not read record 10
    He Helium 4.0020

    which is what I expected, but gfortran gives

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    0.0000
    He Helium 4.0020

    Three questions:
    (1) Are both compilers standard-conforming here?
    (2) How should I set the parameter atom_recl?
    (3) Is there a way to find out the maximum record number of a direct access file other than trial and error?

    Here is the code from GitHub.

    module chemistry_mod
    implicit none
    integer, parameter :: atom_recl = 100
    type :: atom_t
    character :: symbol*2, name*10
    real :: mass
    end type atom_t
    end module chemistry_mod
    !
    program direct_access
    use chemistry_mod, only: atom_t, atom_recl
    implicit none
    integer, parameter :: outu = 10, inu = 11, irec(4) = [4,1,10,2]
    character (len=*), parameter :: data_file = "atomic_mass.dat"
    integer :: i,j,ierr
    type(atom_t) :: atom
    ! write to a direct access data file
    open (unit=outu,file=data_file,access="direct",recl=atom_recl,action="write")
    write (outu,rec=1) atom_t("H" ,"Hydrogen" ,1.007)
    write (outu,rec=2) atom_t("He","Helium" ,4.002)
    write (outu,rec=3) atom_t("Li","Lithium" ,6.941)
    write (outu,rec=4) atom_t("Be","Beryllium",9.012)
    close (outu)
    ! open the direct access data file for reading -- must specify recl
    open (unit=inu,file=data_file,access="direct",recl=atom_recl,action="read")
    do i=1,size(irec)
    j = irec(i) ! which record to read
    read (inu,rec=j,iostat=ierr) atom ! try to read record j from data file
    if (ierr == 0) then
    write (*,"(a2,1x,a10,1x,f8.4)") atom ! if it exists, print it
    else
    write (*,"('could not read record ',i0)") j
    end if
    end do
    end program direct_access

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John@21:1/5 to All on Sat Mar 12 20:14:46 2022
    Not quite sure if I know what you mean by determining the RECL length, but making a few assumptions the technically correct way is probably the following INQUIRE feature:


    INQUIRE BY OUTPUT LIST
    The scalar_int_variable in the IOLENGTH= specifier is assigned the
    processor-dependent number of file storage units that would be required to
    store the data of the output list in an unformatted file. The value shall be
    suitable as a RECL= specifier in an OPEN statement that connects a file for
    unformatted direct access when there are input/output statements with the
    same input/output list.

    The output list in an INQUIRE statement shall not contain any derived-type
    list items that require a defined input/output procedure as described in
    subclause 9.6.3. If a derived-type list item appears in the output list, the
    value returned for the IOLENGTH= specifier assumes that no defined
    input/output procedure will be invoked.

    In practice, though, I do not know of a remaining Unix or GNU/Linux implementation where you cannot work it out by counting the bytes required for your longest record. In theory you should double-check the units used by RECL, as they are not always bytes and there are even compiler switches to change the unit; see also STORAGE_SIZE.
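
    A minimal sketch of the INQUIRE-by-output-list approach, applied to the atom_t type from the original post. The names recl_mod, atom_iolength and iolength_demo.dat are made up for illustration, and the value printed is processor-dependent: it is in whatever unit the compiler uses for RECL=.

```fortran
module recl_mod
   implicit none
   type :: atom_t
      character(len=2)  :: symbol
      character(len=10) :: name
      real              :: mass
   end type atom_t
contains
   ! Record length, in the processor's RECL units, needed for one atom_t.
   function atom_iolength() result(iol)
      integer :: iol
      type(atom_t) :: atom
      atom = atom_t("H", "Hydrogen", 1.007)
      inquire (iolength=iol) atom
   end function atom_iolength
end module recl_mod

program show_iolength
   use recl_mod, only: atom_t, atom_iolength
   implicit none
   integer :: iol, iu
   iol = atom_iolength()
   print "(a,i0)", "iolength for atom_t = ", iol
   ! Use the inquired value instead of a hard-coded recl=100.
   open (newunit=iu, file="iolength_demo.dat", access="direct", &
         form="unformatted", recl=iol, action="write", status="replace")
   write (iu, rec=1) atom_t("H", "Hydrogen", 1.007)
   close (iu, status="delete")
end program show_iolength
```

    Because the value comes from INQUIRE at run time, atom_recl can no longer be a named constant; it has to be an ordinary variable (or a function result, as here).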

  • From John@21:1/5 to All on Sat Mar 12 20:01:20 2022
    Direct access records are all the same length, so you can INQUIRE the file size and divide by the size of a record to get the number of records.
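
    A rough sketch of that calculation. It assumes the file size returned by INQUIRE and the record length are expressed in the same unit; if the compiler measures RECL in something other than file storage units, convert first. The names record_count_mod, record_count and count_demo.dat are hypothetical.

```fortran
module record_count_mod
   implicit none
contains
   ! Upper bound on the number of records, given the file size and the
   ! record length expressed in the same unit (for example bytes).
   pure function record_count(file_size, rec_len) result(nrec)
      integer, intent(in) :: file_size, rec_len
      integer :: nrec
      nrec = file_size / rec_len
   end function record_count
end module record_count_mod

program count_records
   use record_count_mod, only: record_count
   implicit none
   character(len=*), parameter :: data_file = "count_demo.dat"
   real :: x(2)
   integer :: iu, iol, fsize, i
   x = 0.0
   inquire (iolength=iol) x          ! record length for two reals
   open (newunit=iu, file=data_file, access="direct", form="unformatted", &
         recl=iol, action="write", status="replace")
   do i = 1, 4
      x = real(i)
      write (iu, rec=i) x
   end do
   close (iu)
   inquire (file=data_file, size=fsize)   ! size in file storage units
   print "(a,i0)", "file size         = ", fsize
   print "(a,i0)", "record length     = ", iol
   print "(a,i0)", "records (at most) = ", record_count(fsize, iol)
end program count_records
```

    The result is only an upper bound: nothing in the file says which of those records were ever written.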

  • From gah4@21:1/5 to All on Sat Mar 12 21:14:37 2022
    OK, you are supposed to use the INQUIRE statement with IOLENGTH to figure
    out the appropriate length based on the i/o-list. If you write the same type of data to each record, then it should be easy. Otherwise, find the IOLENGTH of each, and find the maximum.

    As far as I know, the system isn't required to tell you if you read past the end.

  • From gah4@21:1/5 to Beliavsky on Sat Mar 12 20:45:53 2022
    On Saturday, March 12, 2022 at 6:37:06 PM UTC-8, Beliavsky wrote:
    I wrote a small program https://github.com/Beliavsky/FortranTip/blob/main/xxdirect_access.f90 to understand how direct access files, also copied below. Intel Fortran gives output

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    could not read record 10
    He Helium 4.0020

    which is what I expected, but gfortran gives

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    0.0000
    He Helium 4.0020

    Three questions:
    (1) Are both compilers standard-conforming here?

    I suspect so. If the file system supplies zero bytes on such access,
    and possibly extends the file, then Fortran has to live with it.

    (2) How should I set the parameter atom_recl?

    Note that you are making many assumptions that the standard doesn't
    make. Among others, that disk files are read/written in bytes.
    (Maybe you forgot about machines with 36 bit words, but the standard didn't.)

    (3) Is there a way to find out the maximum record number of a direct access file other than trial and error?

    There might not be, and as well as I know, not all systems will tell you.

    I first knew direct access files with OS/360. OS/360 does not have a byte-oriented file system, but a record-oriented one. When you first open a
    direct access file, it writes all the records, with the disk block size equal to the record length. (Yes, physical disk blocks of that length.)
    To do that, it needs to know the number of records and record length.

    Note also the assumption that your atom_t is the same on
    all systems, including endianness, byte length of each member, and such.

    Otherwise, you can do C_SIZEOF() to find the bytes used, which the
    standard probably doesn't guarantee will agree, but probably does on
    byte oriented file systems.

    On Unix-like and Windows-like systems, the usual implementation is
    a series of blocks, and nothing else. The block length is not written
    to the file; you are expected to remember it.

    Some programs write records in the beginning with important things,
    like the number of records in the file. That is up to you, though.

  • From JCampbell@21:1/5 to Beliavsky on Sat Mar 12 21:39:32 2022
    On Sunday, March 13, 2022 at 1:37:06 PM UTC+11, Beliavsky wrote:
    I wrote a small program https://github.com/Beliavsky/FortranTip/blob/main/xxdirect_access.f90 to understand how direct access files, also copied below. Intel Fortran gives output

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    could not read record 10
    He Helium 4.0020

    which is what I expected, but gfortran gives

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    0.0000
    He Helium 4.0020

    Three questions:
    (1) Are both compilers standard-conforming here?
    (2) How should I set the parameter atom_recl?
    (3) Is there a way to find out the maximum record number of a direct access file other than trial and error?

    Here is the code from GitHub.

    module chemistry_mod
    implicit none
    integer, parameter :: atom_recl = 100
    type :: atom_t
    character :: symbol*2, name*10
    real :: mass
    end type atom_t
    end module chemistry_mod
    !
    program direct_access
    use chemistry_mod, only: atom_t, atom_recl
    implicit none
    integer, parameter :: outu = 10, inu = 11, irec(4) = [4,1,10,2]
    character (len=*), parameter :: data_file = "atomic_mass.dat"
    integer :: i,j,ierr
    type(atom_t) :: atom
    ! write to a direct access data file
    open (unit=outu,file=data_file,access="direct",recl=atom_recl,action="write")
    write (outu,rec=1) atom_t("H" ,"Hydrogen" ,1.007)
    write (outu,rec=2) atom_t("He","Helium" ,4.002)
    write (outu,rec=3) atom_t("Li","Lithium" ,6.941)
    write (outu,rec=4) atom_t("Be","Beryllium",9.012)
    close (outu)
    ! open the direct access data file for reading -- must specify recl
    open (unit=inu,file=data_file,access="direct",recl=atom_recl,action="read")
    do i=1,size(irec)
    j = irec(i) ! which record to read
    read (inu,rec=j,iostat=ierr) atom ! try to read record j from data file
    if (ierr == 0) then
    write (*,"(a2,1x,a10,1x,f8.4)") atom ! if it exists, print it
    else
    write (*,"('could not read record ',i0)") j
    end if
    end do
    end program direct_access

    I have a conservative view of this, which may have been superseded by F08 or F18.
    For INQUIRE, I do not expect "recl=" to be an attribute of a file; rather, it is an attribute of an opened file.
    (I am not aware of any extra information stored with the direct access file to specify record size.)
    Basically you must know the record size before you open it. INQUIRE will return the record length that was defined in the OPEN.
    With some compilers, if you specify recl= for an existing direct access file, then the file size must be a multiple of the record length specified. (no trailing part record is allowed)

    I am also not sure of what is required for an unformatted read/write of a derived type, such as "read (inu,rec=j,iostat=ierr) atom".
    Do you need to define a contained procedure for unformatted I/O ?
    You can list the components of the derived type, should a contained read procedure be required.
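
    A small sketch of the component-by-component alternative, reusing the atom_t layout from the original post. The program name and the file name component_demo.dat are placeholders, and the record length is taken from INQUIRE with the same component list that the READ and WRITE use.

```fortran
program component_io
   implicit none
   type :: atom_t
      character(len=2)  :: symbol
      character(len=10) :: name
      real              :: mass
   end type atom_t
   type(atom_t) :: a, b
   integer :: iu, iol

   a = atom_t("He", "Helium", 4.002)
   ! Size the record from the same list that is written and read below.
   inquire (iolength=iol) a%symbol, a%name, a%mass
   open (newunit=iu, file="component_demo.dat", access="direct", &
         form="unformatted", recl=iol, action="readwrite", status="replace")
   write (iu, rec=1) a%symbol, a%name, a%mass
   read  (iu, rec=1) b%symbol, b%name, b%mass
   print "(a2,1x,a10,1x,f8.4)", b%symbol, b%name, b%mass
   close (iu, status="delete")
end program component_io
```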

  • From Robin Vowels@21:1/5 to John on Sat Mar 12 23:41:26 2022
    On Sunday, March 13, 2022 at 3:01:22 PM UTC+11, John wrote:
    Direct access records are all the same length, so you can INQUIRE the file size and divide by the size of a line to get the number of lines.
    .
    Not quite. There is typically a one-word initial record (one byte in the case of RECL=1).

  • From Thomas Koenig@21:1/5 to Beliavsky on Sun Mar 13 10:03:38 2022
    Beliavsky <beliavsky@aol.com> schrieb:
    I wrote a small program https://github.com/Beliavsky/FortranTip/blob/main/xxdirect_access.f90 to understand how direct access files, also copied below. Intel Fortran gives output

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    could not read record 10
    He Helium 4.0020

    which is what I expected, but gfortran gives

    Be Beryllium 9.0120
    H Hydrogen 1.0070
    0.0000
    He Helium 4.0020

    Three questions:
    (1) Are both compilers standard-conforming here?

    As far as I have been able to determine, the standard makes no
    requirement of what should happen when a non-existing record is
    read, so the answer is probably yes.

    (2) How should I set the parameter atom_recl?

    You should use

    inquire(iolength=atom_recl) atom

    To quote the standard:

    # Every value in a stream file or an unformatted record file shall
    # occupy an integer number of file storage units; if the stream or
    # record file is unformatted, this number shall be the same for all
    # scalar values of the same type and type parameters. The number of
    # file storage units required for an item of a given type and type
    # parameters can be determined using the IOLENGTH= specifier of the
    # INQUIRE statement (12.10.3).

    (3) Is there a way to find out the maximum record number of a
    direct access file other than trial and error?

    You can inquire using SIZE.

  • From John@21:1/5 to All on Sun Mar 13 08:07:05 2022
    Good reminder, although I was assuming that would also be handled by the INQUIRE for IOLENGTH. I have seen a good number of
    files 4x bigger than they need to be because it was assumed RECL units were always bytes. It can go unnoticed because you do not have to read and write whole records, as the example shows (it is probably writing 16 bytes of data per record, but with a
    RECL of 100 each record likely occupies 100 bytes and, as pointed out, could occupy 400). I have seen that done accidentally with very large files, so the wasted storage can be
    significant. In several cases it had been happening for a very long time and was not noticed until someone tried reading the files with a different language.


    Direct access files are commonly used to provide random access to scratch data when insufficient memory is available, or to store very large amounts of data in a format where they can be accessed efficiently when only specific sections of a file need
    to be accessed. There have been platforms, such as Cray and VMS, that provided system support for efficiently accessing files with fixed-length records.

    It was also often used (in a platform-specific manner) to provide stream I/O before Fortran supported it. If your RECL unit is bytes and you open your file with RECL=1, on many platforms you could do stream I/O and random access at a byte level using
    direct access files at the cost of not being guaranteed portability; but it was commonly done.

  • From Steve Lionel@21:1/5 to Beliavsky on Sun Mar 13 10:35:13 2022
    On 3/12/2022 9:37 PM, Beliavsky wrote:
    Three questions:
    (1) Are both compilers standard-conforming here?
    (2) How should I set the parameter atom_recl?
    (3) Is there a way to find out the maximum record number of a direct access file other than trial and error?

    I want to comment on something I haven't seen addressed in the replies
    so far. By default, RECL= for unformatted files in Intel Fortran is in
    4-byte units, for compatibility with the DEC heritage. Your example
    would benefit from using -standard-semantics, which includes -assume
    byterecl, if you want the RECL to be in units of bytes (unspecified in standards prior to F2003.)

    If you use this option, then Intel Fortran provides the same results as
    you show for gfortran:

    D:\Projects>ifort /standard-semantics t.f90
    Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running
    on Intel(R) 64, Version 2021.5.0 Build 20211109_000000
    Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

    Microsoft (R) Incremental Linker Version 14.29.30139.0
    Copyright (C) Microsoft Corporation. All rights reserved.

    -out:t.exe
    -subsystem:console
    t.obj

    D:\Projects>t.exe
    Be Beryllium 9.0120
    H Hydrogen 1.0070
    0.0000
    He Helium 4.0020

    Regarding number of records of a direct access file, a lot depends on
    the underlying file system, as discussed earlier. You can get an upper
    bound by asking for the file size with INQUIRE, and dividing by the
    record length you specify. Note that on the platforms you're likely to
    use, record length is not a file property so you can't inquire about it
    by file. (OpenVMS is one exception to this.)

    --
    Steve Lionel
    ISO/IEC JTC1/SC22/WG5 (Fortran) Convenor
    Retired Intel Fortran developer/support
    Email: firstname at firstnamelastname dot com
    Twitter: @DoctorFortran
    LinkedIn: https://www.linkedin.com/in/stevelionel
    Blog: https://stevelionel.com/drfortran
    WG5: https://wg5-fortran.org

  • From Gary Scott@21:1/5 to Beliavsky on Sun Mar 13 11:09:15 2022
    On 3/12/2022 8:37 PM, Beliavsky wrote:
    I wrote a small program https://github.com/Beliavsky/FortranTip/blob/main/xxdirect_access.f90 to understand how direct access files, also copied below. Intel Fortran gives output

    snip


    Three questions:
    (1) Are both compilers standard-conforming here?
    (2) How should I set the parameter atom_recl?
    (3) Is there a way to find out the maximum record number of a direct access file other than trial and error?

    I've used direct access quite extensively for my database applications.
    I always created a "header" record at record 1 that described the file characteristics. With direct access, you can update this record at any
    time without affecting the rest of the file content. I read record 1 in
    order to determine for example allocation sizes or even the record
    lengths (by assuming some reasonable minimum to accommodate record 1
    content). Likewise, each record had prefixes for example, record
    "numbers", delete flag (records not actually removed, just marked for
    some possible later compression process or for "undelete" process).
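
    A minimal sketch of the header-record idea, assuming a layout in which record 1 holds only the number of data records and the data start at record 2. The layout, the program name and the file name header_demo.dat are illustrative choices, not the format of any particular application.

```fortran
program header_record
   implicit none
   character(len=*), parameter :: data_file = "header_demo.dat"
   integer, parameter :: ndata = 3
   real :: x(2)
   integer :: iu, iol, iol_hdr, nrec, i

   x = 0.0
   nrec = 0
   inquire (iolength=iol) x          ! length of a data record
   inquire (iolength=iol_hdr) nrec   ! length of the header record
   iol = max(iol, iol_hdr)           ! all records share one length

   open (newunit=iu, file=data_file, access="direct", form="unformatted", &
         recl=iol, action="readwrite", status="replace")
   write (iu, rec=1) ndata           ! header: number of data records
   do i = 1, ndata
      x = [real(i), real(10*i)]
      write (iu, rec=i+1) x          ! data start at record 2
   end do
   close (iu)

   ! A reader needs to know only the record length; the count is in the file.
   open (newunit=iu, file=data_file, access="direct", form="unformatted", &
         recl=iol, action="read", status="old")
   read (iu, rec=1) nrec
   read (iu, rec=nrec+1) x           ! last data record
   print "(a,i0)", "data records: ", nrec
   print "(a,2f8.2)", "last record: ", x
   close (iu)
end program header_record
```

    Since record 1 can be rewritten at any time, the count can be updated after appending records without touching the rest of the file.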

  • From John@21:1/5 to All on Sun Mar 13 08:34:36 2022
    I would also suggest that the example use IOMSG= and print the returned message, now that it is widely available. It will (hopefully) produce a clear message that an attempt was made to read a non-existent record. Another thing not mentioned is that you can
    have your code check the size of a file storage unit:

    The number of bits in a file storage unit is given by the constant FILE_STORAGE_SIZE (16.10.2.11) defined
    in the intrinsic module ISO_FORTRAN_ENV. It is recommended that the file storage unit be an 8-bit octet
    where this choice is practical.

    so you can, in the case of ifort(1), warn if it is compiled with a FILE_STORAGE_SIZE of 32 instead of 8. I thought ifort(1) no longer used longwords as the unit for RECL by default, too. I guess that is not the case unless formatted I/O is used, now that
    I go and look. My own build scripts and fpm(1) specify it so I had forgotten.
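
    A short sketch combining both suggestions, with a made-up program name and file name (iomsg_demo.dat). Whether reading record 10 of a one-record file reports an error at all is compiler-dependent, so both outcomes are handled, and the message text is whatever the processor supplies.

```fortran
program iomsg_demo
   use iso_fortran_env, only: file_storage_size
   implicit none
   character(len=*), parameter :: data_file = "iomsg_demo.dat"
   character(len=256) :: msg
   real :: x
   integer :: iu, iol, ierr

   ! Warn when the file storage unit is not the recommended 8-bit octet.
   if (file_storage_size /= 8) then
      print "(a,i0,a)", "warning: file storage unit is ", file_storage_size, " bits"
   end if

   x = 1.0
   inquire (iolength=iol) x
   open (newunit=iu, file=data_file, access="direct", form="unformatted", &
         recl=iol, action="readwrite", status="replace")
   write (iu, rec=1) x
   ! Try to read a record that was never written.
   read (iu, rec=10, iostat=ierr, iomsg=msg) x
   if (ierr /= 0) then
      print "(a,i0,2a)", "iostat=", ierr, ": ", trim(msg)
   else
      print "(a)", "no error reported for record 10"
   end if
   close (iu, status="delete")
end program iomsg_demo
```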

  • From Beliavsky@21:1/5 to All on Thu Mar 17 06:38:03 2022
    I wrote a program where each record consists of 2 reals and timed writing and reading large numbers of such records using unformatted direct, unformatted stream, and formatted sequential. For this case unformatted stream looks good since both writing and
    reading are fast. On Windows, Intel Fortran was much slower than gfortran for writing unformatted direct. The code is below and also at https://github.com/Beliavsky/FortranTip/blob/main/xdirect_access_array.f90 . The gfortran results with -O3 are

    n, nreals = 10000000 2
    iol= 8
    7.01291561E-02 0.410597622 7.01291561E-02 0.410597622
    7.01291561E-02 0.410597622 7.01291561E-02 0.410597622
    7.01291561E-02 0.410597622 7.01291561E-02 0.410597622

    task time
    write unformatted direct 1.156250
    read unformatted direct 0.000000
    write unformatted stream 0.031250
    read unformatted stream 0.015625
    write formatted sequential 14.250000
    read formatted sequential 8.015625

    and the Intel Fortran results for Version 2021.5.0 Build 20211109_000000 with -O3 are

    n, nreals = 10000000 2
    iol= 2
    0.4931603 0.8502113 0.4931603 0.8502113
    0.4931603 0.8502113 0.4931603 0.8502113
    0.4931603 0.8502113 0.4931603 0.8502113

    task time
    write unformatted direct 30.156250
    read unformatted direct 0.000000
    write unformatted stream 0.031250
    read unformatted stream 0.015625
    write formatted sequential 30.859375
    read formatted sequential 5.046875

    Here is the code.

    program direct_access
    implicit none
    integer, parameter :: n = 10**7, & ! # of records
    nreals = 2, & ! values per record
    iu = 10, ntimes = 7, ndt = ntimes - 1, nlen = 35
    character (len=*), parameter :: unformatted_file = "temp.bin", &
    unformatted_seq_file = "temp_seq.bin",formatted_seq_file = "temp_seq.txt"
    integer :: i,iol
    real :: xmat(n,nreals),ymat(n,nreals),xlast(nreals),times(ntimes),dt(ndt)
    character (len=nlen) :: labels(ndt)
    call random_number(xmat)
    inquire (iolength=iol) xmat(1,:) ! store record length in iol
    print*,"n, nreals =",n,nreals
    print*,"iol=",iol
    call cpu_time(times(1))
    ! write unformatted direct
    open (unit=iu,file=unformatted_file,access="direct", &
    recl=iol,action="write")
    do i=1,n
    write (iu,rec=i) xmat(i,:)
    end do
    close (iu)
    call cpu_time(times(2))
    open (unit=iu,file=unformatted_file,access="direct", &
    recl=iol,form="unformatted",action="read")
    ! read the last record without looping over previous records
    read (iu,rec=n) xlast
    print*,xmat(n,:),xlast
    close (iu)
    call cpu_time(times(3))
    ! write unformatted stream
    open (unit=iu,file=unformatted_seq_file,form="unformatted", &
    access="stream",action="write")
    write (iu) xmat
    close (iu)
    call cpu_time(times(4))
    ! read unformatted stream
    open (unit=iu,file=unformatted_seq_file,form="unformatted", &
    access="stream",action="read")
    read (iu) ymat
    print*,xmat(n,:),ymat(n,:)
    close (iu)
    call cpu_time(times(5))
    ! write formatted sequential
    open (unit=iu,file=formatted_seq_file,action="write")
    do i=1,n
    write (iu,*) xmat(i,:)
    end do
    close (iu)
    call cpu_time(times(6))
    ! read formatted sequential
    open (unit=iu,file=formatted_seq_file,action="read")
    do i=1,n
    read (iu,*) ymat(i,:)
    end do
    print*,xmat(n,:),ymat(n,:)
    call cpu_time(times(7))
    dt = times(2:) - times(:ntimes-1)
    labels = [character (len=nlen) :: &
    "write unformatted direct","read unformatted direct", &
    "write unformatted stream","read unformatted stream", &
    "write formatted sequential","read formatted sequential"]
    print "(/,a35,1x,a9)", "task","time"
    print "(a35,1x,f9.6)",(trim(labels(i)),dt(i),i=1,ndt)
    end program direct_access

  • From Steve Lionel@21:1/5 to Beliavsky on Thu Mar 17 10:15:46 2022
    On 3/17/2022 9:38 AM, Beliavsky wrote:
    I wrote a program where each record consists of 2 reals and timed writing and reading large numbers of such records using unformatted direct, unformatted stream, and formatted sequential. For this case unformatted stream looks good since both writing
    and reading are fast. On Windows, Intel Fortran was much slower than gfortran for writing unformatted direct.

    Were you aware that the Intel-compiled program is writing and reading
    four times as much data as the gfortran version? The default in Intel
    Fortran, for compatibility with its DEC heritage and because, prior to
    F2003, the standard was ambiguous about what RECL= units were, uses
    numeric storage units (4 bytes) as the unit for RECL in unformatted
    files. Use -standard-semantics if you want byte units (and other
    semantic changes consistent with the current standard.)
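
    One way to see which unit a given build uses, sketched under the assumption that a record of default reals occupies as many bytes in the file as in memory: compare the IOLENGTH of two reals with their size in bytes from STORAGE_SIZE. The results quoted earlier in the thread (iol= 8 with gfortran, iol= 2 with Intel Fortran) are consistent with 1 and 4 bytes per unit. The program name recl_unit is hypothetical.

```fortran
program recl_unit
   implicit none
   real :: x(2)
   integer :: iol, nbytes
   x = 0.0
   inquire (iolength=iol) x                 ! length in RECL units
   nbytes = size(x) * storage_size(x) / 8   ! length in bytes (8-bit bytes assumed)
   print "(a,i0)", "iolength                       = ", iol
   print "(a,i0)", "bytes in memory                = ", nbytes
   print "(a,i0)", "apparent bytes per RECL unit   = ", nbytes / iol
end program recl_unit
```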


  • From Beliavsky@21:1/5 to All on Thu Mar 17 08:43:01 2022
    Reading the Stream Input Output article by Clive Page at the Fortran Wiki https://fortranwiki.org/fortran/show/Stream+Input+Output, it looks like one can get the benefits of direct access files using unformatted stream I/O and setting the POS in the read
    or write statements to access the desired array element. Dividing the storage_size of the derived type by the file_storage_size constant from the iso_fortran_env module gives the position offset of successive array elements in the file. The code is at
    https://github.com/Beliavsky/FortranTip/blob/main/stream_pos_dt.f90 and also below.

    module derived_type_mod
    integer, parameter :: wp = kind(1.0)
    type :: rec
    integer :: i
    real(kind=wp) :: r(2)
    end type rec
    end module derived_type_mod
    !
    program stream_pos
    use iso_fortran_env , only: file_storage_size
    use derived_type_mod, only: wp, rec
    implicit none
    integer, parameter :: iu = 10, n = 10**7
    type(rec) :: x(n),x1,x2,xn
    integer :: i,pos_x, pos_n,size_x
    character (len=*), parameter :: data_file = "temp.bin", &
    fmt_g = "(*(g0.3,:,1x))"
    size_x=storage_size(x)
    ! size of an element of x(:) in bits
    pos_x = size_x/file_storage_size
    ! increment of position to get next element
    call random_number(x%r(1))
    call random_number(x%r(2))
    forall (i=1:n) x(i)%i = 10*i
    open (unit=iu,file=data_file,action="readwrite", &
    access="stream",form="unformatted")
    print fmt_g,"storage_size_i =",storage_size(x%i)
    print fmt_g,"storage_size_ir =",storage_size(x(1)%r)
    print fmt_g,"storage_size_type =",size_x
    print fmt_g,"file_storage_size =",file_storage_size
    print fmt_g,"position increment =",pos_x
    write (iu) x
    read (iu,pos=1) x1 ! read first element
    print fmt_g,"1st",x(1),x1
    ! read 2nd element at position offset pos_x from 1st
    read (iu,pos=1+pos_x) x2
    print fmt_g,"2nd",x(2),x2
    ! read nth element
    pos_n = 1 + (n-1)*pos_x
    read (iu,pos=pos_n) xn
    print fmt_g,"nth",x(n),xn
    ! multiply the nth value in file by 10
    x(n)%r = 10*x(n)%r
    write (iu,pos=pos_n) x(n)
    read (iu,pos=pos_n) xn ! read new nth value
    print fmt_g,"10*nth",xn
    end program stream_pos
    ! sample gfortran output:
    ! storage_size_i = 32
    ! storage_size_ir = 32
    ! storage_size_type = 96
    ! file_storage_size = 8
    ! position increment = 12
    ! 1st 10 0.779 0.367 10 0.779 0.367
    ! 2nd 20 0.220 0.590 20 0.220 0.590
    ! nth 100000000 0.122 0.671 100000000 0.122 0.671
    ! 10*nth 100000000 1.22 6.71

  • From Beliavsky@21:1/5 to All on Thu Mar 17 12:09:08 2022
    On Thursday, March 17, 2022 at 2:36:57 PM UTC-4, Phillip Helbig (undress to reply) wrote:
    In article <3c9ee6bc-13d3-412a...@googlegroups.com>,
    Beliavsky <beli...@aol.com> writes:

    Reading the Stream Input Output article by Clive Page at the Fortran Wiki https://fortranwiki.org/fortran/show/Stream+Input+Output, it looks like one can get the benefits of direct access files using unformatted stream I/O and setting the POS in the read or write statements to access the desired array element. Dividing the storage_size of the derived type by the file_storage_size constant from the iso_fortran_env module gives the position offset of successive array elements in the file. The code is at https://github.com/Beliavsky/FortranTip/blob/main/stream_pos_dt.f90 and also below.

    I definitely prefer record-oriented format to stream format for usenet
    posts. :-|

    I will use shorter lines in future messages, cutting them off at position 70 or 75.

  • From Phillip Helbig (undress to reply@21:1/5 to Beliavsky on Thu Mar 17 18:36:50 2022
    In article <3c9ee6bc-13d3-412a-9749-7224b4492316n@googlegroups.com>,
    Beliavsky <beliavsky@aol.com> writes:

    Reading the Stream Input Output article by Clive Page at the Fortran Wiki https://fortranwiki.org/fortran/show/Stream+Input+Output, it looks like one can get the benefits of direct access files using unformatted stream I/O and setting the POS in the read or write statements to access the desired array element. Dividing the storage_size of the derived type by the file_storage_size constant from the iso_fortran_env module gives the position offset of successive array elements in the file. The code is at https://github.com/Beliavsky/FortranTip/blob/main/stream_pos_dt.f90 and also below.

    I definitely prefer record-oriented format to stream format for usenet
    posts. :-|

  • From Phillip Helbig (undress to reply@21:1/5 to Beliavsky on Thu Mar 17 19:34:09 2022
    In article <b8a9118b-fa16-4bff-a488-6053550d9ec7n@googlegroups.com>,
    Beliavsky <beliavsky@aol.com> writes:

    On Thursday, March 17, 2022 at 2:36:57 PM UTC-4, Phillip Helbig (undress to reply) wrote:
    In article <3c9ee6bc-13d3-412a...@googlegroups.com>,
    Beliavsky <beli...@aol.com> writes:

    Reading the Stream Input Output article by Clive Page at the Fortran Wiki https://fortranwiki.org/fortran/show/Stream+Input+Output, it looks like one can get the benefits of direct access files using unformatted stream I/O and setting the POS in the read or write statements to access the desired array element. Dividing the storage_size of the derived type by the file_storage_size constant from the iso_fortran_env module gives the position offset of successive array elements in the file. The code is at https://github.com/Beliavsky/FortranTip/blob/main/stream_pos_dt.f90 and also below.

    I definitely prefer record-oriented format to stream format for usenet posts. :-|

    I will use shorter lines in future messages, cutting them off at position 70 or 75.

    :-)

    Thanks!

    Many people assume that the way the text is shown on their own screen is
    the same as other people will see it, especially with regard to
    line-breaking, but also with regard to encoding and decoding of 8-bit characters and so on.

    When I started using Fortran, I set the editor to automatically wrap
    after 72 characters. That is a good number for usenet and for many
    other text documents as well.

  • From gah4@21:1/5 to All on Thu Mar 17 16:26:47 2022
    On Thursday, March 17, 2022 at 12:34:13 PM UTC-7, Phillip Helbig (undress to reply) wrote:

    (snip)

    Many people assume that the way the text is shown on their own screen is
    the same as other people will see it, especially with regard to line-breaking, but also with regard to encoding and decoding of 8-bit characters and so on.

    When I started using Fortran, I set the editor to automatically wrap
    after 72 characters. That is a good number for usenet and for many
    other text documents as well.

    Some news hosts require it. I forget now which one I used to use,
    but it required reformatting quoted text that was too long. I still
    do that sometimes, remembering that one. With variable width fonts,
    it is hard to know how wide things are, but I do try to keep them
    not too wide.

  • From Thomas Koenig@21:1/5 to gah4@u.washington.edu on Fri Mar 18 07:03:39 2022
    gah4 <gah4@u.washington.edu> schrieb:
    On Thursday, March 17, 2022 at 12:34:13 PM UTC-7, Phillip Helbig (undress to reply) wrote:

    (snip)

    Many people assume that the way the text is shown on their own screen is
    the same as other people will see it, especially with regard to
    line-breaking, but also with regard to encoding and decoding of 8-bit
    characters and so on.

    When I started using Fortran, I set the editor to automatically wrap
    after 72 characters. That is a good number for usenet and for many
    other text documents as well.

    Some news hosts require it. I forget now which one I used to use,
    but it required reformatting quoted text that was too long. I still
    do that sometimes, remembering that one. With variable width fonts,
    it is hard to know how wide things are, but I do try to keep them
    not too wide.

    There's something to be said for using an old-style news reader
    in a terminal window, and for having an editor which allows line
    breaks, or which allows running the text through external commands.

    I usually run "fmt -69" on paragraphs before sending out an article.

  • From JCampbell@21:1/5 to Beliavsky on Sat Mar 19 03:57:19 2022
    On Friday, March 18, 2022 at 12:38:06 AM UTC+11, Beliavsky wrote:
    I wrote a program where each record consists of 2 reals and timed writing and reading large numbers of such records using unformatted direct, unformatted stream, and formatted sequential. For this case unformatted stream looks good since both writing
    and reading are fast. On Windows, Intel Fortran was much slower than gfortran for writing unformatted direct. The code is below and also at https://github.com/Beliavsky/FortranTip/blob/main/xdirect_access_array.f90 . The gfortran results with -O3 are

    n, nreals = 10000000 2
    iol= 8
    7.01291561E-02 0.410597622 7.01291561E-02 0.410597622
    7.01291561E-02 0.410597622 7.01291561E-02 0.410597622
    7.01291561E-02 0.410597622 7.01291561E-02 0.410597622

    task                            time
    write unformatted direct    1.156250
    read unformatted direct     0.000000
    write unformatted stream    0.031250
    read unformatted stream     0.015625
    write formatted sequential 14.250000
    read formatted sequential   8.015625

    and the Intel Fortran results for Version 2021.5.0 Build 20211109_000000 with -O3 are

    n, nreals = 10000000 2
    iol= 2
    0.4931603 0.8502113 0.4931603 0.8502113
    0.4931603 0.8502113 0.4931603 0.8502113
    0.4931603 0.8502113 0.4931603 0.8502113

    task                            time
    write unformatted direct   30.156250
    read unformatted direct     0.000000
    write unformatted stream    0.031250
    read unformatted stream     0.015625
    write formatted sequential 30.859375
    read formatted sequential   5.046875

    Here is the code.

    program direct_access
    implicit none
    integer, parameter :: n = 10**7, & ! # of records
    nreals = 2, & ! values per record
    iu = 10, ntimes = 7, ndt = ntimes - 1, nlen = 35
    character (len=*), parameter :: unformatted_file = "temp.bin", &
    unformatted_seq_file = "temp_seq.bin",formatted_seq_file = "temp_seq.txt"
    integer :: i,iol
    real :: xmat(n,nreals),ymat(n,nreals),xlast(nreals),times(ntimes),dt(ndt)
    character (len=nlen) :: labels(ndt)
    call random_number(xmat)
    inquire (iolength=iol) xmat(1,:) ! store record length in iol
    print*,"n, nreals =",n,nreals
    print*,"iol=",iol
    call cpu_time(times(1))
    ! write unformatted direct
    open (unit=iu,file=unformatted_file,access="direct", &
    recl=iol,action="write")
    do i=1,n
    write (iu,rec=i) xmat(i,:)
    end do
    close (iu)
    call cpu_time(times(2))
    open (unit=iu,file=unformatted_file,access="direct", &
    recl=iol,form="unformatted",action="read")
    ! read the last record without looping over previous records
    read (iu,rec=n) xlast
    print*,xmat(n,:),xlast
    close (iu)
    call cpu_time(times(3))
    ! write unformatted stream
    open (unit=iu,file=unformatted_seq_file,form="unformatted", &
    access="stream",action="write")
    write (iu) xmat
    close (iu)
    call cpu_time(times(4))
    ! read unformatted stream
    open (unit=iu,file=unformatted_seq_file,form="unformatted", &
    access="stream",action="read")
    read (iu) ymat
    print*,xmat(n,:),ymat(n,:)
    close (iu)
    call cpu_time(times(5))
    ! write formatted sequential
    open (unit=iu,file=formatted_seq_file,action="write")
    do i=1,n
    write (iu,*) xmat(i,:)
    end do
    close (iu)
    call cpu_time(times(6))
    ! read formatted sequential
    open (unit=iu,file=formatted_seq_file,action="read")
    do i=1,n
    read (iu,*) ymat(i,:)
    end do
    print*,xmat(n,:),ymat(n,:)
    call cpu_time(times(7))
    dt = times(2:) - times(:ntimes-1)
    labels = [character (len=nlen) :: &
    "write unformatted direct","read unformatted direct", &
    "write unformatted stream","read unformatted stream", &
    "write formatted sequential","read formatted sequential"]
    print "(/,a35,1x,a9)", "task","time"
    print "(a35,1x,f9.6)",(trim(labels(i)),dt(i),i=1,ndt)
    end program direct_access

    My preference is for "elapse_time" rather than CPU_TIME, especially for I/O timing, although elapsed time and I/O performance can be affected by disk buffering.
    There are arguments for either timer, although they measure different things: CPU_TIME reports processor time, while an elapsed-time clock reports wall-clock time. The granularity of CPU_TIME is always a problem on Windows.
    Given how annoying "recl=iol" is when converting to ifort, I would not fix that, although it should be updated.
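
    A minimal sketch of the "recl=iol" idiom, for readers who want it in
    isolation. INQUIRE(IOLENGTH=) returns the record length in the same
    processor-dependent units that RECL= expects, which is consistent with
    the output above (iol=8 with gfortran, iol=2 with Intel Fortran, whose
    default unit for unformatted files is 4 bytes unless -assume byterecl
    is given). The module name direct_io_demo and the routine names
    write_rows and read_row are hypothetical and not part of the posted code.

```fortran
module direct_io_demo
  implicit none
  private
  public :: write_rows, read_row
contains

  ! Write each row of x as one unformatted direct-access record.
  subroutine write_rows(filename, x)
    character(len=*), intent(in) :: filename
    real, intent(in) :: x(:,:)
    real :: row(size(x,2))   ! template with the shape of one record
    integer :: iu, iol, i
    ! Record length in whatever units this compiler uses for RECL=.
    inquire (iolength=iol) row
    open (newunit=iu, file=filename, access="direct", form="unformatted", &
          recl=iol, action="write", status="replace")
    do i = 1, size(x,1)
      write (iu, rec=i) x(i,:)
    end do
    close (iu)
  end subroutine write_rows

  ! Read record irec into row without reading the preceding records.
  subroutine read_row(filename, irec, row)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: irec
    real, intent(out) :: row(:)
    integer :: iu, iol
    inquire (iolength=iol) row
    open (newunit=iu, file=filename, access="direct", form="unformatted", &
          recl=iol, action="read", status="old")
    read (iu, rec=irec) row
    close (iu)
  end subroutine read_row

end module direct_io_demo
```

    Typical use, assuming a real array xmat(n,2) and a real array xlast(2):
    call write_rows("temp.bin", xmat) followed by
    call read_row("temp.bin", n, xlast). Because the length is queried
    rather than hard-coded, the same source should work whether the
    compiler counts RECL in bytes or in 4-byte words.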

  • From Gary Scott@21:1/5 to JCampbell on Sat Mar 19 11:05:25 2022
    On 3/19/2022 5:57 AM, JCampbell wrote:
    My preference is for "elapse_time" rather than CPU_TIME, especially for I/O timing, although elapsed time and I/O performance can be affected by disk buffering.
    There are arguments for either timer, although they measure different things: CPU_TIME reports processor time, while an elapsed-time clock reports wall-clock time. The granularity of CPU_TIME is always a problem on Windows.
    Given how annoying "recl=iol" is when converting to ifort, I would not fix that, although it should be updated.

    I always use mkl_get_cpu_clocks and related procedures on Windows. It
    is actually quite reliable, despite some claims to the contrary.
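
    For readers without MKL, a portable sketch of an elapsed-time clock
    built on the standard SYSTEM_CLOCK intrinsic, next to a CPU_TIME
    wrapper for comparison. The CPU_TIME figures quoted in this thread are
    all multiples of 0.015625 s (1/64 s), which illustrates the granularity
    complaint. The module name wall_clock_demo and the function names are
    hypothetical, and the tick rate SYSTEM_CLOCK reports is processor
    dependent, so the resolution you get is an assumption to check on your
    own compiler.

```fortran
module wall_clock_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_time, cpu_seconds
contains

  ! Wall-clock seconds since an arbitrary origin, from SYSTEM_CLOCK.
  ! 64-bit arguments typically give a finer tick than default integers.
  function wall_time() result(t)
    real(real64) :: t
    integer(int64) :: count, rate
    call system_clock(count, rate)
    if (rate > 0) then
      t = real(count, real64) / real(rate, real64)
    else
      t = -1.0_real64   ! no clock available on this processor
    end if
  end function wall_time

  ! Processor time in seconds, as reported by CPU_TIME.
  function cpu_seconds() result(t)
    real(real64) :: t
    call cpu_time(t)
  end function cpu_seconds

end module wall_clock_demo
```

    Typical use: store t0 = wall_time() before the I/O loop and print
    wall_time() - t0 after it. Reporting both wall_time and cpu_seconds
    differences shows how much of the elapsed time was spent waiting
    rather than computing.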

  • From JCampbell@21:1/5 to Beliavsky on Sun Mar 20 02:40:37 2022
    Another problem that this example demonstrates is the poor "write formatted sequential" performance of 14.25 sec for gfortran and 30.86 sec for Intel. This is not due to disk I/O delays.
    With gfortran Ver 4.9 (2015), I wrote my own Fortran routine for F15.9 output. It is still about 4x faster than gfortran in my recent tests, even though gfortran is reported to have improved its performance.
    This is an interesting issue when rerunning old 1980s finite element benchmarks, which show 1000x compute performance but comparatively much slower reporting performance.
    In many of these benchmark cases the reporting phase now takes longer than the calculation phase when repeating the old approach of reporting results to a text file.
    Text output is now seldom used, and I use access="direct" dumps that are portable between different Fortran compilers, but formatted writes still feature in other notable (pb11) benchmarks, which have excessively influenced other Fortran optimisation.
    I have included this test in the code for 20 million numbers, giving for gfortran on an i5-2300: 22.7 s for write (iu,*); 20.8 s for write (iu,fmt='(2f15.9)'); but 5.1 s for call write_F15_to_file ( iu, xmat(1,i), xmat(2,i) ).
    Formatted reads are also slow, but I don't use such large data transfers as text.
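
    The write_F15_to_file routine itself was not posted. As an illustration
    only, here is a sketch of how a hand-written F15.9 conversion might
    look: scale the value to an integer and emit the digits directly
    instead of going through the general F edit descriptor. The module name
    fast_f15_demo and the function to_f15_9 are hypothetical, no speedup has
    been measured for this version, and its rounding (NINT of the scaled
    value) may differ from the compiler's F15.9 in the last digit.

```fortran
module fast_f15_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: to_f15_9
contains

  ! Format x like F15.9: right-justified, 9 decimals, 15 characters.
  ! Returns 15 asterisks when the value does not fit, as F editing does.
  pure function to_f15_9(x) result(s)
    real(real64), intent(in) :: x
    character(len=15) :: s
    integer(int64) :: scaled
    integer :: pos, k

    s = repeat(" ", 15)
    ! Too large for the field (this test also rejects NaN and Inf).
    if (.not. (abs(x) < 1.0e5_real64)) then
      s = repeat("*", 15)
      return
    end if

    scaled = nint(abs(x)*1.0e9_real64, kind=int64)
    pos = 15
    do k = 1, 9                       ! nine fractional digits
      s(pos:pos) = achar(iachar("0") + int(mod(scaled, 10_int64)))
      scaled = scaled/10
      pos = pos - 1
    end do
    s(pos:pos) = "."
    pos = pos - 1
    do                                ! integer digits, at least one
      s(pos:pos) = achar(iachar("0") + int(mod(scaled, 10_int64)))
      scaled = scaled/10
      pos = pos - 1
      if (scaled == 0) exit
      if (pos < 1) then               ! rounding pushed it out of the field
        s = repeat("*", 15)
        return
      end if
    end do
    if (x < 0.0_real64) then
      if (pos < 1) then               ! no room left for the sign
        s = repeat("*", 15)
        return
      end if
      s(pos:pos) = "-"
    end if
  end function to_f15_9

end module fast_f15_demo
```

    A record of two values could then be written with
    write (iu,"(a)") to_f15_9(x)//to_f15_9(y), so the runtime library only
    has to copy characters. Default-real data such as xmat would need
    converting first, for example real(xmat(i,1), real64).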
