• Re: setting IFS to new line doesn't work while searching?

    From Greg Wooledge@21:1/5 to Albretch Mueller on Fri Dec 15 13:50:01 2023
    On Fri, Dec 15, 2023 at 12:33:01PM +0000, Albretch Mueller wrote:
    #fndar=($(IFS=$'\n'; find "$sdir" -type f -printf '%P|%TY-%Tm-%Td %TI:%TM|%s\n' | sort --version-sort --reverse))

    the array construct ($( ... )) is using the space (between the date
    and the time) also to split array elements,

    Yeah, no. That's not how it works.

    You're setting IFS *inside* the command substitution whose value is
    what you're trying to word-split. It needs to be set outside.

    In addition to word splitting, an unquoted command substitution's
    output is going to undergo filename expansion (globbing). So you
    would also need to disable that.
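
A minimal sketch of the fix described above, assuming GNU find (for -printf) and a throwaway directory. The names demo_dir, old_ifs and lines are made up for illustration. IFS is changed in the shell that performs the splitting, and globbing is switched off around the command substitution:

```shell
#!/usr/bin/env bash
# Hypothetical test data: names containing a space and a glob character.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" "$demo_dir/star*.txt"

old_ifs=$IFS     # remember the current IFS
IFS=$'\n'        # split on newlines only, in the shell that does the splitting
set -f           # disable filename expansion (globbing)
lines=($(find "$demo_dir" -type f -printf '%P\n' | sort))
set +f           # re-enable globbing
IFS=$old_ifs     # restore IFS

printf '%d elements\n' "${#lines[@]}"
printf '<%s>\n' "${lines[@]}"
rm -rf "$demo_dir"
```

Each line of output becomes one array element, spaces included. File names containing newlines would still be split.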

    More to the point, bash has a 'readarray' command which does what you *actually* want:

    readarray -t fndar < <(find "$sdir" ...)

    This avoids all of the issues with word splitting and globbing and setting/resetting the IFS variable, and is more efficient as well.

    BTW, readarray is a synonym for 'mapfile'. You may use either spelling.
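
For a complete version of the readarray line above, here is a small sketch. It assumes GNU find and uses a scratch directory (demo_dir is a hypothetical name) and a shortened -printf format rather than the OP's full one:

```shell
#!/usr/bin/env bash
# Hypothetical test data in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a b.txt" "$demo_dir/c*.txt"

# -t strips the trailing newline from each line. Every line becomes one
# array element; no word splitting or globbing is applied to the data.
readarray -t fndar < <(find "$demo_dir" -type f -printf '%P|%s\n' | sort)

printf '%d elements\n' "${#fndar[@]}"
printf '<%s>\n' "${fndar[@]}"
rm -rf "$demo_dir"
```

IFS is never touched here, so there is nothing to save and restore.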

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Nicolas George@21:1/5 to All on Fri Dec 15 13:50:01 2023
    Albretch Mueller (12023-12-15):
    sdir="$(pwd)"
    #fndar=($(IFS=$'\n'; find "$sdir" -type f -printf '%P|%TY-%Tm-%Td %TI:%TM|%s\n' | sort --version-sort --reverse))
    #fndar=($(IFS='\n'; find "$sdir" -type f -printf '%P|%TY-%Tm-%Td %TI:%TM|%s\n' | sort --version-sort --reverse))
    fndar=($(find "$sdir" -type f -printf '%P|%TY-%Tm-%Td %TI:%TM|%s\n' |
    sort --version-sort --reverse))
    fndarl=${#fndar[@]}
    echo "// __ \$fndarl: |${fndarl}|${fndar[0]}"

    the array construct ($( ... )) is using the space (between the date
    and the time) also to split array elements, but file names and paths
    may contain spaces, so ($( ... )) should have a way to reset its
    parsing metadata, or, do you know of any other way to get each whole
    -printf ... line out of find as part of array elements?

    You set IFS in the subshell, but the subshell is doing nothing related
    to IFS, it is just calling find and sort. You need to set IFS on the
    shell that does the splitting.
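
The difference can be seen with a small experiment. This is only an illustration; the function sample and the variables inner, outer and outer_ifs are made-up names, and the sample data contains no glob characters so globbing plays no part:

```shell
#!/usr/bin/env bash
# Two lines of sample output, each containing a space.
sample() { printf '%s\n' 'a b' 'c d'; }

# IFS is changed only inside the command substitution's subshell, so the
# outer shell still splits the result on spaces, tabs and newlines.
inner=($(IFS=$'\n'; sample))

# IFS is changed in the shell that does the splitting.
outer_ifs=$IFS
IFS=$'\n'
outer=($(sample))
IFS=$outer_ifs

echo "inner: ${#inner[@]} elements"   # prints "inner: 4 elements"
echo "outer: ${#outer[@]} elements"   # prints "outer: 2 elements"
```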

    Also, note that file names can also contain newlines in general. The
    only robust delimiter is the NUL character.

    Also, ditch bash. For simple scripts, do standard shell. For complex
    scripts and interactive use, zsh rulz:

    fndar=(${(f)"$(...)"})
    fndar=(${(ps:\0:)"$(...)"})
    fndar=(**/*(O))

    (I do not think zsh can sort version numbers easily, though.)

    Regards,

    --
    Nicolas George

  • From Greg Wooledge@21:1/5 to Nicolas George on Fri Dec 15 14:10:02 2023
    On Fri, Dec 15, 2023 at 01:42:14PM +0100, Nicolas George wrote:
    Also, note that file names can also contain newlines in general. The
    only robust delimiter is the NUL character.

    True. In order to be 100% safe, the OP's code would need to look
    more like this:

    readarray -d '' fndar < <(
    find "$sdir" ... -printf 'stuff\0' |
    sort -z --otherflags
    )

    The -d '' option for readarray requires bash 4.4 or higher. If this
    script needs to run on bash 4.3 or older, you'd need to use a loop
    instead of readarray.
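
One way such a loop might look, as a sketch only. It assumes GNU find and sort, and the names demo_dir and entry are hypothetical. It relies on read -d '' rather than readarray -d '':

```shell
#!/usr/bin/env bash
# Hypothetical test data, including a name with an embedded newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/one" "$demo_dir/two words" "$demo_dir/"$'three\nlines'

fndar=()
# IFS= keeps leading and trailing whitespace, -r keeps backslashes literal,
# and -d '' makes read stop at each NUL byte instead of at each newline.
while IFS= read -r -d '' entry; do
    fndar+=("$entry")
done < <(find "$demo_dir" -type f -printf '%P\0' | sort -z)

printf '%d elements\n' "${#fndar[@]}"
rm -rf "$demo_dir"
```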

    This may look a bit inscrutable, but the purpose is to ensure that
    a NUL delimiter is used at every step. First, find -printf '...\0'
    will print a NUL character after each filename-and-stuff. Second,
    sort -z uses NUL as its record separator (instead of newline), and
    produces sorted output that also uses NUL. Finally, readarray -d ''
    uses the NUL character as its record separator. The final result is
    an array containing each filename-and-stuff produced by find, in the
    order determined by sort, even if some of the filenames contain
    newline characters.
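
Filled in with concrete values, the pipeline above might look like the following sketch. It assumes bash 4.4 or later plus GNU find and sort. The scratch directory, the file names and the shortened -printf format are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical test data: two version-like names and one with a newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/v1.10" "$demo_dir/v1.9" "$demo_dir/"$'new\nline'

# NUL is the record separator at every stage: find emits it, sort -z reads
# and writes it, and readarray -d '' splits on it.
readarray -d '' fndar < <(
    find "$demo_dir" -type f -printf '%P|%s\0' |
    sort -z --version-sort --reverse
)

printf '%d elements\n' "${#fndar[@]}"
printf '<%s>\n' "${fndar[@]}"
rm -rf "$demo_dir"
```

The name with the embedded newline ends up as a single element, which the newline-delimited variants cannot guarantee.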

  • From Nicolas George@21:1/5 to All on Fri Dec 15 14:40:01 2023
    Greg Wooledge (12023-12-15):
    On Fri, Dec 15, 2023 at 01:42:14PM +0100, Nicolas George wrote:
    Also, note that file names can also contain newlines in general. The
    only robust delimiter is the NUL character.

    True. In order to be 100% safe, the OP's code would need to look
    more like this:

    readarray -d '' fndar < <(
    find "$sdir" ... -printf 'stuff\0' |
    sort -z --otherflags
    )

    The -d '' option for readarray requires bash 4.4 or higher. If this
    script needs to run on bash 4.3 or older, you'd need to use a loop
    instead of readarray.

    This may look a bit inscrutable, but the purpose is to ensure that
    a NUL delimiter is used at every step. First, find -printf '...\0'
    will print a NUL character after each filename-and-stuff. Second,
    sort -z uses NUL as its record separator (instead of newline), and
    produces sorted output that also uses NUL. Finally, readarray -d ''
    uses the NUL character as its record separator. The final result is
    an array containing each filename-and-stuff produced by find, in the
    order determined by sort, even if some of the filenames contain
    newline characters.

    It is possible to do it safely in bash plus command-line tools, indeed.
    But in such a complex case, it is better to use something with a
    higher-level interface. I am sure File::Find and Version::Compare can
    let Perl do the same thing in a much safer way.

    Regards,

    --
    Nicolas George

  • From Greg Wooledge@21:1/5 to Nicolas George on Fri Dec 15 15:00:01 2023
    On Fri, Dec 15, 2023 at 02:30:21PM +0100, Nicolas George wrote:
    Greg Wooledge (12023-12-15):
    readarray -d '' fndar < <(
    find "$sdir" ... -printf 'stuff\0' |
    sort -z --otherflags
    )

    It is possible to do it safely in bash plus command-line tools, indeed.
    But in such a complex case, it is better to use something with a
    higher-level interface. I am sure File::Find and Version::Compare can
    let Perl do the same thing in a much safer way.

    Equally safe, perhaps. Not safer. I don't know those particular perl
    modules -- are they included in a standard Debian system, or does
    one need to install optional packages? And then there's a learning
    curve for them as well.

    By the way, your MUA is adding 10000 years to its datestamps.

  • From Nicolas George@21:1/5 to All on Fri Dec 15 15:10:02 2023
    Greg Wooledge (12023-12-15):
    Equally safe, perhaps. Not safer. I don't know those particular perl
    modules -- are they included in a standard Debian system, or does
    one need to install optional packages? And then there's a learning
    curve for them as well.

    File::Find is a standard module, Version::Compare is packaged.

    I consider it safer because I factor mistakes in my estimate: if you get
    the Perl version working without using strange constructs in your code,
    the odds that it will break on special characters are vanishingly thin.
    With shell, unless we tested for it, there are chances we forgot a
    corner case.

    By the way, your MUA is adding 10000 years to its datestamps.

    It is called the Holocene calendar, the principle being that everything
    that might deserve to be expressed as a year happened in the last
    12K years.

    See:

    https://en.wikipedia.org/wiki/Holocene_calendar

    Or possibly:

    https://www.youtube.com/watch?v=czgOWmtGVGs

    Regards,

    --
    Nicolas George

  • From David Wright@21:1/5 to Greg Wooledge on Sat Dec 16 06:50:01 2023
    On Fri 15 Dec 2023 at 08:58:10 (-0500), Greg Wooledge wrote:
    On Fri, Dec 15, 2023 at 02:30:21PM +0100, Nicolas George wrote:
    Greg Wooledge (12023-12-15):
    readarray -d '' fndar < <(
    find "$sdir" ... -printf 'stuff\0' |
    sort -z --otherflags
    )

    It is possible to do it safely in bash plus command-line tools, indeed.
    But in such a complex case, it is better to use something with a
    higher-level interface. I am sure File::Find and Version::Compare can
    let Perl do the same thing in a much safer way.

    Equally safe, perhaps. Not safer. I don't know those particular perl
    modules -- are they included in a standard Debian system, or does
    one need to install optional packages? And then there's a learning
    curve for them as well.

    By the way, your MUA is adding 10000 years to its datestamps.

    Don't knock it: beats using the French Republican calendar.
    But I miss the hours:minutes used by most MUAs (the minutes
    being relatively unaffected by time zones). They can help
    with following threads stored in different locations.

    Cheers,
    David.
