• Re: awk not just using the Field separator as such. it is using the bla

    From David@21:1/5 to debianlist@potentially-spam.de-bruy on Wed Feb 15 10:00:01 2023
    On Wed, 15 Feb 2023 at 18:22, DdB
    <debianlist@potentially-spam.de-bruyn.de> wrote:
    Am 15.02.2023 um 07:25 schrieb Albretch Mueller:

    $ _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
    echo "// __ \$_L: |${_L}|"
    _AR=($(echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i}' ))
    _AR_L=${#_AR[@]}
    echo "// __ \$_AR_L: |${_AR_L}|"
    for(( _IX=0; _IX<${_AR_L}; _IX++ )); do
    echo "// __ [$_IX/$_AR_L): |${_AR[$_IX]}|"
    done

    what awk are you using? gnu awk works fine. see:

    The complaint has nothing to do with awk.

    This happens because the shell, when it creates the
    elements of the array _AR from the unquoted command
    substitution, splits awk's output on any whitespace
    (spaces, tabs and newlines).

    Whereas the OP expects the elements to be
    separated by newlines only.
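
A minimal sketch of the difference described above, assuming bash 4 or later (for mapfile). The variable names and the sample text are made up for illustration; they are not the OP's script.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two lines, each containing spaces.
output=$'Adams, Fred, and Ken Aizawa \nThe Bounds of Cognition'

# Unquoted $(...) inside array=( ... ) is split on every IFS character
# (space, tab and newline by default), so each word becomes an element.
words=( $(printf '%s\n' "$output") )

# mapfile -t splits on newlines only and strips the trailing newline.
mapfile -t lines < <(printf '%s\n' "$output")

echo "word-split elements: ${#words[@]}"
echo "line-split elements: ${#lines[@]}"
```

The unquoted form is also subject to pathname expansion, which is one more reason to prefer mapfile here.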

    Just looking at this made my eyes bleed so that,
    combined with the total lack of troubleshooting
    effort, means that my answer ends as follows:

    Start reading here:
    http://mywiki.wooledge.org/BashFAQ/005

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Albretch Mueller@21:1/5 to DdB on Wed Feb 15 13:10:01 2023
    On 2/15/23, DdB <debianlist@potentially-spam.de-bruyn.de> wrote:
    $ echo "Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\"" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
    Adams, Fred, and Ken Aizawa
    The Bounds of Cognition

    yes and this also works:

    _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
    echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
    Adams, Fred, and Ken Aizawa
    The Bounds of Cognition

    but I wasn't able to write the output into an array

    $ awk --version

    I also discovered that there seems to be something wrong with the
    version of awk I am working:

    $ awk --version
    awk: not an option: --version

    $ which awk
    /usr/bin/awk

    $ awk -W version
    mawk 1.3.4 20200120
    Copyright 2008-2019,2020, Thomas E. Dickey
    Copyright 1991-1996,2014, Michael D. Brennan

    random-funcs: srandom/random
    regex-funcs: internal
    compiled limits:
    sprintf buffer 8192
    maximum-integer 2147483647
    $

    On 2/15/23, David <bouncingcats@gmail.com> wrote:
    Start reading here:
    http://mywiki.wooledge.org/BashFAQ/005

    which helped me find a hack around it I am comfortable with:

    _DT=$(date +%Y%m%d%H%M%S)
    _TMPFL=$(mktemp "$(basename "$(pwd)")_${_DT}.XXXXXX")

    _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
    echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}' > "${_TMPFL}"

    mapfile -t _AR < "${_TMPFL}"
    _AR_L=${#_AR[@]}
    echo "// __ \$_AR_L: |${_AR_L}|"

    rm --force --verbose "${_TMPFL}"

    I think the problem is whatever bash is using as "awk" is also
    including a blank space as delimiter for the splitting of the string

    lbrtchx

  • From Greg Wooledge@21:1/5 to Albretch Mueller on Wed Feb 15 13:50:01 2023
    On Wed, Feb 15, 2023 at 12:09:28PM +0000, Albretch Mueller wrote:
    On 2/15/23, DdB <debianlist@potentially-spam.de-bruyn.de> wrote:
    $ echo "Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\"" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
    Adams, Fred, and Ken Aizawa
    The Bounds of Cognition

    yes and this also works:

    _L="Adams, Fred, and Ken Aizawa \"The Bounds of Cognition\""
    echo "${_L}" | awk -F'\"' '{for (i=1; i<=NF; i++) print $i;}'
    Adams, Fred, and Ken Aizawa
    The Bounds of Cognition

    but I wasn't able to write the output into an array

    If you want to read LINES of a STREAM as array elements, use mapfile:

    mapfile -t myarray < <(
    printf '%s\n' "$stuff" | awk -F'\"' '...'
    )

    If you want to read FIELDS of a SINGLE LINE as array elements, use
    read -ra:

    read -ra myarray <<< "$one_line"

    Note the caveats associated with each of these, especially the second
    one. Very few things in bash ever work as you expect once you start
    poking at the corner cases.

    https://mywiki.wooledge.org/BashPitfalls#pf47
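
A short sketch of the two cases above, side by side. The input strings and array names (stuff, one_line, by_line, by_field) are placeholders chosen for illustration, and the awk program is only one possible way to produce several lines.

```shell
#!/usr/bin/env bash
# Hypothetical input values, not taken from the thread.
stuff='first line "quoted part" tail'
one_line='alpha beta gamma'

# Lines of a stream: awk prints one field per line,
# and mapfile -t keeps each line whole, spaces included.
mapfile -t by_line < <(
  printf '%s\n' "$stuff" | awk -F'"' '{for (i = 1; i <= NF; i++) print $i}'
)

# Fields of a single line: read -ra splits on IFS (whitespace by default).
read -ra by_field <<< "$one_line"

declare -p by_line by_field
```

declare -p is used instead of echo so that element boundaries stay visible in the output.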

  • From Albretch Mueller@21:1/5 to Greg Wooledge on Mon Feb 20 08:20:01 2023
    On 2/15/23, Greg Wooledge <greg@wooledge.org> wrote:
    If you want to read FIELDS of a SINGLE LINE as array elements, use
    read -ra:

    read -ra myarray <<< "$one_line"

    It didn't work. I tried different options. I am getting: "bash: read:
    ... : not a valid identifier"

    _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
    echo "// __ \$_PTH: \"${_PTH}\""

    # read -ra -d "\\|" _PTH_AR <<< "${_PTH}"
    # read -ra -d "\|" _PTH_AR <<< "${_PTH}"
    # read -ra -d "|" _PTH_AR <<< "${_PTH}"

    # read -ra -d '\\|' _PTH_AR <<< "${_PTH}"
    # read -ra -d '\|' _PTH_AR <<< "${_PTH}"
    # read -ra -d '|' _PTH_AR <<< "${_PTH}"

    _PTH_AR_L=${#_PTH_AR[@]}
    echo "// __ \$_PTH_AR_L: |${_PTH_AR_L}|, \"${_PTH}\""

    I use pipes as the field delimiter because the pipe is an
    excellent meta character when you are working with filesystems. Pipes
    would not be accepted in file or directory names for good reasons,
    anyway.

  • From Greg Wooledge@21:1/5 to Albretch Mueller on Mon Feb 20 13:40:01 2023
    On Mon, Feb 20, 2023 at 07:10:11AM +0000, Albretch Mueller wrote:
    On 2/15/23, Greg Wooledge <greg@wooledge.org> wrote:
    If you want to read FIELDS of a SINGLE LINE as array elements, use
    read -ra:

    read -ra myarray <<< "$one_line"

    It didn't work. I tried different options. I am getting: "bash: read:
    ... : not a valid identifier"

    _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
    echo "// __ \$_PTH: \"${_PTH}\""

    # read -ra -d "\\|" _PTH_AR <<< "${_PTH}"
    # read -ra -d "\|" _PTH_AR <<< "${_PTH}"
    # read -ra -d "|" _PTH_AR <<< "${_PTH}"

    The -a option has to be followed by the array name. The -d option has
    to be followed by the delimiter.

    However, you do NOT want -d "|" here. The -d delimiter tells read
    where to stop reading entirely. For you, that's the newline character,
    which is the default for read, and which is added by the <<< operator.
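
To illustrate the point about -d, here is a small sketch using a made-up record. It shows that -d only changes where read stops, while IFS decides how the line that was read gets split.

```shell
#!/usr/bin/env bash
# Hypothetical sample record.
record='one|two|three'

# -d '|' makes read stop at the first pipe; nothing after it is consumed.
read -r -d '|' first <<< "$record"
echo "with -d '|': $first"

# With the default delimiter (newline), read consumes the whole line,
# and IFS controls how that line is split into fields.
IFS='|' read -r a b c <<< "$record"
echo "with IFS='|': $a / $b / $c"
```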

    If you wish to do field splitting when using read, that's what IFS is
    for. However, beware of the atrociously stupid pitfall regarding IFS
    with non-whitespace values.

    unicorn:~$ _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
    unicorn:~$ declare -p _PTH
    declare -- _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
    unicorn:~$ IFS="|" read -ra _PTH_AR <<< "${_PTH}|"
    unicorn:~$ declare -p _PTH_AR
    declare -a _PTH_AR=([0]="83847547" [1]="2" [2]="dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf")

    That, I believe, is what you were trying to accomplish. Note that I
    added a trailing | character on the <<< "${_PTH}|" command. That's
    because of this pitfall:

    https://mywiki.wooledge.org/BashPitfalls#pf47
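
A minimal sketch of that pitfall, assuming a made-up record whose last field is empty. The array names are chosen only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical record whose last field is empty.
record='foo|bar|'

# Without the extra delimiter, the trailing empty field is silently dropped.
IFS='|' read -ra without <<< "$record"

# Appending one more '|' preserves it, as in the command above.
IFS='|' read -ra with <<< "${record}|"

echo "without extra delimiter: ${#without[@]} fields"
echo "with extra delimiter:    ${#with[@]} fields"
```

When the input has no trailing empty field, the extra delimiter is harmless, which is why it can be added unconditionally.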

    Now we just need to teach you to stop using _ALL_CAPS variable names, especially ones with leading underscores.

  • From Albretch Mueller@21:1/5 to All on Mon Feb 20 20:30:01 2023
    https://mywiki.wooledge.org/BashPitfalls#pf47

    what I am trying to do is split a string using a pipe as the
    delimiter. I used to do that with awk, but it seems it doesn't work
    anymore after someone had the great idea of substituting awk with
    mawk; and Hey! They could have done it with python!:

    $ which awk
    /usr/bin/awk

    $ which mawk
    /usr/bin/mawk

    $ awk -W version
    mawk 1.3.4 20200120
    Copyright 2008-2019,2020, Thomas E. Dickey
    Copyright 1991-1996,2014, Michael D. Brennan

    random-funcs: srandom/random
    regex-funcs: internal
    compiled limits:
    sprintf buffer 8192
    maximum-integer 2147483647

    $ mawk -W version
    mawk 1.3.4 20200120
    Copyright 2008-2019,2020, Thomas E. Dickey
    Copyright 1991-1996,2014, Michael D. Brennan

    random-funcs: srandom/random
    regex-funcs: internal
    compiled limits:
    sprintf buffer 8192
    maximum-integer 2147483647
    $

    How do you split a string using a pipe as the delimiter these days
    without using a bloody hack?

  • From Greg Wooledge@21:1/5 to Albretch Mueller on Mon Feb 20 20:50:01 2023
    On Mon, Feb 20, 2023 at 07:24:01PM +0000, Albretch Mueller wrote:
    https://mywiki.wooledge.org/BashPitfalls#pf47

    what I am trying to do is split a string using a pipe as the delimiter

    The web page you cited tells you how, doesn't it? Assuming your string
    is a line (e.g. something you pulled out of a *simplified* CSV file,
    where there are no delimiters inside fields), and that you want to store
    the fields in an array, you can simply do:

    IFS="|" read -ra myarray <<< "$mystring|"

    Demonstration:

    unicorn:~$ mystring='foo|bar|last|field|is|empty|'
    unicorn:~$ IFS="|" read -ra myarray <<< "$mystring|"
    unicorn:~$ declare -p myarray
    declare -a myarray=([0]="foo" [1]="bar" [2]="last" [3]="field" [4]="is" [5]="empty" [6]="")

    I used to do that with awk,

    I don't understand how awk helps you populate the elements of a bash
    array. Awk can write a new string to stdout, but then you still have
    to parse that string in bash...? I don't see what benefit awk gives
    you here.

    How do you split a string using a pipe as the delimiter these days
    without using a bloody hack?

    You cited a bash web page. So, everything you're doing is a hack.
    That's the nature of bash.

  • From Greg Wooledge@21:1/5 to Albretch Mueller on Mon Feb 20 22:30:01 2023
    On Mon, Feb 20, 2023 at 09:12:08PM +0000, Albretch Mueller wrote:
    However this would rightly split that line based on the pipe delimiter:

    $ echo "${_PTH}" | awk -F '|' '{for (i=1; i<=NF; i++) print $i;}'
    83847547
    2
    dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf

    So you're just converting pipes to newlines? You can do that with
    tr.

    tr '|' '\n'

    There should be a sane way ;-) to feed those three lines into a bash array.

    mapfile -t myarray < <(...)

    But calling multiple processes just to split *one* line of input
    is rather inefficient.
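
For comparison, a sketch of both routes on the string from this thread. The variable names path, via_tr and via_read are illustrative; the builtin-only variant assumes the input is a single line.

```shell
#!/usr/bin/env bash
path='83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf'

# External process: tr turns each pipe into a newline, mapfile reads the lines.
mapfile -t via_tr < <(printf '%s\n' "$path" | tr '|' '\n')

# Builtins only: no extra process, same result for this input.
IFS='|' read -ra via_read <<< "${path}|"

declare -p via_tr via_read
```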

  • From Albretch Mueller@21:1/5 to All on Mon Feb 20 22:20:01 2023
    Thank you! I noticed my mistake and yes, once again it was a hack
    which I thought to be a typo. I had removed the pipe you had included
    in the last part of the input string!: "${_PTH}|"

    _PTH="83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
    IFS="|" read -ra _PTH_AR <<< "${_PTH}|"
    _PTH_AR_L=${#_PTH_AR[@]}
    echo "// __ \$_PTH_AR_L: |${_PTH_AR_L}|, \"${_PTH}\""
    for(( IX=0; IX<${_PTH_AR_L}; ++IX )); do
    echo "// __ [$IX/$_PTH_AR_L): |${_PTH_AR[$IX]}|"
    done

    // __ $_PTH_AR_L: |3|, "83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"
    // __ [0/3): |83847547|
    // __ [1/3): |2|
    // __ [2/3): |dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf|

    With awk I used to do things like this:

    _PTH_AR=($( echo "${_PTH}" | awk -F '|' '{for (i=1; i<=NF; i++) print $i;}' ))
    echo "// __ \$_PTH_AR_L: |${_PTH_AR_L}|, \"${_PTH}\""
    // __ $_PTH_AR_L: |1|, "83847547|2|dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf"

    However this would rightly split that line based on the pipe delimiter:

    $ echo "${_PTH}" | awk -F '|' '{for (i=1; i<=NF; i++) print $i;}'
    83847547
    2
    dli.ernet.449320/449320-Seduction Of The Innocent_text.pdf
    $

    There should be a sane way ;-) to feed those three lines into a bash array.

  • From Tim Woodall@21:1/5 to Albretch Mueller on Tue Feb 21 06:20:01 2023
    On Mon, 20 Feb 2023, Albretch Mueller wrote:

    On 2/15/23, Greg Wooledge <greg@wooledge.org> wrote:

    I use pipes as the field delimiter because the pipe is an
    excellent meta character when you are working with filesystems. Pipes
    would not be accepted in file or directory names for good reasons,
    anyway.


    tim@einstein(7):~ (none)$ touch 'i|use|pipes'
    tim@einstein(7):~ (none)$ ls -l i*use*
    -rw-rw-r-- 1 tim tim 0 Feb 21 05:14 'i|use|pipes'
    tim@einstein(7):~ (none)$ rm i\|use\|pipes
    tim@einstein(7):~ (none)$

    AFAIR only / and nul are prohibited in file names.

  • From Greg Wooledge@21:1/5 to Tim Woodall on Tue Feb 21 13:20:01 2023
    On Tue, Feb 21, 2023 at 05:19:13AM +0000, Tim Woodall wrote:
    On Mon, 20 Feb 2023, Albretch Mueller wrote:

    On 2/15/23, Greg Wooledge <greg@wooledge.org> wrote:

    I use pipes as the field delimiter because the pipe is an
    excellent meta character when you are working with filesystems. Pipes
    would not be accepted in file or directory names for good reasons,
    anyway.


    tim@einstein(7):~ (none)$ touch 'i|use|pipes'
    tim@einstein(7):~ (none)$ ls -l i*use*
    -rw-rw-r-- 1 tim tim 0 Feb 21 05:14 'i|use|pipes'
    tim@einstein(7):~ (none)$ rm i\|use\|pipes
    tim@einstein(7):~ (none)$

    AFAIR only / and nul are prohibited in file names.

    In Unix-like file systems, including Debian's default ext4, this is true.
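
Since a pipe can legally appear in a file name on such file systems, the only delimiter that is always safe for paths is NUL. A sketch under the assumption of bash 4.4 or later (for mapfile -d '') and a find that supports -print0; the directory and file name are throwaway examples.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory (the file name is a made-up example).
dir=$(mktemp -d)
touch "$dir/i|use|pipes"

# NUL is the one byte that can never appear in a path, so NUL-delimited
# output is safe to split. mapfile -d '' needs bash 4.4 or later.
mapfile -d '' -t found < <(find "$dir" -type f -print0)

echo "files found: ${#found[@]}"
printf 'name: %s\n' "${found[0]##*/}"
rm -rf -- "$dir"
```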

    I have a funny feeling Albretch might be using Microsoft file systems
    (FAT, NTFS) for a large chunk of his system. Those have a much larger
    set of restricted characters.

  • From Albretch Mueller@21:1/5 to Greg Wooledge on Tue Feb 21 20:20:02 2023
    On 2/21/23, Greg Wooledge <greg@wooledge.org> wrote:
    I have a funny feeling Albretch might be using Microsoft file systems
    (FAT, NTFS) for a large chunk of his system. Those have a much larger
    set of restricted characters.

    Certainly not FAT32 and definitely not FAT, but at work (I work as a
    Math teacher and most schools use Microsoft) I have had to use WSL and
    NTFS. I always thought that FSs used length-defined raster data
    structures in order to avoid messing with points and such things.
    lbrtchx

  • From Steve McIntyre@21:1/5 to lbrtchx@gmail.com on Tue Feb 21 22:40:01 2023
    lbrtchx@gmail.com wrote:
    On 2/21/23, Greg Wooledge <greg@wooledge.org> wrote:
    I have a funny feeling Albretch might be using Microsoft file systems
    (FAT, NTFS) for a large chunk of his system. Those have a much larger
    set of restricted characters.

    Certainly not FAT32 and definitely not FAT, but at work (I work as a
    Math teacher and most schools use Microsoft) I have had to use WSL and
    NTFS. I always thought that FSs used length-defined raster data
    structures in order to avoid messing with points and such things.

    Different filesystems can vary massively here; you can't really
    assume anything. All of the following can vary in filesystems
    supported by Linux:

    * allowed characters in filenames
    * allowed filename lengths
    * allowed full-path lengths
    * character encodings for filenames
    * case-sensitivity
    * max number of files per directory
    * max number of files per filesystem
    * timestamps (minimum, maximum and resolution)
    * support for symlinks and hardlinks
    * support for extended attributes, permissions and ACLs
    * ...

    The VFS layer does a very good job of hiding the complexity and giving
    you a reasonably consistent view, but it's not difficult to find edges
    if you look. :-)

    --
    Steve McIntyre, Cambridge, UK.                                steve@einval.com
    <sladen> I actually stayed in a hotel and arrived to find a post-it
    note stuck to the mini-bar saying "Paul: This fridge and
    fittings are the correct way around and do not need altering"
