• GAWK: When can we set ARGC=1 ?

    From Kenny McCormack@21:1/5 to All on Fri Feb 5 01:57:32 2021
    I sometimes use the technique of passing in data via the command line in
    the place normally reserved for filename args. Yes, I realize this is non-standard, and that there are other ways to do it. I'm not interested
    in any arguments or suggestions about those alternatives. The technique is something like:

    $ someCommand | gawk 'BEGIN { ARGC = 1 }
    /something/ { for (i in ARGV) print i,ARGV[i] }' 'string 1' 'string 2' ...

    The trick here is that you explicitly set ARGC to 1, so that your strings
    don't get interpreted as filenames. Written as above, it all works fine.
    As long as you "kill" ARGV via setting ARGC in the BEGIN clause, it works
    as expected.
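A minimal sketch of the same technique, wrapped in a hypothetical shell function `show_data_args`. It records how many operands were supplied before setting ARGC to 1, then walks ARGV in index order. This matters because `for (i in ARGV)` visits elements in an unspecified order and also includes ARGV[0]. Setting ARGC does not delete the ARGV entries, which is why they can still be read in the main rule.

```shell
#!/bin/sh
# Hypothetical wrapper around the BEGIN { ARGC = 1 } technique.
show_data_args() {
    awk '
    BEGIN {
        n = ARGC - 1   # how many "data" strings were given as operands
        ARGC = 1       # hide them: with no file operands, awk reads stdin
    }
    /something/ {
        # ARGV[1..n] still hold the strings; print them in index order
        for (i = 1; i <= n; i++) print i, ARGV[i]
    }' "$@"
}

printf 'nothing here\nsomething here\n' | show_data_args 'string 1' 'string 2'
```

Only the second input line matches, so this prints `1 string 1` and `2 string 2` on separate lines.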

    Now, just for fun, I was playing around with some alternatives, and found
    that neither of the following variations works (by "not work", I mean that
    gawk tries to interpret "string 1" as a filename, which of course fails and
    aborts the program with a fatal error).

    1) $ someCommand | gawk -v ARGC=1 '
    /something/ { for (i in ARGV) print i,ARGV[i] }' 'string 1' 'string 2' ...

    2) $ someCommand | gawk '
    /something/ { for (i in ARGV) print i,ARGV[i] }' ARGC=1 'string 1' 'string 2' ...

    I'm curious as to why neither of these works. To my mind, it seems they should.

    (Particularly the first one; I can sort of see why the second one might
    not work.)

    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Pedantic

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Kenny McCormack on Fri Feb 5 13:16:54 2021
    On 05.02.2021 02:57, Kenny McCormack wrote:
    > I'm curious as to why neither of these work. To my mind, it seems they should.
    >
    > (Particularly, the first one; I can sort of get why the second one might
    > not work)

    I wouldn't expect any particular behavior here. While the three variants
    seem to do the same thing, they take effect at different points in time.
    That is also why I get different error messages for the two failing
    cases. So it depends on when the file open is attempted and when it is
    determined whether any file operands are present.
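The timing point can be seen with ordinary variables. In POSIX awk, a `-v` assignment is made before the BEGIN action runs. An assignment given as an operand is made only when awk reaches it while walking the argument list, which happens after BEGIN and before the following file operand is read. The sketch below uses a hypothetical function `show_assignment_timing` and illustrative variables `early` and `late` to show the two points. Whether a change to ARGC made at either point is honored then depends on when a given implementation reads ARGC.

```shell
#!/bin/sh
# Hypothetical demo of when each kind of command-line assignment happens.
show_assignment_timing() {
    # "-v early=1" is applied before BEGIN; the operand "late=2" is applied
    # only when awk reaches it in ARGV, i.e. after BEGIN and before it
    # opens the next operand, "-" (standard input).
    echo 'input line' | awk -v early=1 '
    BEGIN { print "BEGIN: early=" early ", late=" late }
          { print "main:  early=" early ", late=" late }' late=2 -
}

show_assignment_timing
```

BEGIN already sees `early=1` but an empty `late`. The main rule, which runs after the operand assignment, sees both values.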

    Has the GNU Awk manual nothing to say about the processing order?
    Does POSIX specify anything about it?

    Janis

  • From Geoff Clare@21:1/5 to Janis Papanagnou on Fri Feb 5 13:35:30 2021
    Janis Papanagnou wrote:

    > Has the GNU Awk manual nothing to say about the processing order?
    > Does POSIX specify anything about it?

    There are some clarifications about ARGC and ARGV planned for the
    next revision of POSIX. See:

    https://austingroupbugs.net/view.php?id=974#c3231

    One of the things the new description says is "It is unspecified
    whether alterations to ARGC can be made using the -v option."

    However, for the second command, unless I missed something, I think it
    is (or will be) required to work.

    --
    Geoff Clare <netnews@gclare.org.uk>

  • From Aharon Robbins@21:1/5 to Aharon Robbins on Mon Feb 8 10:27:54 2021
    In article <rvqjff$1vpp$1@gioia.aioe.org>,
    Aharon Robbins <arnold@skeeve.com> wrote:
    > Brian Kernighan's awk "correctly" handles the case where ARGC=1 appears
    > in place of a filename. Mawk and gawk don't. I haven't yet tried
    > any other awks.

    Interestingly enough, this used to work. It broke at gawk 4.2.0 with
    the addition of MPFR. Below is a patch that will eventually make
    its way into the Git repo.

    Arnold
    -----------------------------
    diff --git a/io.c b/io.c
    index c1007423..08ea3c16 100644
    --- a/io.c
    +++ b/io.c
    @@ -520,6 +520,9 @@ nextfile(IOBUF **curfile, bool skipping)

    return ++i; /* run beginfile block */
    }
    +
    + // could have had ARGC=xx on command line. sigh.
    + argc = get_number_si(ARGC_node->var_value);
    }

    if (files == false) {
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com

  • From Aharon Robbins@21:1/5 to netnews@gclare.org.uk on Mon Feb 8 05:50:39 2021
    In article <26uveh-7cm.ln1@ID-313840.user.individual.net>,
    Geoff Clare <netnews@gclare.org.uk> wrote:
    > There are some clarifications about ARGC and ARGV planned for the
    > next revision of POSIX. See:
    >
    > https://austingroupbugs.net/view.php?id=974#c3231
    >
    > One of the things the new description says is "It is unspecified
    > whether alterations to ARGC can be made using the -v option."
    >
    > However, for the second command, unless I missed something I think it
    > is (will be) required to work.

    Thanks for this link and info. I will be reviewing what it says
    and if necessary I will fix gawk to take this into account.

    Brian Kernighan's awk "correctly" handles the case where ARGC=1 appears
    in place of a filename. Mawk and gawk don't. I haven't yet tried
    any other awks.
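The same survey can be repeated on other systems with a small probe. This is a hypothetical helper, not something from the thread. It runs variation 2 from the original question, with `/nonexistent/file` standing in for `'string 1'`, against whichever awk names are found on the PATH. The candidate names, such as `nawk` and `original-awk`, are assumptions about how a given system packages them. An awk that honors the operand assignment stops before the file operand and reads standard input. One that does not honor it tries to open the file and fails. Results depend on the version installed, so the sketch reports what it observes rather than assuming either answer.

```shell
#!/bin/sh
# Hypothetical probe: does a given awk honor ARGC=1 given as an operand?
probe_operand_argc() {
    awk_cmd=$1
    # An awk that honors the assignment never opens /nonexistent/file and
    # reads the one line of stdin instead (exit status 0). An awk that
    # ignores it fails to open the file, or reads no input (non-zero).
    if printf 'x\n' | "$awk_cmd" '{ n++ } END { exit (n == 1 ? 0 : 1) }' \
        ARGC=1 /nonexistent/file >/dev/null 2>&1
    then
        echo "$awk_cmd: honors ARGC=1 as an operand"
    else
        echo "$awk_cmd: tried to open the file operand"
    fi
}

# Try whichever of these (assumed) interpreter names are installed.
for a in awk gawk mawk nawk original-awk; do
    if command -v "$a" >/dev/null 2>&1; then
        probe_operand_argc "$a"
    fi
done
```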

    Arnold
