• Reverse Polish Notation Parser

    From Mike Sanders@21:1/5 to All on Mon Nov 6 02:26:38 2023
    Ping Janis: Question... Are you interested in rewriting this
    as a Gawk only implementation? Would be great for switch/case
    statements IMO. If so, I'll add your version to the file at
    my website.

    Folks, assuming Janis says yes, lets work off that version
    & use it as a baseline.

    At any rate, a work in progress...

    # tags: rpn, calc, numbers, awk, code
    #
    # awk reverse polish notation parser
    # Michael Sanders 2023
    # https://busybox.neocities.org/notes/rpn.txt
    #
    # operands...
    #
    # large numbers? this really depends on your awk
    # decimal fractions? yes, via a hopfully not too
    # clever regex that expects decimal points to be
    # surrounded by digits eg...
    #
    # valid 0.5, invalid .5
    # valid 5.0, invalid 5.
    #
    # operators...
    #
    # + addition
    # - subtraction
    # * multiplication
    # / division
    # % modulus
    # ^ exponentiation (see footnotes)
    #
    # *always* surround input with 'quotes' as some RPN
    # operators can be misconstrued as meta-characters
    # by your shell, example...
    #
    # echo '0.089 2.5 + 2 * 3 ^' | awk -f rpn.txt
    #
    # arbitrary precision using printf format specifier...
    #
    # %0.0f 1
    # %0.2f 1.00 (default)
    # %0.9f 1.000000000
    #
    # further reading...
    #
    # https://en.wikipedia.org/wiki/Reverse_Polish_notation

    { RPN($0) }

    function RPN(input, x, y, z, stack) {
    split(input, stack, /[[:space:]]+/)
    z = 0
    for (x = 1; x in stack; ++x) {
    y = stack[x]
    if (y ~ /^[0-9]+(\.[0-9]+)?$/) stack[++z] = y
    else {
    if (z < 2) { print "error: insufficient operands"; return }
    if (y == "+") stack[z-1] += stack[z]
    else if (y == "-") stack[z-1] -= stack[z]
    else if (y == "*") stack[z-1] *= stack[z]
    else if (y == "^") stack[z-1] ^= stack[z] # see footnotes
    else if (y == "/") {
    if (stack[z] == 0) { print "error: division by zero"; return }
    stack[z-1] /= stack[z]
    } else if (y == "%") {
    if (stack[z] == 0) { print "error: modulo by zero"; return }
    stack[z-1] %= stack[z]
    } else { print "error: invalid operator " y; return }
    --z
    }
    }
    if (z != 1) { print "error: invalid RPN expression"; return }
    printf "%0.2f\n", stack[z]
    }

    # footnotes: exponentiation operators for differing awks...
    #
    # awk 'BEGIN {print 2 ^ 3}' should print 8 if using ^
    # stack[z-1] ^= stack[z] works on some awks using ^
    # stack[z-1] = stack[z-1] ^ stack[z] always works if using ^
    #
    # awk 'BEGIN {print 2 ** 3}' should print 8 if using **
    # stack[z-1] **= stack[z] works on some awks using **
    # stack[z-1] = stack[z-1] ** stack[z] always works if using **

    # eof

    --
    :wq
    Mike Sanders

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mike Sanders@21:1/5 to Mike Sanders on Mon Nov 6 05:27:20 2023
    Mike Sanders <porkchop@invalid.foo> wrote:

    [...]

    quick update I've meaning to make...

    https://busybox.neocities.org/notes/rpn.txt

    --
    :wq
    Mike Sanders

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Mike Sanders on Mon Nov 6 12:14:35 2023
    On 06.11.2023 03:26, Mike Sanders wrote:
    Ping Janis: Question... Are you interested in rewriting this
    as a Gawk only implementation? Would be great for switch/case
    statements IMO. If so, I'll add your version to the file at
    my website.

    This is no rocket science. If you have

    if (some_variable = "Val1") Cmd1 ;
    else if (some_variable = "Val2") Cmd2 ;
    ...
    else Def ;

    the switch statement (for simple string comparisons) looks like

    switch (some_variable) {
    case "Val1": Cmd1 ; break ;
    case "Val2": Cmd2 ; break ;
    ...
    default: Def ;
    }

    The point is that you have to provide and evaluate 'some_variable'
    just once. The "disadvantage" is that you will need the 'break'
    commands to not fall through to the next case (which is another
    common feature of such statements but not necessary in your
    case).

    You can apply 'switch' in function RPN() and errormessage(), but
    whether it makes sense or not to introduce 'switch' and sacrifice
    portability is on you to decide.

    GNU Awk has (as opposed to some other programming languages
    like C) the advantage that you can compare not only scalars;
    you can also compare strings and patterns in GNU Awk's switch:

    switch (some_variable) [
    case 42: ...
    case "string": ...
    case /pattern/: ...
    }

    I don't see any GNU Awk specific features/constructs that would
    suggest a [non-portable] transcription.


    One point you may want to consider is the trim() function; the
    two substitutions can be combined in one

    sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)

    (but test that in your Awk versions before using it; "&" is an
    old feature but off the top of my head I'm not sure whether the
    subexpression with parenthesis /...(...).../ is generally
    supported in other Awks).

    Janis


    Folks, assuming Janis says yes, lets work off that version
    & use it as a baseline.

    At any rate, a work in progress...
    [...]

    { RPN($0) }

    function RPN(input, x, y, z, stack) {
    split(input, stack, /[[:space:]]+/)
    z = 0
    for (x = 1; x in stack; ++x) {
    y = stack[x]
    if (y ~ /^[0-9]+(\.[0-9]+)?$/) stack[++z] = y
    else {
    if (z < 2) { print "error: insufficient operands"; return }
    if (y == "+") stack[z-1] += stack[z]
    else if (y == "-") stack[z-1] -= stack[z]
    else if (y == "*") stack[z-1] *= stack[z]
    else if (y == "^") stack[z-1] ^= stack[z] # see footnotes
    else if (y == "/") {
    if (stack[z] == 0) { print "error: division by zero"; return }
    stack[z-1] /= stack[z]
    } else if (y == "%") {
    if (stack[z] == 0) { print "error: modulo by zero"; return }
    stack[z-1] %= stack[z]
    } else { print "error: invalid operator " y; return }
    --z
    }
    }
    if (z != 1) { print "error: invalid RPN expression"; return }
    printf "%0.2f\n", stack[z]
    }
    [...]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mike Sanders@21:1/5 to Janis Papanagnou on Mon Nov 6 12:12:14 2023
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    the switch statement (for simple string comparisons) looks like

    switch (some_variable) {
    case "Val1": Cmd1 ; break ;
    case "Val2": Cmd2 ; break ;
    ...
    default: Def ;
    }

    Can multiple cases be 'or'ed (combined) as in $SHELL?

    case q|Q) quit(); break;

    I don't see any GNU Awk specific features/constructs that would
    suggest a [non-portable] transcription.

    Ok, so no.

    (but test that in your Awk versions before using it; "&" is an
    old feature but off the top of my head I'm not sure whether the
    subexpression with parenthesis /...(...).../ is generally
    supported in other Awks).

    '&' I did not know this, must study it further.

    Thanks. Off to work for me.

    --
    :wq
    Mike Sanders

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ed Morton@21:1/5 to Janis Papanagnou on Tue Nov 7 05:48:34 2023
    On 11/6/2023 5:14 AM, Janis Papanagnou wrote:
    On 06.11.2023 03:26, Mike Sanders wrote:
    Ping Janis: Question... Are you interested in rewriting this
    as a Gawk only implementation? Would be great for switch/case
    statements IMO. If so, I'll add your version to the file at
    my website.

    This is no rocket science. If you have

    if (some_variable = "Val1") Cmd1 ;
    else if (some_variable = "Val2") Cmd2 ;
    ...
    else Def ;

    the switch statement (for simple string comparisons) looks like

    switch (some_variable) {
    case "Val1": Cmd1 ; break ;
    case "Val2": Cmd2 ; break ;
    ...
    default: Def ;
    }

    The point is that you have to provide and evaluate 'some_variable'
    just once. The "disadvantage" is that you will need the 'break'
    commands to not fall through to the next case (which is another
    common feature of such statements but not necessary in your
    case).

    You can apply 'switch' in function RPN() and errormessage(), but
    whether it makes sense or not to introduce 'switch' and sacrifice portability is on you to decide.

    GNU Awk has (as opposed to some other programming languages
    like C) the advantage that you can compare not only scalars;
    you can also compare strings and patterns in GNU Awk's switch:

    switch (some_variable) [
    case 42: ...
    case "string": ...
    case /pattern/: ...
    Nitpick - it's

    case /regexp/:

    rather than:

    case /pattern/

    The word "pattern" is ambiguous and misused all over awk documentation.
    You can make some argument for it in:

    pattern { action }

    since that includes `BEGIN`, integers, etc. I'd argue that should be:

    condition { action }

    but in the "case" statement above what goes inside `/.../` is simply and
    always a regexp.


    }

    I don't see any GNU Awk specific features/constructs that would
    suggest a [non-portable] transcription.


    One point you may want to consider is the trim() function; the
    two substitutions can be combined in one

    sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)

    (but test that in your Awk versions before using it; "&" is an
    old feature but off the top of my head I'm not sure whether the subexpression with parenthesis /...(...).../ is generally
    supported in other Awks).
    You can write that, but it's not a capture group that can be
    backreferenced from the replacement and if it was "&" wouldn't refer to
    the string that matched ".*" anyway, it'd refer to the string that
    matched the whole regexp.

    You could use a capture group in GNU awk for gensub():

    str = gensub (/^[[:space:]]+(.*)[[:space:]]+$/, "\\1", 1, str)

    and in most awks you could do:

    gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

    but there are some out there that will fail to do both substitutions for
    that case (tawk and nawk maybe?) and so you need:

    sub(/^[[:space:]]+/, "", str)
    sub(/[[:space:]]+$/, "", str)

    for those but since this requires an awk that supports POSIX character
    classes anyway you could just declare the script as portable to all
    POSIX awks (rather than all "modern" awks) and not worry about ones that
    fail given the above gsub().

    Alternatively in any POSIX awk you could do:

    match(str,/[^[:space:]](.*[^[:space:]])?/)
    str = substr(str,RSTART,RLENGTH)

    and I expect that'd also work in any awk that supports POSIX character
    classes but does not support the above gsub().

    Regards,

    Ed.


    Janis


    Folks, assuming Janis says yes, lets work off that version
    & use it as a baseline.

    At any rate, a work in progress...
    [...]

    { RPN($0) }

    function RPN(input, x, y, z, stack) {
    split(input, stack, /[[:space:]]+/)
    z = 0
    for (x = 1; x in stack; ++x) {
    y = stack[x]
    if (y ~ /^[0-9]+(\.[0-9]+)?$/) stack[++z] = y
    else {
    if (z < 2) { print "error: insufficient operands"; return }
    if (y == "+") stack[z-1] += stack[z]
    else if (y == "-") stack[z-1] -= stack[z]
    else if (y == "*") stack[z-1] *= stack[z]
    else if (y == "^") stack[z-1] ^= stack[z] # see footnotes
    else if (y == "/") {
    if (stack[z] == 0) { print "error: division by zero"; return }
    stack[z-1] /= stack[z]
    } else if (y == "%") {
    if (stack[z] == 0) { print "error: modulo by zero"; return }
    stack[z-1] %= stack[z]
    } else { print "error: invalid operator " y; return }
    --z
    }
    }
    if (z != 1) { print "error: invalid RPN expression"; return }
    printf "%0.2f\n", stack[z]
    }
    [...]



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Ed Morton on Tue Nov 7 14:02:27 2023
    On 07.11.2023 12:48, Ed Morton wrote:
    On 11/6/2023 5:14 AM, Janis Papanagnou wrote:

    switch (some_variable) [
    case 42: ...
    case "string": ...
    case /pattern/: ...

    Nitpick - it's

    And I expected the nitpick on my typo using '=' (instead of '==') in
    the 'if' comparison. ;-)


    case /regexp/:

    rather than:

    case /pattern/

    The word "pattern" is ambiguous and misused all over awk documentation.

    You are right that the historic (and, sadly, surviving) documentation
    speaks about 'pattern' and '/regexp'/'.

    You can make some argument for it in:

    pattern { action }

    since that includes `BEGIN`, integers, etc. I'd argue that should be:

    condition { action }

    Decades ago I was the first one suggesting the term 'condition' here so
    you certainly don't need to teach me. (Myself I was using that term in
    my Awk courses even since the 1990's.)


    but in the "case" statement above what goes inside `/.../` is simply and always a regexp.

    Yes. Sorry for my sloppiness here. Thanks for the nitpick.


    [...]


    One point you may want to consider is the trim() function; the
    two substitutions can be combined in one

    sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)

    (but test that in your Awk versions before using it; "&" is an
    old feature but off the top of my head I'm not sure whether the
    subexpression with parenthesis /...(...).../ is generally
    supported in other Awks).

    You can write that, but it's not a capture group that can be
    backreferenced from the replacement and if it was "&" wouldn't refer to
    the string that matched ".*" anyway, it'd refer to the string that
    matched the whole regexp.

    Using my Awk it does what advertised. I merely didn't find it clearly documented whether the '&' is generally guaranteed to refer to the
    grouping.


    You could use a capture group in GNU awk for gensub():

    I deliberately abstained from gensub() here since the OP avoids GNU
    Awk specifics.


    str = gensub (/^[[:space:]]+(.*)[[:space:]]+$/, "\\1", 1, str)

    and in most awks you could do:

    gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

    Yes, this is a sensible alternative.


    but there are some out there that will fail to do both substitutions for
    that case (tawk and nawk maybe?) and so you need:

    Really? But why? - Alternatives in regexp is certainly an old feature
    (at least since nawk in the 1980's).


    sub(/^[[:space:]]+/, "", str)
    sub(/[[:space:]]+$/, "", str)
    [...]

    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Mike Sanders on Tue Nov 7 13:36:35 2023
    On 06.11.2023 13:12, Mike Sanders wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    the switch statement (for simple string comparisons) looks like

    switch (some_variable) {
    case "Val1": Cmd1 ; break ;
    case "Val2": Cmd2 ; break ;
    ...
    default: Def ;
    }

    Can multiple cases be 'or'ed (combined) as in $SHELL?

    What's always possible is to use the same bulky way as other C based
    languages:

    case "q": case "Q":

    in case of comparing patterns I'd try

    case /q|Q/:

    (adding anchors ^ and $ as necessary). For details see [*].


    case q|Q) quit(); break;
    [...]

    Janis

    [*] https://www.gnu.org/software/gawk/manual/gawk.html#Switch-Statement

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Tue Nov 7 14:37:40 2023
    On 07.11.2023 14:02, Janis Papanagnou wrote:

    One point you may want to consider is the trim() function; the
    two substitutions can be combined in one

    sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)

    (but test that in your Awk versions before using it; "&" is an
    old feature but off the top of my head I'm not sure whether the
    subexpression with parenthesis /...(...).../ is generally
    supported in other Awks).

    You can write that, but it's not a capture group that can be
    backreferenced from the replacement and if it was "&" wouldn't refer to
    the string that matched ".*" anyway, it'd refer to the string that
    matched the whole regexp.

    Using my Awk it does what advertised. I merely didn't find it clearly documented whether the '&' is generally guaranteed to refer to the
    grouping.

    I stand corrected, I had a wrong test case! - "&" does _not_ work
    on grouping, and as you said (and as it is specified), it generally
    defines the whole match. (It wouldn't make sense otherwise.) - Thanks!

    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to Mike Sanders on Tue Nov 7 17:24:38 2023
    On 2023-11-06, Mike Sanders <porkchop@invalid.foo> wrote:
    Ping Janis: Question... Are you interested in rewriting this
    as a Gawk only implementation? Would be great for switch/case
    statements IMO. If so, I'll add your version to the file at
    my website.

    The cppawk preprocessor supports a case macro which
    compiles to the switch statement for Gawk, or to portable
    Awk code.

    The macro is documented in its own man page:

    https://www.kylheku.com/cgit/cppawk/tree/cppawk-case.1

    case is safer than switch because it doesn't have implicit
    "fallthrough".

    Each case must end with one of: cbreak, cfall or cret: break,
    fallthrough or return.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mike Sanders@21:1/5 to Kaz Kylheku on Tue Nov 7 18:50:51 2023
    Kaz Kylheku <864-117-4973@kylheku.com> wrote:

    The cppawk preprocessor supports a case macro which
    compiles to the switch statement for Gawk, or to portable
    Awk code.

    The macro is documented in its own man page:

    https://www.kylheku.com/cgit/cppawk/tree/cppawk-case.1

    case is safer than switch because it doesn't have implicit
    "fallthrough".

    Each case must end with one of: cbreak, cfall or cret: break,
    fallthrough or return.

    Hey-hey Kaz.

    That's really nifty in fact. I might try my hand at a cppawk
    project just to familiarizes myself with its workings.

    Thanks & mucho appreciate the head's up kind sir, must study
    more about it all.

    --
    :wq
    Mike Sanders

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mike Sanders@21:1/5 to Janis Papanagnou on Tue Nov 7 18:46:27 2023
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    [*] https://www.gnu.org/software/gawk/manual/gawk.html#Switch-Statement

    Will read/study (as always thanks Janis!)

    --
    :wq
    Mike Sanders

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ed Morton@21:1/5 to Janis Papanagnou on Wed Nov 8 04:22:28 2023
    On 11/7/2023 7:02 AM, Janis Papanagnou wrote:
    On 07.11.2023 12:48, Ed Morton wrote:
    <snip>
    and in most awks you could do:

    gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

    Yes, this is a sensible alternative.


    but there are some out there that will fail to do both substitutions for
    that case (tawk and nawk maybe?) and so you need:

    Really? But why? - Alternatives in regexp is certainly an old feature
    (at least since nawk in the 1980's).

    In my opinion it's just a bug. It was demonstrated to me when I posted
    an answer on Stack Overflow several years ago that I can't find right
    now. I know it's not gawk and I'm about 99% sure it's neither BSD awk
    nor /usr/xpg[46]/bin/awk so the only non-oawk awks I can imagine would
    have this problem are nawk, tawk (I'm about 80% sure I remember tawk is
    one that DOES have the problem), and/or busybox awk, none of which I
    have access to, so if anyone has and could test them by running:

    $ echo ' foo ' | awk '{gsub(/^ +| +$/,""); print "<" $0 ">"}'
    <foo>

    and let us know which don't produce that output, that'd be great.

    Ed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ed Morton@21:1/5 to Ed Morton on Wed Nov 8 05:41:01 2023
    On 11/8/2023 4:22 AM, Ed Morton wrote:
    On 11/7/2023 7:02 AM, Janis Papanagnou wrote:
    On 07.11.2023 12:48, Ed Morton wrote:
    <snip>
    and in most awks you could do:

        gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

    Yes, this is a sensible alternative.


    but there are some out there that will fail to do both substitutions for >>> that case (tawk and nawk maybe?) and so you need:

    Really? But why? - Alternatives in regexp is certainly an old feature
    (at least since nawk in the 1980's).

    In my opinion it's just a bug. It was demonstrated to me when I posted
    an answer on Stack Overflow several years ago that I can't find right
    now. I know it's not gawk and I'm about 99% sure it's neither BSD awk
    nor /usr/xpg[46]/bin/awk so the only non-oawk awks I can imagine would
    have this problem are nawk, tawk (I'm about 80% sure I remember tawk is
    one that DOES have the problem), and/or busybox awk, none of which I
    have access to, so if anyone has and could test them by running:

    $ echo ' foo ' | awk '{gsub(/^ +| +$/,""); print "<" $0 ">"}'    <foo>

    and let us know which don't produce that output, that'd be great.

        Ed.


    I found a different answer I had posted (also years ago) that mentions
    this bug and there I say it's tawk and mawk1 where it occurs. Still
    can't find the original where I was told about it (and it was in a
    discussion in comments that's probably been removed by now) but that
    should be good enough for others to reproduce it.

    The specific case was removing quotes from around a field in a CSV:

    $ printf '"foo"' | awk '{gsub(/^"+|"+$/,""); print "<" $0 ">"}'
    <foo>

    but I doubt if that detail matters.

    Regards,

    Ed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)