• Do people create parsers for command line arguments?

    From Roger L Costello@21:1/5 to All on Thu Jul 28 11:14:51 2022
    Hi Folks,

    I've seen some tools with pretty complicated arguments. The argument list is a language unto itself.

    Do people create parsers for command line arguments? Or is a parser overkill?

    /Roger
    [On Unix-ish systems, the shell breaks the command into space-separated arguments, while
    the rest is up to each program. Many languages have argument handling libraries, typically
    recognizing arguments of various types such as switches, numbers, and filenames. Some shells
    like zsh have complicated command completion schemes which know, as you type, what each bit
    of a command is supposed to be so it can prompt you. See the zsh manpages for a very long
    discussion of how it works.
    Back in the olden days, Tenex had command completion built into the operating system, which
    seemed pretty cool at the time. The manual should be on bitsavers. -John]
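To make the division of labor between the shell and the program concrete, here is a small Python sketch. It is an illustration, not something from the thread. `shlex.split` approximates POSIX shell word splitting: it honors quotes and backslashes but does no globbing or variable expansion. `argparse` then recognizes a switch, a number, and a list of file names. The program name and options (`prog`, `-v`, `-n`) are hypothetical.

```python
import argparse
import shlex

# shlex.split() imitates POSIX shell word splitting (quotes and backslashes
# are honored); it does NOT do globbing or variable expansion.
argv = shlex.split("prog -v -n 3 'my file.txt' other.txt")

# What remains is up to the program. Here argparse recognizes a switch,
# a number, and a list of file names. The option names are hypothetical.
parser = argparse.ArgumentParser(prog="prog")
parser.add_argument("-v", "--verbose", action="store_true", help="a switch")
parser.add_argument("-n", type=int, default=1, help="a number")
parser.add_argument("files", nargs="*", help="file names")

args = parser.parse_args(argv[1:])  # argv[0] is the program name
print(args.verbose, args.n, args.files)  # True 3 ['my file.txt', 'other.txt']
```

Note that the quoted `'my file.txt'` arrives as a single argument; the program never sees the quotes.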

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to Roger L Costello on Fri Jul 29 20:52:04 2022
    On 2022-07-28, Roger L Costello <costello@mitre.org> wrote:
    > Hi Folks,
    >
    > I've seen some tools with pretty complicated arguments. The argument list is a
    > language unto itself.
    >
    > Do people create parsers for command line arguments? Or is a parser overkill?

    It happens.

    You probably know these examples.

    The find utility has an expression syntax whose tokens are command line arguments. Parentheses are used for overriding precedence; they must be
    escaped in common shell languages, so they are passed through to find
    verbatim:

    find /etc \( -name '*.conf' -o -name '*.xml' \) -exec command {} \;
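A recursive-descent parser is one way a program like find could turn these argument tokens into a tree. The Python sketch below handles only a hypothetical subset (`-name`, `-type`, `-exec ... ;`, `!`, `-a`, `-o` and parentheses). It assumes the starting-point paths such as `/etc` have already been removed, and it follows find's rule that `-o` binds more loosely than `-a` or plain juxtaposition, with `!` binding tightest. It is not how any real find is implemented.

```python
class FindExprParser:
    """Recursive-descent parser for a small, hypothetical subset of find(1)."""

    def __init__(self, tokens):
        self.toks = list(tokens)   # one argv element per token
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        self.pos += 1
        return tok

    def parse(self):
        node = self.or_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def or_expr(self):             # lowest precedence: -o
        node = self.and_expr()
        while self.peek() == "-o":
            self.take()
            node = ("or", node, self.and_expr())
        return node

    def and_expr(self):            # explicit -a, or two terms side by side
        node = self.not_expr()
        while self.peek() not in (None, "-o", ")"):
            if self.peek() == "-a":
                self.take()
            node = ("and", node, self.not_expr())
        return node

    def not_expr(self):            # highest precedence: prefix !
        if self.peek() == "!":
            self.take()
            return ("not", self.not_expr())
        return self.primary()

    def primary(self):
        tok = self.take()
        if tok == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("-name", "-type"):
            return (tok, self.take())        # a test with one argument
        if tok == "-exec":
            cmd = []
            while self.peek() != ";":
                cmd.append(self.take())      # take() raises if ';' is missing
            self.take()                      # consume the terminating ';'
            return ("-exec", cmd)
        raise SyntaxError(f"unknown primary {tok!r}")


# The shell has already removed the backslashes and quotes.
tokens = ["(", "-name", "*.conf", "-o", "-name", "*.xml", ")",
          "-exec", "command", "{}", ";"]
print(FindExprParser(tokens).parse())
```

The parenthesized `-o` group becomes one subtree, which is then ANDed with the `-exec` action because the two appear side by side.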

    The [ command also parses expressions that are individual arguments:

    if [ "$foo" = "$bar" -o "$n1" -gt "$n2" ] ; then ...
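The `[` command evaluates its expression instead of building a tree, but it faces the same precedence problem: `-o` binds more loosely than `-a`, which binds more loosely than `!`. Here is a hedged Python sketch. It assumes the shell has already substituted the variables, supports only a handful of operators, and ignores the special cases POSIX defines for one to four arguments.

```python
# Binary primaries this sketch understands; -eq, -gt, -lt compare as integers.
BINARY = {
    "=":   lambda a, b: a == b,
    "!=":  lambda a, b: a != b,
    "-eq": lambda a, b: int(a) == int(b),
    "-gt": lambda a, b: int(a) > int(b),
    "-lt": lambda a, b: int(a) < int(b),
}


def evaluate(args):
    """Evaluate a test(1)-style argument list and return True or False."""
    pos = 0

    def peek():
        return args[pos] if pos < len(args) else None

    def take():
        nonlocal pos
        if pos >= len(args):
            raise SyntaxError("argument expected")
        pos += 1
        return args[pos - 1]

    def or_expr():                 # -o binds most loosely
        value = and_expr()
        while peek() == "-o":
            take()
            rhs = and_expr()       # parse the right side even if value is True
            value = value or rhs
        return value

    def and_expr():                # -a binds tighter than -o
        value = not_expr()
        while peek() == "-a":
            take()
            rhs = not_expr()
            value = value and rhs
        return value

    def not_expr():                # ! binds tightest
        if peek() == "!":
            take()
            return not not_expr()
        return primary()

    def primary():
        if peek() == "(":
            take()
            value = or_expr()
            if take() != ")":
                raise SyntaxError("')' expected")
            return value
        left = take()
        if peek() in BINARY:
            op = take()
            return BINARY[op](left, take())
        return left != ""          # a lone string is true if non-empty

    result = or_expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected argument {peek()!r}")
    return result


# What [ would receive with foo=abc, bar=xyz, n1=5, n2=3 (hypothetical values).
print(evaluate(["abc", "=", "xyz", "-o", "5", "-gt", "3"]))  # True
```

With precedence, `a = a -o b = c -a 1 -gt 2` is true. Evaluated strictly left to right, it would be false.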

    The tcpdump utility uses command line arguments as the tokens for
    pcap filter expressions. (Sort of). Example from man page:

    To print traffic between helios and either hot or ace:

    tcpdump host helios and \( hot or ace \)

    However, tcpdump can do its own splitting; the expression can be
    quoted as one argument.
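One way both spellings can work is for the program to join its remaining arguments with spaces and re-tokenize the result. This sketch assumes tcpdump does roughly that before compiling the filter; that is an assumption about its internals, not something the man page excerpt above states. The tokenizer is hypothetical and much simpler than libpcap's real lexer. It only treats parentheses as separate tokens.

```python
import re

# Hypothetical token pattern: a parenthesis, or a run of characters that are
# neither whitespace nor parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def filter_tokens(argv):
    """Join the filter arguments with spaces, then re-tokenize the result."""
    return TOKEN.findall(" ".join(argv))


separate = ["host", "helios", "and", "(", "hot", "or", "ace", ")"]  # \( ... \) form
quoted = ["host helios and (hot or ace)"]                          # one quoted argument
print(filter_tokens(separate))
print(filter_tokens(separate) == filter_tokens(quoted))  # True
```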

    All of these programs must be parsing. They have phrase structures
    and operator precedence with parentheses right on the command line.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From gah4@21:1/5 to Kaz Kylheku on Fri Jul 29 15:30:33 2022
    On Friday, July 29, 2022 at 2:49:38 PM UTC-7, Kaz Kylheku wrote:

    (snip)

    > All of these programs must be parsing. They have phrase structures
    > and operator precedence with parentheses right on the command line.

    It seems to me that this is the important part. The simplest processing
    of a command line might not count as parsing.

    If one processes an arithmetic expression left to right, with no precedence, that might not count as parsing. Are two different precedence levels enough?
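The difference shows up with just two operators. The Python sketch below is illustrative only and supports just non-negative integers with `+`, `-` and `*`. It evaluates the same tokens twice: once strictly left to right, and once with `*` binding tighter than `+` and `-`.

```python
import re

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}


def tokenize(text):
    # Non-negative integers and the three operators; whitespace is skipped.
    return re.findall(r"\d+|[-+*]", text)


def left_to_right(tokens):
    """No precedence: apply each operator as soon as it is read."""
    value = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        value = OPS[op](value, int(num))
    return value


def two_levels(tokens):
    """Two precedence levels: '*' is applied inside terms, '+'/'-' between them."""
    pos = 0

    def term():
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] == "*":
            value *= int(tokens[pos + 1])
            pos += 2
        return value

    value = term()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        value = OPS[op](value, term())
    return value


toks = tokenize("2 + 3 * 4")
print(left_to_right(toks), two_levels(toks))  # 20 14
```

The left-to-right version is a single loop. Adding a second precedence level already calls for a nested helper (`term`), which is the start of a recursive-descent parser.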

    My first thought on reading the question was about machine-generated vs.
    hand-written parsers. When does it get complicated enough to make
    it worth using a parser generator?

  • From Giacinto Cifelli@21:1/5 to gah4@u.washington.edu on Mon Aug 8 18:25:18 2022
    In general, Linux programs parse their command line arguments with getopt(3) or relatives such as getopt_long: https://www.man7.org/linux/man-pages/man3/getopt.3.html

    And it is better not to implement special syntax on the command line,
    because it could be pre-parsed or expanded by the shell itself.

    If you want to pass a string on the command line, that is
    different, but then again, it is better to take it from stdin.
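For comparison, Python's standard `getopt` module is modeled on the C getopt(3) interface, so it gives a feel for that man page's conventions without writing C. The option letters below are hypothetical. The spec `"vo:"` means `-v` is a plain switch and `-o` takes an argument. Plain `getopt` stops at the first non-option argument, POSIX-style. `gnu_getopt` permutes the arguments the way GNU getopt does by default.

```python
import getopt

argv = ["-v", "-o", "out.txt", "input1", "input2"]
opts, operands = getopt.getopt(argv, "vo:")
print(opts)      # [('-v', ''), ('-o', 'out.txt')]
print(operands)  # ['input1', 'input2']

# POSIX-style: scanning stops at the first operand, so "-v" stays an operand.
print(getopt.getopt(["input1", "-v"], "vo:"))      # ([], ['input1', '-v'])
# GNU-style: options may follow operands.
print(getopt.gnu_getopt(["input1", "-v"], "vo:"))  # ([('-v', '')], ['input1'])

# Unknown options raise getopt.GetoptError.
try:
    getopt.getopt(["-x"], "vo:")
except getopt.GetoptError as err:
    print("error:", err)
```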

  • From gah4@21:1/5 to Giacinto Cifelli on Mon Aug 8 22:01:08 2022
    On Monday, August 8, 2022 at 8:17:26 PM UTC-7, Giacinto Cifelli wrote:
    > in general the linux command line arguments are parsed through getopt(3) https://www.man7.org/linux/man-pages/man3/getopt.3.html

    VMS has its own fancy command line parser, mostly used for the
    built-in commands. Among other things, it allows abbreviating commands
    and command options, and, I believe, some checking of the arguments
    themselves.

    If you install a program with its own command descriptor file, DCL
    integrates that with the existing parser.

  • From gah4@21:1/5 to Roger L Costello on Tue Aug 23 15:07:12 2022
    On Friday, July 29, 2022 at 1:36:47 PM UTC-7, Roger L Costello wrote:

    > I've seen some tools with pretty complicated arguments. The argument list is a
    > language unto itself.

    There has been no more discussion of this for weeks now, so here is what it looks like for DCL.

    DCL is Digital Command Language, which DEC used for some systems,
    such as VAX/VMS. (And later VMS ports.)

    DCL allows command options following a slash, and those options
    can have arguments of various types. There is a language for describing
    them, called Command Definition Language, with file extension CLD.

    As it was the first one I found, here is the one for GAWK, or GNU Awk:

    ! Gawk.Cld -- command definition for GAWK
    ! Pat Rankin, Nov'89
    ! [ revised for 2.12, May'91 ]
    ! [ revised for 4.0.0, Feb'11 ]
    !
    ! This command definition is compiled into an object module which is
    ! linked into all three programs, GAWK, DGAWK, and PGAWK, and it is
    ! not able to use syntax-switching qualifiers to invoke the different
    ! images gawk.exe, dgawk.exe, and pgawk.exe. To use dgawk or pgawk
    ! when this command definition is installed as a native command, use
    ! $ define gawk location:dgawk.exe
    ! or $ define gawk location:pgawk.exe
    !
    module Gawk_Cmd
    define verb GAWK
    synonym AWK
    ! image gawk !usage $ DEFINE GAWK disk:[directory]GAWK
    parameter p1, value(required,list), label=gawk_p1, prompt="data file(s)"
    qualifier input, value(required,list,type=$infile), label=progfile
    qualifier commands, value(required), label=program
    qualifier extra_commands, value(required), label=moreprog
    qualifier field_separator, value(required), label=field_sep
    qualifier variables, value(required,list)
    qualifier usage
    qualifier copyright
    qualifier version
    qualifier lint, value(list,type=lint_keywords)
    qualifier posix
    qualifier strict, negatable !synonym for /traditional
    qualifier traditional, negatable
    qualifier re_interval, negatable !only used with /traditional
    qualifier sandbox
    qualifier debug, negatable !obsolete; debug via separate DGAWK program
    qualifier output, value(type=$outfile,default="SYS$OUTPUT")
    qualifier optimize, negatable !actually on always; negation is ignored
    qualifier profile, value(type=$outfile,default="awkprof.out")
    qualifier dump_variables, value(type=$outfile,default="awkvars.out")
    qualifier non_decimal_data
    qualifier characters_as_bytes
    qualifier use_lc_numeric
    qualifier gen_pot
    qualifier reg_expr, value(type=reg_expr_keywords) !(OBSOLETE)
    disallow progfile and program !or not progfile and not program
    !disallow lint.warn and (lint.fatal or lint.invalid)
    define type lint_keywords
    keyword warn, default
    keyword fatal !lint warnings terminate execution
    keyword invalid !warn about invalid constructs but not extensions
    keyword old !warn about constructs not available in original awk
    define type reg_expr_keywords
    keyword awk
    keyword egrep, default !synonym for 'posix'
    keyword posix !equivalent to 'egrep'
    !
    ! p1 = data file list (possibly including 'var=value' constructs)
    !note: parameter required; use 'sys$input:' to read data from 'stdin'
    ! /input = program source file ('-f progfile')
    ! /commands = program source text ('program')
    !note: either input or commands, but not both; if neither, usage message given
    ! /extra_commands = additional program source text; may be combined with /input
    ! /field_separator = character(s) delimiting record fields; default is "[ \t]"
    ! /reg_expr = obsolete
    ! /variables = list of 'var=value' items for assignment prior to BEGIN
    ! /posix = force POSIX compatibility mode operation
    ! /sandbox = disable I/O redirection and use of system() to execute commands
    ! /strict = synonym for /traditional
    ! /traditional = force compatibility mode operation (UN*X SYS V, Release 4)
    ! /re_interval = for /traditional, regular expressions allow interval ranges
    ! /output = destination for print,printf (default is sys$output: ie, 'stdout')
    ! /lint = scan the awk program for possible problems and warn about them
    ! /optimize = parse-time evaluation of constant [sub-]expressions only
    ! /debug = debugging mode; no-op unless program built using `#define DEBUG'
    ! /dump_var = at program termination, write out final values for all variables
    ! /profile = collect all parts of the parsed awk program into awkprof.out
    !note: use separate pgawk program to collect run-time execution profiling
    ! /usage = display 'usage' reminder [describing this VMS command syntax]
    ! /version = show program version and quit; also shows copyright notice
    ! /copyright = show abbreviated edition of FSF's copyright notice and quit
    !

    Two things then have to happen. First, this file is compiled into an object
    module that is linked into the executable (EXE) file itself. Second, DCL also
    needs to know about it, so the file is compiled into a binary form that is loaded
    by the DCL command parser. There is a system copy, for commands available to all
    users, and a separate one for each individual user.
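Here is a Python sketch of what a table-driven qualifier parser might look like. The table is hypothetical and only loosely modeled on a few GAWK.CLD entries. It is not how DCL or its command definition tools actually represent compiled definitions, and real CLD defaults (for example, which qualifiers accept a /NO prefix) may differ. The sketch checks value rules, negation, and one `disallow`-style constraint.

```python
# Hypothetical table loosely modeled on a few GAWK.CLD lines:
# name -> (value rule, negatable). Value rule is "required", "optional" or None.
QUALIFIERS = {
    "INPUT":       ("required", False),
    "COMMANDS":    ("required", False),
    "OUTPUT":      ("optional", False),
    "TRADITIONAL": (None, True),
    "POSIX":       (None, False),
}


def parse_qualifiers(words):
    """Parse words like '/OUTPUT=x.out' or '/NOTRADITIONAL' into a dict."""
    found = {}
    for word in words:
        if not word.startswith("/"):
            raise ValueError(f"not a qualifier: {word!r}")
        name, sep, value = word[1:].partition("=")
        name = name.upper()
        negated = False
        if name not in QUALIFIERS and name.startswith("NO"):
            name, negated = name[2:], True
        if name not in QUALIFIERS:
            raise ValueError(f"unrecognized qualifier /{name}")
        rule, negatable = QUALIFIERS[name]
        if negated:
            if not negatable or sep:
                raise ValueError(f"/NO{name} is not allowed")
            found[name] = False
        elif sep and rule is None:
            raise ValueError(f"/{name} does not take a value")
        elif not sep and rule == "required":
            raise ValueError(f"/{name} requires a value")
        else:
            found[name] = value if sep else True
    # Counterpart of "disallow progfile and program" in the CLD file.
    if "INPUT" in found and "COMMANDS" in found:
        raise ValueError("/INPUT and /COMMANDS cannot be used together")
    return found


print(parse_qualifiers(["/output=prog.out", "/notraditional", "/posix"]))
```

Keeping the rules in a table, as the CLD file does, means adding a qualifier is a one-line change rather than new parsing code.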

    Among other things, DCL allows command names and options to be abbreviated
    to the shortest unique prefix. I suspect that the compiled CLD file
    supplies the tables needed to do that. So all the command line parsing
    is done by DCL before it even starts running the program, and the parsed
    command options are then passed in, presumably in a form convenient for
    the program to use.
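Once the table of names is known, abbreviation to a unique prefix is easy to sketch. The Python below uses a hypothetical list of qualifier names and does not reproduce DCL's actual algorithm, whose rules may differ. It computes each name's shortest unique prefix and resolves an abbreviation, rejecting ambiguous ones.

```python
def shortest_unique_prefixes(names):
    """Map each name to the shortest prefix that no other name shares."""
    result = {}
    for name in names:
        for length in range(1, len(name) + 1):
            prefix = name[:length]
            if sum(other.startswith(prefix) for other in names) == 1:
                result[name] = prefix
                break
        else:
            result[name] = name    # name is a prefix of another: spell it out
    return result


def resolve(abbrev, names):
    """Return the one name that `abbrev` abbreviates, or raise ValueError."""
    abbrev = abbrev.upper()
    if abbrev in names:            # an exact match always wins
        return abbrev
    matches = sorted(n for n in names if n.startswith(abbrev))
    if not matches:
        raise ValueError(f"unrecognized keyword {abbrev}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous: {abbrev} could be {', '.join(matches)}")
    return matches[0]


# Hypothetical subset of the GAWK qualifiers.
NAMES = ["POSIX", "PROFILE", "OUTPUT", "OPTIMIZE", "TRADITIONAL"]
print(shortest_unique_prefixes(NAMES))
print(resolve("trad", NAMES))  # TRADITIONAL
```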

  • From Johann 'Myrkraverk' Oskarsson@21:1/5 to Giacinto Cifelli on Thu Sep 29 13:16:13 2022
    On 8/8/2022 4:25 PM, Giacinto Cifelli wrote:
    > in general the linux command line arguments are parsed through getopt(3) https://www.man7.org/linux/man-pages/man3/getopt.3.html
    >
    > and it is better not to implement special things on the command line,
    > because it could be pre-parsed or expanded by the shell itself.
    >
    > then if you want to pass a string on the command line, that is
    > different, but then again, it is better to take it from stdin.

    A much [?] younger me wrote about how to do this with reflex & byacc.


    http://www.myrkraverk.com/blog/2017/10/parsing-command-line-parameters-with-yacc-flex/

    I know these tools even better now, thanks in part to reading more
    books on the subject, but I have not updated my blog.

    Note that before I wrote this, the "common knowledge" I could find
    online at the time, in material a decade old or older, was that you
    first had to concatenate the argument strings, but that's simply wrong.
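You can see why nothing needs to be concatenated without reaching for yacc. In a yacc-based parser, the lexer function (`yylex`) decides what the next token is, so it can simply hand out one argv element at a time. The Python sketch below is only an analogue of that idea, not code from the blog post. The option names (`--verbose`, `--output`) and the grammar are hypothetical.

```python
def lex(argv):
    """Yield one token per argv element; nothing is ever concatenated."""
    for arg in argv:
        if arg.startswith("--"):
            yield ("OPTION", arg[2:])
        else:
            yield ("WORD", arg)     # may contain spaces, e.g. "my file.txt"
    yield ("END", None)


TAKES_VALUE = {"output"}            # hypothetical option that needs an argument


def parse(argv):
    """Grammar (hypothetical): option* WORD* END, where --output takes a WORD."""
    tokens = lex(argv)
    kind, value = next(tokens)
    options, files = {}, []
    while kind == "OPTION":
        name = value
        if name in TAKES_VALUE:
            kind, value = next(tokens)
            if kind != "WORD":
                raise SyntaxError(f"--{name} needs a value")
            options[name] = value
        else:
            options[name] = True
        kind, value = next(tokens)
    while kind == "WORD":
        files.append(value)
        kind, value = next(tokens)
    if kind != "END":
        raise SyntaxError(f"unexpected --{value} after the file names")
    return options, files


print(parse(["--verbose", "--output", "out file.txt", "a.txt", "b c.txt"]))
```

Each argument reaches the parser exactly as the shell delivered it, so `"out file.txt"` stays one WORD. With yacc and flex, the equivalent would be a `yylex()` or input routine that walks argv. How the blog post arranges that may differ.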

    Of course, using these tools easily requires prior knowledge, but
    for people who have it, it's trivial. I do not know who, if anyone,
    has actually used my template in production. I believe it's mostly
    students who are curious about this.

    [snip]

    Enjoy,
    --
    Johann | email: invalid -> com | www.myrkraverk.com/blog/
    I'm not from the Internet, I just work there. | twitter: @myrkraverk
