• TAWK: The spawn() function

    From Kenny McCormack@21:1/5 to All on Mon Feb 21 12:12:20 2022
    Here's a question for anybody who currently has access to the TAWK source
    code (i.e., the guys who periodically post here about having obtained and
    are maintaining a Windows version of TAWK) and anyone else knowledgeable on
    the subject.

    TAWK has the usual system() function which is, of course, basic AWK, but
    it also has a "spawn()" function, which is documented as running a program
    "directly", without invoking "the command interpreter" (in Windows-speak,
    this means CMD.EXE). But it is not very well documented, in terms of how
    exactly the given string is parsed. spawn() is documented as having 3
    forms:

    1) spawn(command)
    2) spawn(command,environment)
    3) spawn(command,environment,flags)

    Ignoring the latter two in this list, the question is: How does "command"
    get parsed into separate cmd line arguments? Since CMD.EXE isn't involved,
    we don't expect things like redirection and piping to work, but it still
    has to break it up into separate args - at some point. Now, the
    documentation does say that under Unix, it just passes it on to whatever
    value is in $SHELL - so, you'd imagine that it ends up doing pretty much
    the same thing as system() does - effectively: sh -c 'command'. So,
    really, it sounds like there's no difference between spawn() and
    system(), under Unix.
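
The difference between handing one string to a shell and starting a program "directly" can be sketched in Python rather than TAWK. This is only an illustration, assuming a POSIX system with `sh` on the PATH; the helper names `run_direct` and `run_via_shell` and the `CHILD` snippet are made up for this example and say nothing about how TAWK itself is implemented.

```python
import subprocess
import sys

# Hypothetical child program: prints each argument it received on its own line.
CHILD = "import sys; print('\\n'.join(sys.argv[1:]))"


def run_direct(args):
    """Start the child with an explicit argument vector; no shell is involved."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        capture_output=True, text=True, check=True,
    )
    return done.stdout.splitlines()


def run_via_shell(command):
    """Hand a single string to sh -c, roughly what system() does on Unix."""
    done = subprocess.run(
        ["sh", "-c", command],
        capture_output=True, text=True, check=True,
    )
    return done.stdout.splitlines()


if __name__ == "__main__":
    # Each list element arrives as exactly one argument, metacharacters and all.
    print(run_direct(["a b", "c | d"]))
    # Here the shell does the splitting and quote removal before printf runs.
    print(run_via_shell("printf '%s\\n' a b 'c d'"))
```

In the direct case nothing interprets the `|`, so it reaches the child as ordinary text; in the shell case the word splitting and quote handling are done by `sh` before the program ever starts.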

    But under Windows, there *is* a difference. My sense is that it just
    passes "command" to CreateProcess(). I think (but I'm not really enough of
    a Windows API programmer to know) that CreateProcess() takes a single
    string argument and does its own parsing on it. So, the question boils
    down to:
    1) Does it just pass it to CreateProcess()?
    2) How well is it documented what CreateProcess() does?

    From my testing, it looks like it parses on spaces and quotes ("). So,
    something like:

    spawn("SomeThing \"THis and that\" and \"you\"\"me\"")

    runs SomeThing with args:

    1) THis and that
    2) and
    3) you
    4) me

    Note how "you" and "me" were run together, but it still manages to split
    them. I found, interestingly enough, that it was OK to quote all of the
    args, but *not* to quote the command itself (i.e., SomeThing). If I do:

    spawn("\"SomeThing\" \"THis and that\" and \"you\"\"me\"")

    It fails to launch.

    Note, BTW, that what I'd really like is an interface like the usual
    argv/argc interface - like in the exec*() functions in Unix, where you get
    to specify explicitly what goes into each argument.
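
For comparison, here is a sketch of what an argv-style front end could look like. It is written in Python, not TAWK, and relies on `subprocess.list2cmdline`, an undocumented helper in the Python standard library that joins a list of arguments into one command-line string following the Microsoft C runtime quoting rules. The wrapper name `build_command_line` is hypothetical, and whether the resulting string is split back correctly still depends on the program that receives it.

```python
import subprocess


def build_command_line(argv):
    """Join an explicit argument list into a single Windows-style command line.

    Uses subprocess.list2cmdline, which quotes arguments containing spaces
    or tabs and backslash-escapes embedded double quotes.
    """
    return subprocess.list2cmdline(argv)


if __name__ == "__main__":
    # The caller states each argument separately; quoting is added only where needed.
    print(build_command_line(["SomeThing", "THis and that", "and", "you", "me"]))
    # An argument that itself contains double quotes gets them escaped.
    print(build_command_line(["SomeThing", 'say "hi"']))
```

With something like this, the caller never has to hand-write quotes and backslashes; the string could then be passed to a function that takes a single command line.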

    But again, my real, underlying question is: How well understood is it what Windows does? (When we use spawn() in TAWK under Windows)

    --
    Many (most?) Trump voters voted for him because they thought if they
    supported Trump enough, they'd get to *be* Trump.

    Similarly, Trump believes that if *he* praises Putin enough, he'll get to *be* Putin.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Anagnostopoulos@21:1/5 to Kenny McCormack on Tue Feb 22 04:30:45 2022
    On Monday, February 21, 2022 at 7:12:24 AM UTC-5, Kenny McCormack wrote:
    > Here's a question for anybody who currently has access to the TAWK source
    > code (i.e., the guys who periodically post here about having obtained and
    > are maintaining a Windows version of TAWK) and anyone else knowledgeable on
    > the subject.
    >
    > TAWK has the usual system() function which is, of course, basic AWK, but it
    > also has a "spawn()" function, which is documented as running a program
    > "directly", without invoking "the command interpreter" (in Windows-speak,
    > this means CMD.EXE). But it is not very well documented, in terms of how
    > exactly the given string is parsed. spawn() is documented as having 3
    > forms:

    Yes, it uses CreateProcess() to spawn the process. Command line arguments
    are split apart in the usual way.

    ~~ Paul

  • From Kenny McCormack@21:1/5 to paul.anagnostopoulos1130@gmail.com on Tue Feb 22 14:11:14 2022
    In article <4c4f0b70-c7cf-4f8f-9a0c-0ef94118db41n@googlegroups.com>,
    Paul Anagnostopoulos <paul.anagnostopoulos1130@gmail.com> wrote:
    > On Monday, February 21, 2022 at 7:12:24 AM UTC-5, Kenny McCormack wrote:
    >> Here's a question for anybody who currently has access to the TAWK source
    >> code (i.e., the guys who periodically post here about having obtained and
    >> are maintaining a Windows version of TAWK) and anyone else knowledgeable on
    >> the subject.
    >>
    >> TAWK has the usual system() function which is, of course, basic AWK, but it
    >> also has a "spawn()" function, which is documented as running a program
    >> "directly", without invoking "the command interpreter" (in Windows-speak,
    >> this means CMD.EXE). But it is not very well documented, in terms of how
    >> exactly the given string is parsed. spawn() is documented as having 3
    >> forms:
    >
    > Yes, it uses CreateProcess() to spawn the process. Command line arguments are
    > split apart in the usual way.

    I think the point is to define what is "the usual way".

    Is it done in the TAWK source, or is it just handed off to CreateProcess()?

    If the former, please comment. If the latter, please point me to good
    documentation for how Windows does it.

    In particular, I need to know if any other character (other than space and
    double quote ["]) is special.

    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Infallibility

  • From Kaz Kylheku@21:1/5 to Kenny McCormack on Tue Feb 22 16:03:04 2022
    On 2022-02-21, Kenny McCormack <gazelle@shell.xmission.com> wrote:
    > But again, my real, underlying question is: How well understood is it
    > what Windows does?

    It's perfectly well understood. In Windows, the command line is a single
    character string, which the target process must parse to retrieve
    arguments. This requires escaping in order to represent significant
    spaces, and literal escapes and so it goes. It creates problems due to
    subtle differences in the algorithms, in their handling of escapes.

    There is a de facto standard way of parsing, which is in the Microsoft
    Visual C run time: how C programs compiled with Microsoft tools (those
    that don't use WinMain) receive their ISO C main() call with arguments.

    In MSDN, under Microsoft's "C language reference" documentation, there is
    an article "Parsing C command-line arguments"

    Currently at this URL:

    https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170

    (Anyone serious about this would probably do well to hunt down the
    MSVCRT source code to see the actual implementation, but that's
    just me.)
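
To make the quoting rules concrete, here is a minimal sketch of a splitter, in Python, written from the rules as described in the article cited above: whitespace separates arguments, double quotes group, backslashes are special only in front of a double quote, and a doubled quote inside a quoted string stands for one literal quote. The function name `split_msvc_args` is hypothetical, the special handling of the program name (argv[0]) is deliberately left out, and the doubled-quote rule in particular is an assumption taken from the current documentation; as I understand it, older runtimes and other compilers may behave differently. This is not the MSVCRT implementation.

```python
def split_msvc_args(cmdline):
    """Split a command-line string roughly the way the MSVC runtime is documented to."""
    args, cur = [], []
    in_quotes = False
    have_arg = False  # distinguishes an empty "" argument from no argument at all
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            # Count the run of backslashes and see whether a quote follows it.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                cur.append("\\" * (count // 2))  # each pair yields one backslash
                if count % 2:
                    cur.append('"')  # odd run: the quote is literal
                    j += 1
                # even run: leave the quote for the next iteration (a delimiter)
            else:
                cur.append("\\" * count)  # no quote follows: all literal
            i = j
            have_arg = True
        elif c == '"':
            if in_quotes and i + 1 < n and cmdline[i + 1] == '"':
                cur.append('"')  # assumed rule: "" inside quotes is one literal quote
                i += 2
            else:
                in_quotes = not in_quotes
                i += 1
            have_arg = True
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(cur))
                cur, have_arg = [], False
            i += 1
        else:
            cur.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(cur))
    return args


if __name__ == "__main__":
    print(split_msvc_args('"a b c" d e'))
    print(split_msvc_args(r'a\\\"b c d'))
    print(split_msvc_args('SomeThing "THis and that" and "you""me"'))
```

Under these rules the run-together `"you""me"` comes out as the single argument `you"me`, which is not what the test earlier in the thread reported. That is consistent with the point that the result depends on which parser the target program actually uses.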

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Kenny McCormack@21:1/5 to 480-992-1380@kylheku.com on Tue Feb 22 20:44:42 2022
    In article <20220222075404.580@kylheku.com>,
    Kaz Kylheku <480-992-1380@kylheku.com> wrote:
    > On 2022-02-21, Kenny McCormack <gazelle@shell.xmission.com> wrote:
    >> But again, my real, underlying question is: How well understood is it what
    >> Windows does?
    >
    > It's perfectly well understood. In Windows, the command line is a single
    > character string, which the target process must parse to retrieve
    > arguments. This requires escaping in order to represent significant
    > spaces, and literal escapes and so it goes. It creates problems due to
    > subtle differences in the algorithms, in their handling of escapes.

    Thanks for your response. It explains well the sorry state of the world.

    So, what it boils down to is that, as has always been the case with DOS,
    it is up to the application to parse the command line, and each/any
    application could do it differently.

    *If* the application was compiled with MSVC, then it should conform to the
    "de facto standard" that you've outlined, but "should" doesn't always mean
    "does".

    But, in any case, it seems clear that passing something like:

    "foo""bar"

    does get parsed as two args: foo & bar.

    It would be nice if there was a way to figure out which compiler was used
    to compile a given application - and from that, to determine exactly how
    that compiler does its parsing.

    --
    So to cure the problem of arrogant incompetent rich people we should turn
    the government over to an arrogant incompetent trust fund billionaire
    who knows nothing about government and who has never held a job in his
    entire spoiled life?
