• On parsers, yet again

    From Johann 'Myrkraverk' Oskarsson@21:1/5 to All on Wed Feb 16 01:36:01 2022
    Dear r.a.i-f,

    I have been wondering, how are IF parsers generally constructed? Is
    there literature on this topic? As in, is it more like programming
    language parsing, for which there's abundant literature in compilers,
    or is it more like natural language parsing, which I guess is slightly different? Or neither?

    For creating a game, I would probably use TADS, or Inform 6, or some
    other ready made environment for exactly that. However, I have been
    wondering if parsers are really that /hard/ to do, or just more like
    /annoying/ to make?

    Anyone here to share anything on the subject?

    --
    Johann | email: invalid -> com | www.myrkraverk.com/blog/
    I'm not from the Internet, I just work there. | twitter: @myrkraverk

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Johann 'Myrkraverk' Oskarsson@21:1/5 to Greg Ewing on Wed Feb 16 13:02:51 2022
    On 2/16/2022 12:19 PM, Greg Ewing wrote:
    > On 16/02/22 2:36 pm, Johann 'Myrkraverk' Oskarsson wrote:
    >> I have been wondering, how are IF parsers generally constructed? Is
    >> there literature on this topic? As in, is it more like programming
    >> language parsing, for which there's abundant literature in compilers,
    >> or is it more like natural language parsing, which I guess is slightly
    >> different? Or neither?
    >
    > In my experience they're much more like programming language
    > parsers than natural language parsers. IF input languages are
    > usually a very restricted subset of natural languages, so you
    > don't tend to have the same problems of vagueness and ambiguity
    > that you get when trying to parse natural languages.

    Right.

    >> I have been wondering if parsers are really that /hard/ to do, or just
    >> more like /annoying/ to make?
    >
    > They're not really hard, especially if you have some familiarity
    > with the techniques used for parsing programming languages. In
    > fact, IF input languages are usually a lot simpler than typical
    > programming languages. Most of the complexity comes in figuring
    > out what to *do* in response to what the player typed.

    I see. I have to say I'm not /very familiar/ with parsing programming
    languages; however, recently I have been reading several compiler books,
    and I think I'm starting to get -- at least some of -- it. [*]

    Then I was thinking, if all of this has been written about compilers,
    hasn't /something/ been written about IF parsers? Maybe it hasn't
    and it's all in the compiler literature? One thing that is different in
    IF, IME, is that the game itself can add keywords and nouns. Though
    maybe that's not too different from adding types in languages like C++.
    The difference is that the compiler grammar is /fixed/ while the IF
    grammar is more flexible, with verbs being added and nouns changing as
    the game progresses.
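
    To make the "game can add verbs" point concrete: in a table-driven
    parser, adding a verb at runtime is just adding an entry to a lookup
    table, and the parsing code itself stays the same. The Python below is
    a minimal sketch under that assumption; `Vocabulary`, `add_verb` and the
    handler signature are hypothetical names, not taken from TADS or Inform.

```python
from typing import Callable, Dict, List

# Hypothetical handler type: receives the words after the verb, returns a reply.
Handler = Callable[[List[str]], str]


class Vocabulary:
    """A verb table the game can extend while it runs (illustrative only)."""

    def __init__(self) -> None:
        self.verbs: Dict[str, Handler] = {}

    def add_verb(self, word: str, handler: Handler) -> None:
        # Registering a verb later in the game only adds a table entry;
        # the parse() logic below never changes.
        self.verbs[word.lower()] = handler

    def parse(self, line: str) -> str:
        words = line.lower().split()
        if not words:
            return "Pardon?"
        handler = self.verbs.get(words[0])
        if handler is None:
            return f"I don't know the verb '{words[0]}'."
        return handler(words[1:])


vocab = Vocabulary()
vocab.add_verb("look", lambda rest: "You see a garden.")
print(vocab.parse("xyzzy"))  # I don't know the verb 'xyzzy'.
# Later (say, after reading a scroll) the game teaches the player a new verb.
vocab.add_verb("xyzzy", lambda rest: "A hollow voice says 'Fool.'")
print(vocab.parse("XYZZY"))  # A hollow voice says 'Fool.'
```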

    [*] To name two: /Modern Compiler Implementation in ML/ by Appel, and
    /Compiler Design in C/ by Holub. The latter is available as a PDF on
    the author's website. I also have a usable familiarity with flex
    and yacc.

  • From Greg Ewing@21:1/5 to Johann 'Myrkraverk' Oskarsson on Thu Feb 17 01:19:57 2022
    On 16/02/22 2:36 pm, Johann 'Myrkraverk' Oskarsson wrote:
    > I have been wondering, how are IF parsers generally constructed? Is
    > there literature on this topic? As in, is it more like programming
    > language parsing, for which there's abundant literature in compilers,
    > or is it more like natural language parsing, which I guess is slightly
    > different? Or neither?

    In my experience they're much more like programming language
    parsers than natural language parsers. IF input languages are
    usually a very restricted subset of natural languages, so you
    don't tend to have the same problems of vagueness and ambiguity
    that you get when trying to parse natural languages.

    > I have been wondering if parsers are really that /hard/ to do, or just
    > more like /annoying/ to make?

    They're not really hard, especially if you have some familiarity
    with the techniques used for parsing programming languages. In
    fact, IF input languages are usually a lot simpler than typical
    programming languages. Most of the complexity comes in figuring
    out what to *do* in response to what the player typed.
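
    As a rough illustration of how small a restricted command language can
    be, here is a minimal Python sketch of a
    VERB [NOUN-PHRASE [PREPOSITION NOUN-PHRASE]] grammar. The word lists and
    the `Command` shape are assumptions made for the example; real IF
    libraries handle much more (multiple objects, pronouns, disambiguation),
    and the resulting `Command` would then be handed to the action code,
    which is where most of the work lives.

```python
from typing import NamedTuple, Optional

# Assumed word lists for this toy grammar only.
ARTICLES = {"the", "a", "an"}
PREPOSITIONS = {"in", "on", "with", "at", "to", "under"}


class Command(NamedTuple):
    verb: str
    direct: Optional[str]
    prep: Optional[str]
    indirect: Optional[str]


def parse_command(line: str) -> Optional[Command]:
    """Parse VERB [NOUN-PHRASE [PREP NOUN-PHRASE]]; return None on empty input."""
    words = [w for w in line.lower().split() if w not in ARTICLES]
    if not words:
        return None
    verb, rest = words[0], words[1:]
    # Split whatever follows the verb at the first preposition, if any.
    for i, w in enumerate(rest):
        if w in PREPOSITIONS:
            direct = " ".join(rest[:i]) or None
            indirect = " ".join(rest[i + 1:]) or None
            return Command(verb, direct, w, indirect)
    return Command(verb, " ".join(rest) or None, None, None)


print(parse_command("put the red key in the box"))
# Command(verb='put', direct='red key', prep='in', indirect='box')
print(parse_command("look"))
# Command(verb='look', direct=None, prep=None, indirect=None)
```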

    --
    Greg

  • From Adam Thornton@21:1/5 to johann@myrkraverk.invalid on Wed Feb 16 18:34:07 2022
    In article <0C6PJ.1096642$X81f.741302@fx14.ams4>,
    Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> wrote:
    > Then I was thinking, if all of this has been written about compilers,
    > hasn't /something/ been written about IF parsers? Maybe it hasn't
    > and it's all in the compiler literature? One thing that is different in
    > IF, IME, is that the game itself can add keywords and nouns. Though
    > maybe that's not too different from adding types in languages like C++.
    > The difference is that the compiler grammar is /fixed/ while the IF
    > grammar is more flexible, with verbs being added and nouns changing as
    > the game progresses.

    Maybe? But plenty of languages let you extend the syntax. FORTH is
    my favorite example, but anything LISP-like (and FORTH's stack is just
    a LISP expression stood up on end) encourages you to do exactly that.

    If you're not scared of wading through source...even though Inform 7
    isn't yet open-source, you can wade through its implementation of the
    parser and standard library, since that's written in Inform 6 and 7
    and bundled with the application.

    I'm working with the Mac app, so inside the Inform.app directory,
    you'd want to go to Contents/Resources. Linux and Windows will have
    analogous structures. Once inside there...Library/6.11 contains a
    bunch of Inform 6, including parserm.h, which contains the input
    tokenizer and parser. The I6 standard world model is in that
    directory as well. Going back up to Contents/Resources, and then down
    to Internal/Extensions/Graham\ Nelson will bring you to
    Standard\ Rules.i7x, which is both the definition of the I7 standard
    model and the glue that binds it to I6.

    It's an enlightening read, if you want to see how the sausage is made.
    What you will find is what Greg Ewing said: the tokenizer is pretty straightforward, and the parser...recognizes a lot less than you think
    it might. The language extensibility is the cool bit, and Inform 7 is
    a really neat experiment in making extending the language -- which is
    to say, writing Interactive Fiction -- an awful lot like playing a
    game written in the language.
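
    For a feel of the tokenizing step, here is a hedged Python sketch:
    lower-case the input, split on whitespace, treat full stops, commas and
    double quotes as words of their own, and then break the token stream
    into separate commands at a full stop or the word "then". This only
    approximates what Inform 6 games do and is not a transcription of
    parserm.h; `tokenize` and `split_commands` are made-up names.

```python
import re
from typing import List

# Assumed separators: '.', ',' and '"' become tokens of their own;
# any other run of non-space characters is a word.
_TOKEN = re.compile(r'[.,"]|[^\s.,"]+')


def tokenize(line: str) -> List[str]:
    """Split a command line into lower-case words and separator tokens."""
    return _TOKEN.findall(line.lower())


def split_commands(tokens: List[str]) -> List[List[str]]:
    """Break a token stream into separate commands at '.' or 'then'."""
    commands: List[List[str]] = [[]]
    for tok in tokens:
        if tok in (".", "then"):
            if commands[-1]:
                commands.append([])
        else:
            commands[-1].append(tok)
    return [c for c in commands if c]


print(tokenize("Take lamp. Go north, then east"))
# ['take', 'lamp', '.', 'go', 'north', ',', 'then', 'east']
print(split_commands(tokenize("take lamp. go north then east")))
# [['take', 'lamp'], ['go', 'north'], ['east']]
```

    Commas are deliberately kept as tokens, since a comma can carry meaning
    of its own (as in addressing a character: "bob, go north").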

    Adam

  • From Greg Ewing@21:1/5 to Johann 'Myrkraverk' Oskarsson on Fri Feb 18 13:37:31 2022
    On 17/02/22 2:02 am, Johann 'Myrkraverk' Oskarsson wrote:
    > I have to say I'm not /very familiar/ with parsing programming
    > languages; however, recently I have been reading several compiler books,
    > and I think I'm starting to get -- at least some of -- it.

    Don't worry about getting deeply into the theory of parsing;
    most of it is overkill for this purpose.

    > the game itself can add keywords and nouns. Though
    > maybe that's not too different from adding types in languages like C++.
    > The difference is that the compiler grammar is /fixed/ while the IF
    > grammar is more flexible, with verbs being added and nouns changing as
    > the game progresses.

    I'm not sure that's a helpful way to think about it. Rather than
    the grammar changing, it's more like different variables being in
    scope in different places in a program. The set of verbs, nouns,
    adjectives etc. understood by the game is fixed by the game author,
    but different objects become accessible at different times.
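
    The scope idea can be sketched in a few lines of Python: each object's
    vocabulary is fixed when the game is written, but which objects a noun
    phrase can resolve to depends on where the player is. The `Thing` and
    `World` classes and the "in the room or carried" rule are simplifying
    assumptions for the example; real scope rules also have to deal with
    containers, darkness, and so on.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Thing:
    name: str
    words: Set[str]   # vocabulary fixed by the author
    location: str     # a room name, or "player" if carried


@dataclass
class World:
    things: List[Thing] = field(default_factory=list)
    here: str = "hall"

    def in_scope(self) -> List[Thing]:
        """Objects the player can currently refer to: in the room or carried."""
        return [t for t in self.things if t.location in (self.here, "player")]

    def resolve(self, phrase: List[str]) -> List[Thing]:
        """Return every in-scope object whose vocabulary covers all the words."""
        return [t for t in self.in_scope() if set(phrase) <= t.words]


world = World([
    Thing("brass lamp", {"brass", "lamp"}, "player"),
    Thing("green plant", {"green", "plant"}, "garden"),
])
print([t.name for t in world.resolve(["plant"])])  # []
world.here = "garden"
print([t.name for t in world.resolve(["plant"])])  # ['green plant']
```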

    What might be a bit different is that whereas in many programming
    languages you have reserved words such as "if", "while", etc. that
    can't be used for any other purpose, an IF parser needs to treat
    tokens more flexibly. E.g. if you decide that the word "plant"
    is always a noun so that you can have an object called "green plant",
    you're going to have trouble with a command like "plant the plant".
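
    One way to avoid reserved words is to let position decide a word's
    role: the first word is looked up in the verb table, and the remaining
    words are matched against object vocabularies, so "plant" can live in
    both. A minimal Python sketch, with hypothetical verb and object tables:

```python
from typing import Dict, Optional, Set, Tuple

ARTICLES = {"the", "a", "an"}

# Hypothetical tables. No word is reserved, so "plant" appears in both.
VERBS: Dict[str, str] = {"plant": "Plant", "water": "Water", "take": "Take"}
OBJECTS: Dict[str, Set[str]] = {
    "green plant": {"green", "plant"},
    "seed": {"seed"},
}


def parse(line: str) -> Optional[Tuple[str, str]]:
    """Read the first word as a verb and the remaining words as a noun phrase."""
    words = [w for w in line.lower().split() if w not in ARTICLES]
    if not words or words[0] not in VERBS:
        return None
    action, phrase = VERBS[words[0]], set(words[1:])
    matches = [name for name, vocab in OBJECTS.items() if phrase and phrase <= vocab]
    if len(matches) != 1:
        return None  # no match or ambiguous; a real parser would ask the player
    return action, matches[0]


print(parse("plant the plant"))    # ('Plant', 'green plant')
print(parse("water green plant"))  # ('Water', 'green plant')
```

    A lexer that classified "plant" as a keyword token up front would have
    to reject one of the two uses; here the word is only classified once
    the parser knows which slot it occupies.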

    For that reason you may find tools like yacc that are designed
    for keyword-oriented languages don't help very much.
