• Using a C struct to represent a node in a parse tree ... how many field

    From Roger L Costello@21:1/5 to All on Fri Apr 8 11:17:01 2022
    Hi Folks,

    The compiler book [1] that I am reading says this:

    The most obvious way to represent the information gained from lexical and syntax analysis is as a tree along the lines of the parse tree. In C this is suitably handled using a simple struct for each node:

    struct node
    {
    int nodetype ;
    struct node *field1 ;
    struct node *field2 ;
    struct node *field3 ;
    struct node *field4 ;
    struct node *field5 ;
    } ;

    But, but, but, ..what if a node requires more than 5 fields; how is that handled?

    /Roger

    [1] Introduction to Compiling Techniques, A First Course Using ANSI C, LEX and YACC by J. P. Bennett, page 47.
    [Use a bigger struct. You know what kind of nodes your parser creates, so define
    a struct that does what you want. Personally, I prefer to define a struct for each node type and a general node that is a union, or perhaps a union of pointers. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Johann Klammer@21:1/5 to Roger L Costello on Fri Apr 8 20:44:48 2022
    On 04/08/2022 01:17 PM, Roger L Costello wrote:
    Hi Folks,

    The compiler book [1] that I am reading says this:

    The most obvious way to represent the information gained from lexical and syntax analysis is as a tree along the lines of the parse tree. In C this is suitably handled using a simple struct for each node:

    struct node
    {
    int nodetype ;
    struct node *field1 ;
    struct node *field2 ;
    struct node *field3 ;
    struct node *field4 ;
    struct node *field5 ;
    } ;

    But, but, but, ..what if a node requires more than 5 fields; how is that handled?

    /Roger

    [1] Introduction to Compiling Techniques, A First Course Using ANSI C, LEX and
    YACC by J. P. Bennett, page 47.
    [Use a bigger struct. You know what kind of nodes your parser creates, so define
    a struct that does what you want. Personally, I prefer to define a struct for
    each node type and a general node that is a union, or perhaps a union of pointers. -John]

    doesn't happen.
    e.g. if/else is always 1 or two nodes. if they're deeper nested
    it ends up in subnodes. as long as you use flex/yacc you'll just have to look what's the highest number of nonterminals in your ruleset. and that's the number
    of fields in your struct, possibly plus an identifying enum and usually some union for the terminal symbols(as they won't have subnodes)
    if you're overly anal you can prolly dynamic allocate those pointers.

    struct node
    {
    int nodetype ;
    node *fields[0];
    } ;

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to Roger L Costello on Fri Apr 8 19:49:10 2022
    On 2022-04-08, Roger L Costello <costello@mitre.org> wrote:
    Hi Folks,

    The compiler book [1] that I am reading says this:

    The most obvious way to represent the information gained from lexical and syntax analysis is as a tree along the lines of the parse tree. In C this is suitably handled using a simple struct for each node:

    struct node
    {
    int nodetype ;
    struct node *field1 ;
    struct node *field2 ;
    struct node *field3 ;
    struct node *field4 ;
    struct node *field5 ;
    } ;

    But, but, but, ..what if a node requires more than 5 fields; how is that handled?

    The fields are all nodes so you can easily have a linked list:

    struct node
    {
    int nodetype;
    struct node *car;
    struct node *cdr;
    }

    The Lisp family of languages handle all syntax using nothing but binary
    cells like this, including nodes with arbitrary numbers of arguments.


    You can also also have an N-ary tree, compactly allocated, using
    the C struct hack (which is now official and called "flexible array
    member"):

    struct node {
    int nodetype;
    int nfields;
    struct node* fields[];
    };

    When the node is allocated, extra room is requested for required
    number of fields, which are then accessed as nodeptr->field[0],
    and so on.

    If it's ever necessary to dynamically grow that after a node has
    come into existence, and is referenced in multiple places in
    the program, the above representation has disadvantages.

    But you can easily switch to:

    struct node {
    int nodetype;
    int nfields;
    struct node** fields;
    };

    where fields is a separately allocated array of pointers. The
    access expressions look the same.

    nodeptr->field[0]

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From marblypup@yahoo.co.uk@21:1/5 to All on Sat Apr 9 05:03:55 2022
    Hmm! The last time I wrote a (simple) compiler, it never occurred to me to use any of those methods! Here's my code (with the 'payload' removed from the struct and new_node):
    I guess Johann would say I'm being extremely anal :-D

    struct tree {
    enum token_type type;
    struct tree *child, *sibling;
    };

    static struct tree *new_node(int type) {
    struct tree *res=malloc_or_bomb(sizeof *res);
    res->type=type;
    res->child=res->sibling=NULL;
    return res;
    }

    static struct tree *add_child(struct tree *parent, struct tree *child) {
    /* Add child to end of parent's list of children. */
    struct tree **p;
    for (p=&parent->child; NULL!=*p; p=&(*p)->sibling)
    ;
    *p=child;
    return child;
    }

    static struct tree *add_first_child(struct tree *parent, struct tree *child) {
    /* Add child to start of parent's list of children. */
    child->sibling=parent->child;
    parent->child=child;
    return child;
    }

    (I think I pinched the child/sibling technique for storing trees with arbitrary numbers of children from an Infocom game!)
    [The child/sibling approach is the way Lisp lists have worked since the last 1950s.
    For reasons related to the architecture of the IBM 709, they're usually spelled CAR and CDR. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to Kaz Kylheku on Wed Apr 27 21:22:51 2022
    On Friday, April 8, 2022 at 4:09:26 PM UTC-5, Kaz Kylheku wrote:
    On 2022-04-08, Roger L Costello <cost...@mitre.org> wrote:
    Hi Folks,

    The compiler book [1] that I am reading says this:

    The most obvious way to represent the information gained from lexical and syntax analysis is as a tree along the lines of the parse tree. In C this is
    suitably handled using a simple struct for each node:

    struct node
    {
    int nodetype ;
    struct node *field1 ;
    struct node *field2 ;
    struct node *field3 ;
    struct node *field4 ;
    struct node *field5 ;
    } ;

    But, but, but, ..what if a node requires more than 5 fields; how is that handled?
    The fields are all nodes so you can easily have a linked list:

    struct node
    {
    int nodetype;
    struct node *car;
    struct node *cdr;
    }

    The Lisp family of languages handle all syntax using nothing but binary
    cells like this, including nodes with arbitrary numbers of arguments.


    This is mostly how I do it. I've just started a new draft in C, so it's the appropriate size (and scope) to share as an example. One trick I've found useful in previous versions is the extra 'data' member in the symbol struct.
    If the output of the tokenizer is a stream of symbols, then the data for each symbol can be something useful like the original string representation and/or file position.


    #define PC10OBJECT_H

    typedef union object object;
    typedef object list;
    typedef object suspension;
    typedef object parser;
    typedef object boolean;
    typedef object operator;
    typedef operator predicate;
    typedef object fSuspension( object );
    typedef object fParser( object, list );
    typedef object fOperator( object, object );
    typedef boolean fPredicate( object, object );
    typedef object fBinaryOperator( object, object );

    typedef enum {
    INVALID, INT, LIST, SUSPENSION, PARSER, OPERATOR, SYMBOL, STRING, VOID
    } tag;

    union object { tag t;
    struct { tag t; int i; } Int;
    struct { tag t; struct list *ref; } List;
    struct { tag t; struct suspension *ref; } Suspension;
    struct { tag t; struct parser *ref; } Parser;
    struct { tag t; struct operator *ref; } Operator;
    struct { tag t; struct symbol *ref; } Symbol;
    struct { tag t; struct string *ref; } String;
    struct { tag t; void *ptr; } Void;
    };

    struct list {
    object a, b; // or car and cdr if you like
    };

    struct suspension {
    object env; // an association list of (<symbol>, <value>) pairs
    fSuspension *f;
    };

    struct parser {
    object env;
    fParser *f;
    };

    struct operator {
    object env;
    fOperator *f;
    };

    struct symbol {
    int code;
    char *printname;
    object data;
    };

    struct string {
    char *ptr;
    int disposable; // if yes, the GC should call free(ptr)
    };

    extern boolean T_, NIL_;

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)