• Interaction between conditional inclusion and source file inclusion.

    From James Kuyper@21:1/5 to All on Sat Dec 25 01:18:30 2021
    This discussion requires familiarity with the standard's specifications
    and terminology from sections 6.10, 6.10.1 and 6.10.2. Unless you've got
    those sections memorized, you might need to cross-reference them to
    understand what I'm saying.

    To simplify the following discussion, I'm going to write it as if the
    only conditional inclusion preprocessing directives were #if, #else, and #endif. Code using the other conditional inclusion directives can always
    be rewritten to use only those three, with essentially the same
    behavior, with a minor exception in the case of #elif, where the
    subsequent occurrences of __LINE__ would have increased values. Those
    other directives don't change the issue I'm discussing; they only
    complicate the discussion.

    I've long understood that, during translation phase 4, as soon as a
    compiler reaches the new-line at the end of a #if directive, it knows
    whether the #if group will be included. If not, and there's a
    corresponding #else, it knows that the #else group will be included.
    Either way, as soon as it starts reading a group that will be included,
    it can immediately start preprocessing that group (and this is the
    important part:) while searching for the #else or #endif directive that terminates the group.
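
A minimal sketch of that single-pass strategy, for one file only and under simplifying assumptions: lines are plain strings, `#if` accepts only a literal `0` or `1`, and the names `run_conditionals`, `struct level` and `MAX_DEPTH` are hypothetical. Every line of a processed group is emitted the moment it is read, well before the terminating #else or #endif is reached; the only memory needed is one small stack entry per open if-section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64 /* hypothetical limit; C99 and later require at least 63 */

/* State kept for each open if-section. */
struct level {
    bool parent_active; /* was the enclosing group being processed? */
    bool taking;        /* is this section's current group processed? */
    bool seen_else;     /* has this section's #else already appeared? */
};

/* Streams `lines` once, copying each line of a processed group into `out`
   as soon as it is read. Returns the number of lines emitted, or -1 for a
   stray or duplicate #else, a stray #endif, too much nesting, or an #if
   still open at end of input. */
int run_conditionals(const char *const *lines, size_t n, const char **out)
{
    struct level stack[MAX_DEPTH];
    size_t depth = 0;
    bool active = true; /* is the current group being processed? */
    int emitted = 0;

    for (size_t k = 0; k < n; k++) {
        const char *s = lines[k];
        if (strncmp(s, "#if ", 4) == 0) {
            if (depth == MAX_DEPTH) return -1;
            /* Decided at the new-line ending the #if; no look-ahead. */
            bool cond = active && strcmp(s + 4, "1") == 0;
            stack[depth++] = (struct level){active, cond, false};
            active = cond;
        } else if (strcmp(s, "#else") == 0) {
            if (depth == 0 || stack[depth - 1].seen_else) return -1;
            struct level *top = &stack[depth - 1];
            top->seen_else = true;
            top->taking = top->parent_active && !top->taking;
            active = top->taking;
        } else if (strcmp(s, "#endif") == 0) {
            if (depth == 0) return -1;
            active = stack[--depth].parent_active;
        } else if (active) {
            out[emitted++] = s; /* processed immediately */
        }
    }
    return depth == 0 ? emitted : -1;
}
```

For the lines `#if 1`, `int i = 0;`, `#if 0`, `int x = 9;`, `#else`, `int y = 8;`, `#endif`, `int l = 3;`, `#endif`, this sketch emits `int i = 0;`, `int y = 8;` and `int l = 3;`, in that order.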

    I've also long understood that the #if, #else (if any) and #endif
    directives that make up an if-section must all occur in the same file.
    I'm not sure how I reached that conclusion - it's not anything that the standard says explicitly.

    I just recently realized that, under certain circumstances, those two understandings are in conflict:

    if.c:
    #if 1
    int i = 0;
    #include "else.h"
    int l = 3;
    #endif

    else.h:
    int j = 1;
    #else
    int k = 2;

    If preprocessing of the #if group could continue while searching for the terminating #else or #endif, then that would mean that the #include
    directive in if.c would be replaced by the contents of else.h, and that
    the #else from else.h would therefore be recognized as terminating the
    if-group from if.c, and starting a new else-group that continues until
    the #endif in if.c. The declarations of `i` and `j` should be included,
    and those of `k` and `l` should be skipped.
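
For comparison, here is what the translation unit would look like if the text of else.h really were spliced in before the if-section was matched. It is written out by hand as a single file, so all three directives now live in the same file and a conforming compiler accepts it; only `i` and `j` are defined.

```c
/* if.c with the text of else.h pasted by hand where the #include was. */
#if 1
int i = 0;
/* --- text formerly in else.h --- */
int j = 1;
#else
int k = 2;
/* --- end of text formerly in else.h --- */
int l = 3;
#endif
```

Any later use of `k` or `l` in this file would fail to compile, since their declarations sit in the skipped #else group.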

    I didn't expect it to work, and my tests with gcc confirm that
    expectation - but I'm having trouble identifying how the standard
    specifies that this shouldn't work.

    The grammar for an if-group in 6.10p1 includes the following rule:

    # if constant-expression new-line group opt

    This could be interpreted as meaning that the entire if-group must be
    parsed as such by the compiler before carrying out the behavior
    associated with that if-group, which is to process the optional group if
    the constant-expression has a non-zero value. This would imply that the
    #else or #endif that terminates the group must be identified as such
    before replacing any #include directives that might be found in that
    group with the contents of the specified file. That in turn would imply
    that a #else in the included file could not qualify as that terminating directive.
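
The look-ahead that this interpretation calls for can be sketched as follows, assuming the current file is available as an array of lines; `find_group_end` and `is_directive` are hypothetical names, and only #if, #else and #endif are recognized. Because the scan stays within the file's own lines and never opens an included file, a #else inside else.h could never be selected as the terminator of the group in if.c.

```c
#include <stddef.h>
#include <string.h>

/* True if line `s` is the directive `name` (simplified: '#' immediately
   followed by the name, then a space or end of line). */
static int is_directive(const char *s, const char *name)
{
    size_t len = strlen(name);
    return s[0] == '#' && strncmp(s + 1, name, len) == 0 &&
           (s[1 + len] == '\0' || s[1 + len] == ' ');
}

/* Given the index `start` of an #if line, returns the index of the #else or
   #endif that terminates its group, skipping nested if-sections. Returns `n`
   if no terminator exists in this file's own lines. */
size_t find_group_end(const char *const *lines, size_t n, size_t start)
{
    size_t nesting = 0; /* if-sections opened since `start`, not yet closed */
    for (size_t k = start + 1; k < n; k++) {
        if (is_directive(lines[k], "if")) {
            nesting++;
        } else if (is_directive(lines[k], "endif")) {
            if (nesting == 0)
                return k;
            nesting--;
        } else if (nesting == 0 && is_directive(lines[k], "else")) {
            return k;
        }
    }
    return n;
}
```

Applied to if.c, the scan passes over the #include line untouched and stops at the #endif on the last line.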

    The thing is, it's not clear to me that the standard actually says so. C
    was designed around the same time I started my computer programming
    career, when keeping a program's memory footprint small was more
    important than it is now. I've noted that, particularly with the
    original version of the C standard, the language seems, for the most
    part, deliberately designed to allow single-pass processing with
    relatively low memory requirements, which is why I did not expect it to
    require scanning for the end of a group before processing any #include directives in that group.

    Have I missed something that says this more explicitly than the grammar
    rule cited above? I'm sure there are people who will tell me that the
    grammar rule cited above is sufficient, because they think it makes this
    point perfectly clear - but is there anyone who agrees with me that it's
    not clear?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to James Kuyper on Sat Dec 25 08:23:47 2021
    On 12/25/21 1:18 AM, James Kuyper wrote:
    ...
    Have I missed something that says this more explicitly than the grammar
    rule cited above? I'm sure there are people who will tell me that the
    grammar rule cited above is sufficient, because they think it makes this point perfectly clear - but is there anyone who agrees with me that it's
    not clear?

    While the grammar may not be clear as to what happens between the start
    and end of the if-group, the description of what happens in the block
    says (6.10.1p6)

    Each directive’s condition is checked in order. If it evaluates to false (zero), the group that it controls is skipped: directives are processed
    only through the name that determines the directive in order to keep
    track of the level of nested conditionals; the rest of the directives’ preprocessing tokens are ignored, as are the other preprocessing tokens
    in the group. Only the first group whose control condition evaluates to
    true (nonzero) is processed. If none of the conditions evaluates to
    true, and there is a #else directive, the group controlled by the #else
    is processed; lacking a #else directive, all the groups until the #endif
    are skipped.

    Thus it is clear that an #include statement within a skipped block is
    not processed and thus #else and the like within the include file are
    not seen.
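
The rule quoted above is easy to observe directly: a #include inside a skipped group is processed only as far as its directive name, so the header it names is never searched for. The header name below is deliberately fictitious, and the sketch assumes nothing beyond a conforming compiler.

```c
/* The group is skipped, so the named file is never opened and its
   absence is not an error. */
#if 0
#include "this_header_does_not_exist.h"
#endif

int skipped_include_ok = 1; /* compiles even though the header is missing */
```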

  • From James Kuyper@21:1/5 to Richard Damon on Sat Dec 25 11:09:31 2021
    On 12/25/21 8:23 AM, Richard Damon wrote:
    On 12/25/21 1:18 AM, James Kuyper wrote:
    ...
    if.c:
    #if 1
    int i = 0;
    #include "else.h"
    int l = 3;
    #endif

    else.h:
    int j = 1;
    #else
    int k = 2;
    ...
    While the grammar may not be clear as to what happens between the start
    and end of the if-group, the description of what happens in the block
    says (6.10.1p6)

    Each directive’s condition is checked in order. If it evaluates to false (zero), the group that it controls is skipped: directives are processed
    only through the name that determines the directive in order to keep
    track of the level of nested conditionals; the rest of the directives’ preprocessing tokens are ignored, as are the other preprocessing tokens
    in the group. Only the first group whose control condition evaluates to
    true (nonzero) is processed. If none of the conditions evaluates to
    true, and there is a #else directive, the group controlled by the #else
    is processed; lacking a #else directive, all the groups until the #endif
    are skipped.

    Thus it is clear that an #include statement within a skipped block is
    not processed and thus #else and the like within the include file are
    not seen.

    <pedantic> The standard provides its own definitions for both "block"
    and "group". Of the two, the one that is relevant here is "group", not
    "block". </pedantic>

    I thought I had made it clear that I was very specifically concerned
    about #include directives that occur within groups that are NOT skipped. However, when I reviewed my message while preparing this response, I saw
    that I failed to make that distinction - sorry! I didn't say anything incorrect, and what I said has unexpected consequences for such a
    directive, but I didn't emphasize that point. The only clue you had was
    the fact that my example involved such a directive.

    gcc ignored the #else directive, identifying it as a syntax error due to
    having no corresponding #if. Therefore it apparently does not consider
    the #if directive in if.c to be a match.
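
That result is what one would expect from a preprocessor that records how deep its conditional stack was when it entered each file, and refuses to let that file switch or close an if-section it did not open. The sketch below models that idea; it is a hypothetical illustration, not gcc's source. Files come from an in-memory table, `#if` accepts only a literal `0` or `1`, a second #else at one level is not diagnosed, there is no guard against recursive inclusion, and `struct file`, `struct pp`, `lookup` and `preprocess` are invented names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64 /* hypothetical nesting limit */
#define MAX_OUT 64   /* hypothetical output capacity */

/* One entry in a hypothetical in-memory file table. */
struct file { const char *name; const char *const *lines; size_t n; };

struct pp {
    const struct file *files; /* files that #include may name */
    size_t nfiles;
    bool parent[MAX_DEPTH];   /* was the enclosing group processed? */
    bool taking[MAX_DEPTH];   /* is this level's current group processed? */
    size_t depth;             /* conditional stack depth, shared by all files */
    const char *out[MAX_OUT]; /* lines emitted so far */
    size_t nout;
};

static const struct file *lookup(const struct pp *p, const char *name)
{
    for (size_t k = 0; k < p->nfiles; k++)
        if (strcmp(p->files[k].name, name) == 0)
            return &p->files[k];
    return NULL;
}

/* Processes file `f`. `base` remembers the stack depth on entry, so this file
   may only #else or #endif levels it pushed itself. Returns false on error. */
bool preprocess(struct pp *p, const struct file *f)
{
    const size_t base = p->depth;
    for (size_t k = 0; k < f->n; k++) {
        const char *s = f->lines[k];
        bool active = p->depth == 0 || p->taking[p->depth - 1];
        if (strncmp(s, "#if ", 4) == 0) {
            if (p->depth == MAX_DEPTH) return false;
            p->parent[p->depth] = active;
            p->taking[p->depth] = active && strcmp(s + 4, "1") == 0;
            p->depth++;
        } else if (strcmp(s, "#else") == 0) {
            if (p->depth == base) return false; /* "#else without #if" */
            size_t top = p->depth - 1;
            p->taking[top] = p->parent[top] && !p->taking[top];
        } else if (strcmp(s, "#endif") == 0) {
            if (p->depth == base) return false; /* "#endif without #if" */
            p->depth--;
        } else if (!active) {
            continue; /* skipped group: even #include is not looked at */
        } else if (strncmp(s, "#include \"", 10) == 0) {
            size_t len = strcspn(s + 10, "\"");
            char name[64];
            if (len >= sizeof name) return false;
            memcpy(name, s + 10, len);
            name[len] = '\0';
            const struct file *inc = lookup(p, name);
            /* The included file is processed right away, mid-group. */
            if (inc == NULL || !preprocess(p, inc)) return false;
        } else {
            if (p->nout == MAX_OUT) return false;
            p->out[p->nout++] = s;
        }
    }
    return p->depth == base; /* "unterminated #if" if this file left one open */
}
```

With if.c and else.h from the original post loaded into the table, this model emits `int i = 0;` and `int j = 1;` and then fails at the #else in else.h, because that file was entered at depth 1 and never pushed a level of its own. An if-section that is complete inside the included file is accepted.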

    It's interesting to note that a corresponding issue cannot come up
    through macro substitution. There are two reasons for this:

    "If a # preprocessing token, followed by an identifier, occurs lexically
    at the point at which a preprocessing directive could begin, the
    identifier is not subject to macro replacement." (6.10.3p8).

    and

    "The resulting completely macro-replaced preprocessing token sequence is
    not processed as a preprocessing directive even if it resembles one,
    ..." (6.10.3.4p3).

    I couldn't find comparable wording that would prevent putting an #if in
    one file and the matching #else or #endif in a different file by use
    of #include directives - but gcc doesn't accept it.
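
The first of the two rules quoted above (6.10.3p8) can be checked with a deliberately perverse macro: even when `endif` is defined as a macro, the identifier that follows a directive-introducing `#` is not replaced, so the line below is still an #endif directive. The macro and variable names are illustrative only.

```c
/* A macro whose name happens to match a directive name. */
#define endif if

#if 1
int directive_name_not_replaced = 1;
#endif /* still #endif: 6.10.3p8 keeps this identifier from being replaced */

#undef endif
```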

  • From Tim Rentsch@21:1/5 to James Kuyper on Sat Dec 25 08:52:04 2021
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:

    This discussion requires familiarity with the standard's specifications
    and terminology from sections 6.10, 6.10.1 and 6.10.2. Unless you've got those sections memorized, you might need to cross-reference them to understand what I'm saying.

    To simplify the following discussion, I'm going to write it as if the
    only conditional inclusion preprocessing directives were #if, #else, and #endif. Code using the other conditional inclusion directives can always
    be rewritten to use only those three, with essentially the same
    behavior, with a minor exception in the case of #elif, where the
    subsequent occurrences of __LINE__ would have increased values. Those
    other directives don't change the issue I'm discussing, they only
    complicate the discussion.

    I've long understood that, during translation phase 4, as soon as a
    compiler reaches the new-line at the end of a #if directive, it knows
    whether the #if group will be included. If not, and there's a
    corresponding #else, it knows that the #else group will be included.
    Either way, as soon as it starts reading a group that will be included,
    it can immediately start preprocessing that group (and this is the
    important part:) while searching for the #else or #endif directive that terminates the group.

    I've also long understood that the #if, #else (if any) and #endif
    directives that make up an if-section must all occur in the same file.
    I'm not sure how I reached that conclusion - it's not anything that the standard says explicitly. [...]

    The first rule of grammar in 6.10 paragraph 1 says (with \sub()
    to mean subscript)

    preprocessing-file:
    group \sub(opt)

    Thus each preprocessing file must consist of an integral number
    of group-parts, and so cannot contain any unbalanced #if/#endif
    directives, or any #else directive outside an #if/#endif section.
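
That argument can be phrased as a check applied to each file on its own, before any #include is expanded: scanning the file's directives must never meet a #else or #endif with no open #if, never meet two #else at one level, and must end with nothing left open. A sketch, using the hypothetical names `enum directive` and `file_is_group_opt`; it classifies directives only and ignores everything else a real parser would check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of each line of one file. */
enum directive { D_IF, D_ELSE, D_ENDIF, D_OTHER };

#define MAX_DEPTH 64 /* assumption; C99 and later require at least 63 */

/* True if the directive sequence of a single file could match
   preprocessing-file: group(opt), judged only by if/else/endif balance. */
bool file_is_group_opt(const enum directive *d, size_t n)
{
    bool seen_else[MAX_DEPTH];
    size_t depth = 0;
    for (size_t k = 0; k < n; k++) {
        switch (d[k]) {
        case D_IF:
            if (depth == MAX_DEPTH) return false;
            seen_else[depth++] = false;
            break;
        case D_ELSE:
            if (depth == 0 || seen_else[depth - 1]) return false;
            seen_else[depth - 1] = true;
            break;
        case D_ENDIF:
            if (depth == 0) return false;
            depth--;
            break;
        case D_OTHER:
            break;
        }
    }
    return depth == 0; /* an #if left open at end of file is also unbalanced */
}
```

Under this check if.c passes (its #include counts as an ordinary line), while else.h fails at its #else, which is the outcome gcc produced.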

    Note to all: Merry Christmas!

  • From James Kuyper@21:1/5 to Tim Rentsch on Sun Dec 26 09:56:10 2021
    On Saturday, December 25, 2021 at 11:52:06 AM UTC-5, Tim Rentsch wrote:
    James Kuyper <james...@alumni.caltech.edu> writes:
    ...
    I've long understood that, during translation phase 4, as soon as a compiler reaches the new-line at the end of a #if directive, it knows whether the #if group will be included. If not, and there's a
    corresponding #else, it knows that the #else group will be included.
    Either way, as soon as it starts reading a group that will be included,
    it can immediately start preprocessing that group (and this is the important part:) while searching for the #else or #endif directive that terminates the group.

    I've also long understood that the #if, #else (if any) and #endif directives that make up an if-section must all occur in the same file.
    I'm not sure how I reached that conclusion - it's not anything that the standard says explicitly. [...]

    The first rule of grammar in 6.10 paragraph 1 says (with \sub()
    to mean subscript)

    preprocessing-file:
    group \sub(opt)

    Thus each preprocessing file must consist of an integral number
    of group-parts, and so cannot contain any unbalanced #if/#endif
    directives, or any #else directive outside an #if/#endif section.

    I believe that what you're saying, using the terms defined in the C preprocessing grammar, is that neither an if-group, an else-group,
    nor an endif-line qualifies separately as a group-part, only a complete if-section can do so.

    When the standard defines the meaning of a term, that definition takes precedence over any other interpretation you might reach by analyzing
    the meaning of the words making up that term. "preprocessing-file" is
    simply a symbol in the grammar - it's definition is the grammar rule
    associated with that symbol.

    I've always interpreted the specification given in 6.10.2 as meaning that
    a given preprocessing file must match the grammar described in 6.10 up
    until the point that it recognizes a #include directive, which 'causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters.' It's only the file after that replacement (and all other such replacements) that must
    fully parse in accordance with the grammar in 6.10.

    However, the term "preprocessing file" is also defined in 5.1.1.1p1. That's
    a section of the standard that seldom comes up in discussion, so I'd
    forgotten about that definition. I agree that it makes sense that a "preprocessing file" is meant to match the syntax specified for a "preprocessing-file". The standard often uses a grammar symbol name,
    with '-' replaced by spaces, to refer to things matching that grammar
    symbol. However, this is one of the few places where the name, with that replacement, is formally defined separately from the grammar, implying a connection between those two definitions.

    This is not the clearest way to impose such a requirement. If each preprocessing file is supposed to separately parse as a preprocessing-file,
    I think it would have been better to explicitly mention that fact in the description of 6.10.2 "Source file Inclusion." The "replacement" wording actually used gave me the strong impression that there were no content restrictions on the #included file itself, but only on the result after replacing the directive with those contents.

  • From Tim Rentsch@21:1/5 to James Kuyper on Mon Jan 17 06:47:39 2022
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:

    On Saturday, December 25, 2021 at 11:52:06 AM UTC-5, Tim Rentsch wrote:

    James Kuyper <james...@alumni.caltech.edu> writes:

    ...

    I've long understood that, during translation phase 4, as soon as
    a compiler reaches the new-line at the end of a #if directive, it
    knows whether the #if group will be included. If not, and there's
    a corresponding #else, it knows that the #else group will be
    included. Either way, as soon as it starts reading a group that
    will be included, it can immediately start preprocessing that
    group (and this is the important part:) while searching for the
    #else or #endif directive that terminates the group.

    I've also long understood that the #if, #else (if any) and #endif
    directives that make up an if-section must all occur in the same
    file. I'm not sure how I reached that conclusion - it's not
    anything that the standard says explicitly. [...]

    The first rule of grammar in 6.10 paragraph 1 says (with \sub()
    to mean subscript)

    preprocessing-file:
    group \sub(opt)

    Thus each preprocessing file must consist of an integral number
    of group-parts, and so cannot contain any unbalanced #if/#endif
    directives, or any #else directive outside an #if/#endif section.

    I believe that what you're saying, using the terms defined in the C preprocessing grammar, is that neither an if-group, an else-group,
    nor an endif-line qualifies separately as a group-part, only a
    complete if-section can do so.

    That isn't what I was saying, although I expect it is a true
    statement. What I was saying is that all the expansions of
    group-part are balanced with respect to #if/#endif directives,
    and also limit #else directives to be inside #if/#endif
    segments.

    When the standard defines the meaning of a term, that definition
    takes precedence over any other interpretation you might reach by
    analyzing the meaning of the words making up that term.
    "preprocessing-file" is simply a symbol in the grammar - it's
    definition is the grammar rule associated with that symbol.

    The possessive form of "it" is "its", with no apostrophe. The
    word "it's" is a contraction for "it is".

    It may help to remember that the same rule applies to all
    personal pronouns: an apostrophe always indicates a
    contraction with "am", "is" or "are" -

    I am, we are - I'm, we're
    you are - you're
    he is, she is, it is, they are - he's, she's, it's, they're

    and there is never an apostrophe in the possessive form of a
    personal pronoun -

    my, our
    your
    his, hers, its, their


    I've always interpreted the specification given in 6.10.2 as meaning
    that a given preprocessing file must match the grammar described in
    6.10 up until the point that it recognizes a #include directive,
    which 'causes the replacement of that directive by the entire
    contents of the source file identified by the specified sequence
    between the " delimiters.' It's only the file after that replacement
    (and all other such replacements) that must fully parse in
    accordance with the grammar in 6.10.

    However, the term "preprocessing file" is also defined in 5.1.1.1p1.
    That's a section of the standard that seldom comes up in discussion,
    so I'd forgotten about that definition. I agree that it makes sense
    that a "preprocessing file" is meant to match the syntax specified
    for a "preprocessing-file". The standard often uses a grammar
    symbol name, with '-' replaced by spaces, to refer to things
    matching that grammar symbol. However, this is one of the few
    places where the name, with that replacement, is formally defined
    separately from the grammar, implying a connection between those two definitions.

    My reading of the C standard is that "preprocessing file",
    "source file", and "preprocessing-file" all mean the same thing.
    A #include directive (in the "" form) "shall identify a [...]
    source file", and thus the content of the file being #include'd
    must match the syntax of "preprocessing-file".

    This is not the clearest way to impose such a requirement. If
    each preprocessing file is supposed to separately parse as a preprocessing-file, I think it would have been better to
    explicitly mention that fact in the description of 6.10.2
    "Source file Inclusion." The "replacement" wording actually
    used gave me the strong impression that there were no content
    restrictions on the #included file itself, but only on the
    result after replacing the directive with those contents.

    It seems other people don't have any problem understanding what
    the requirements are here. Do you have any ideas about why you
    do when others don't?
