• Compiler bootstrapping and the standard header files

    From codevisio@gmail.com@21:1/5 to All on Thu Mar 19 05:40:47 2020
    Hi,

    newbie here.

    I've been going through some compiler sources available online and
    studying them, in particular C compilers.

    Since all of them implement the C standard headers, my assumption was
    that during the development of the compiler I cannot use the standard
    headers that come with the host environment and its C compiler, but
    instead have to use the standard headers I created for my own compiler.
    However, this does not seem to be the case when I look at the makefiles
    of those online compiler sources.
    Am I wrong? Any explanation would be much appreciated.

    Thanks

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans-Peter Diettrich@21:1/5 to All on Fri Mar 20 01:34:59 2020
    On 19.03.2020 at 13:40, codevisio@gmail.com wrote:

    > I've been going through some compiler sources available online and
    > study them, in particular C compilers.
    >
    > Since all of them implement the C standard headers, my assumption was
    > that during the development of the compiler I cannot use the standard
    > headers coming from the host environment & C compiler, but instead I
    > have to use my own standard headers I created for my compiler.

    For every program you need the standard libraries and headers for the
    *target* system, for which you compile and link your program. The
    headers are required at compile time, the libraries at link time. Both
    come with the development system that you use to build your program.

    Now you have everything you need to build your own compiler, for
    whatever language, headers and library files you have designed it for.
    Your compiler has incorporated the libraries of the original development
    system, which is fine for the compiler itself, but you are not normally
    allowed to ship those original libraries with your new compiler for
    building new programs. So you also have to create and build your own
    headers and libraries, using the original development system, because
    your new compiler is still useless without its own libraries.

    Having done that, you can use your compiler to build programs. First some
    simple tests, like Hello World, but eventually you may want your compiler
    to compile itself as a proof of fitness. In former times gcc was built
    recursively, i.e. every new compiler compiled its own source code again,
    until two versions produced the same binaries. I don't know what the
    procedure looks like nowadays.
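
    The "until two versions produced the same binaries" check comes down to
    a byte-for-byte comparison of the last two compiler stages. Here is a
    minimal sketch of such a comparison in C. The function name
    files_identical is made up for illustration, and the sketch does not
    filter out embedded timestamps or similar artifacts.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);

    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {      /* also catches one file ending early */
                result = 0;
                break;
            }
        } while (ca != EOF);
    }

    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return result;
}
```

    A build script could call this on the stage-2 and stage-3 compiler
    binaries and treat a result of 0 as a likely compiler bug.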

    DoDi

  • From Christian Gollwitzer@21:1/5 to All on Fri Mar 20 10:21:11 2020
    Hi codevisio,

    On 19.03.20 at 13:40, codevisio@gmail.com wrote:

    > I've been going through some compiler sources available online and
    > study them, in particular C compilers.
    >
    > Since all of them implement the C standard headers, my assumption was
    > that during the development of the compiler I cannot use the standard
    > headers coming from the host environment & C compiler, but instead I
    > have to use my own standard headers I created for my compiler.

    No, such a rule does not exist and it would not make sense, either. The
    easiest way to see this is by considering a cross-compiler.

    Assume that your new compiler runs on machine A (e.g. an x86 PC) and
    compiles programs for machine B (e.g. an ARM Raspberry Pi). Then, to run
    the compiler on the PC, you need the headers for the Raspi on the PC. But
    the resulting binary does not run there.

    The compiler itself must have been compiled by a compiler on the PC,
    because if you compiled it with itself, the result would not run on the
    PC. Therefore, to compile the compiler, you need the host compiler of the
    PC and the headers and libraries from there.

    Actually, a "cross compiler" is the regular case, and a "non-cross
    compiler" is a special case, albeit many developers only use one machine
    for compilation and testing, and therefore use a "non-cross compiler"
    where the target is the same as the host.

  • From Christopher F Clark@21:1/5 to All on Fri Mar 20 06:21:56 2020
    DoDi gets the answer basically right. I'm going to say something
    similar in slightly different words. The good news is that you are
    doing this for C which was designed to be a relatively simple language
    to port, although you may even want to use a restricted dialect of C
    to make it even simpler. The simpler the dialect, the less runtime
    library you need (to get the bootstrap working).

    First, there are 3 interacting parts. They are all interconnected,
    but still separate.

    1) The compiler itself
    2) The header files
    3) The supporting runtime library

    There are also two bits of terminology you need to learn from
    cross-compiling.

    1) host
    2) target

    So, the machine you are compiling on and the compiler you are
    compiling with are considered the host. The machine that the program
    will run on and the runtime library that supports it are considered the
    target. There are diagrams that illustrate this. They are called
    T-diagrams. Here is an ASCII rendition (excuse my drawing skills).

    +---------------------------+
    |  host   headers   target  |
    +-----+           +---------+---------------------+
          | compiler  |   host    headers    target   |
          +-----------+-----+           +-------------+
                            | compiler  |
                            +-----------+

    Where the target in the first T is the host in the second T.
    Everything else can be different. You can nest this diagram as many
    times as one likes. The typical bootstrapping process nests 3 Ts. I
    will explain why later.

    From that diagram you can see that the headers must match both the
    host (compiler) and the target (runtime).

    Let's now illustrate that with a couple of different scenarios.

    The first simplest scenario is you want to run the resulting program
    in the same environment (same target machine, same target runtime
    library) as the host environment. This is the way you bootstrap a new
    version of the compiler using the same runtime library. This "new
    version" might be this new compiler you are building from scratch.

    So, you take the program you want to compile (this will be the new
    version of the compiler) and plug it into the host box of the first
    T. The host compiler takes this program and the header files which
    match that compiler and target runtime library and compiles it to a
    target program that uses the target runtime library. You now have a
    new executable program (after linking) that you can run on the target
    machine. This new executable program just happens to be your new
    [version of the] compiler. So, now you can take the source code of
    the program and compile it again with the new compiler (using the
    headers which match that new compiler and target runtime library).
    If you repeat this process twice (that is 3 T boxes), the code
    generated by the last two rounds should be the same. There may be
    timestamps or similar artifacts that differ, which you have to filter
    out, but otherwise any differences are bugs in the compiler.

    The scenario gets a bit more complicated if you are building a
    cross-compiler (targeting a different machine than the host machine,
    or even just a different and incompatible runtime library on the
    host machine). In that case, your first host and target are the same
    machine, but your second target is a different machine (different
    runtime library). You may or may not be able to build a native (host)
    compiler on that second machine. It is quite common for embedded
    machines to lack all the facilities you need (e.g. file systems) to
    run a compiler on them. You don't need a compiler to run on the chip
    that runs your car engine or toaster. You just need a compiler that
    can generate code for that chip. However, if you are building a
    native compiler for that chip, then you need the 3 step T diagram.
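
    The chaining rule behind these scenarios (the target of one T becomes
    the host of the next) can be modelled in a few lines of C. This is only
    a sketch for reasoning about the stages: the struct, the function names
    and the machine names used with them are made up for illustration, and
    nothing here invokes a real compiler.

```c
#include <assert.h>
#include <string.h>

/* A compiler binary as one T: the machine it runs on (host) and the
   machine it generates code for (target). */
struct compiler {
    const char *name;
    const char *host;
    const char *target;
};

/* Compile compiler sources that generate code for `source_target`, using
   `builder`. The new binary runs wherever the builder's output runs, so
   the builder's target becomes the new compiler's host. */
struct compiler build_compiler(const struct compiler *builder,
                               const char *name,
                               const char *source_target)
{
    assert(builder != NULL && name != NULL && source_target != NULL);

    struct compiler result;
    result.name = name;
    result.host = builder->target;
    result.target = source_target;
    return result;
}

/* A compiler is a cross-compiler when host and target differ. */
int is_cross(const struct compiler *c)
{
    assert(c != NULL);
    return strcmp(c->host, c->target) != 0;
}
```

    Starting from a host compiler that runs on and targets "x86", building
    sources that target "arm" gives a cross-compiler (host "x86", target
    "arm"). Building the same sources again with that cross-compiler gives
    a native "arm" compiler, which is the extra T mentioned above.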

    Hopefully, you can figure out from this that:

    1) When compiling your compiler with some other compiler, you use the
    header files from that compiler (the ones that go with its runtime
    library). You will note that cross-compilers (e.g. compilers that run
    on an x86 but compile code for an ARM machine) may use different
    header files than the compiler from the same vendor that targets the
    host machine. The header files must match both the compiler and the
    target runtime, and target runtimes for different machines (even for
    the "same" compiler) can differ due to linker and OS dependencies.

    2) When compiling your compiler with your own compiler, you must use the
    header files for your compiler. You may even have two different
    copies, if you are developing your own runtime library: one that
    matches the original compiler's runtime library, so you can use that,
    and one that matches your own runtime library, so you have something to
    "ship".

    -------

    Finally, I am going to illustrate this process with one of the first
    compilers I worked on. In 1978 I worked at Softech and we had a
    contract to build a cross-compiler from Multics to the Interdata 8/32
    for the Jovial language (a new dialect called J73/C). We wanted our
    compiler to be written in Jovial, but there was no Jovial compiler on
    the Honeywell Multics machines. So, Carl Martin, my mentor at that
    time, wrote a translator (effectively a macro package) that translated
    a subset of Jovial into PL/I, with PL/I semantics, so you could only
    use a subset of Jovial where the semantics of it and PL/I aligned.
    But that was ok, because you don't [shouldn't] need a lot of
    sophisticated semantics to write a compiler.

    So, then we wrote our first compiler in that subset. We then ran it
    through the translator (1st T diagram) and got out an equivalent PL/I
    program which we could compile with the Multics PL/I compiler. Then
    we ran that compiler through the Multics PL/I compiler (2nd T diagram)
    and got out a native Multics executable. Now, we had a program that
    you could run on the Multics machine that would compile Jovial (and
    output Interdata 8/32 code--3rd T diagram). We also had a version
    that generated code for the Multics machine. I don't know if we ever
    built a native Jovial compiler on the Interdata machine (that would
    have been a 4th T diagram). In theory we could have, but the compiler
    was targeting embedded applications, so I don't know what OS support
    there was.

    ******************************************************************************
    Chris Clark                  email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
    23 Bailey Rd                 voice: (508) 435-5016
    Berlin, MA 01503 USA         twitter: @intel_chris

  • From cvo@21:1/5 to All on Fri Mar 20 15:52:02 2020
    Hi Hans-Peter, Christian and Christopher.

    Thanks for all of your answers.

    I reply here but my answer is for you all.


    I'm not going to consider the cross-compiler case now.
    My intention was to stick with the simplest scenario.

    (
    Premise: I'd use the term 'libraries' instead of 'runtime libraries',
    since the majority, I think, of the C standard functions are
    implemented without OS API calls. At the same time there are two
    kinds of libraries, static and dynamic. So in theory I could link
    the static library definitions into my executable. That is at least
    how it works in the Windows world, but I guess *nix has those too.
    )


    If I understood you correctly, the following are the steps.

    1) Host compiler + host compiler headers + host compiler libraries
    are used to compile my first version of my new compiler getting:
    mygcc0 + mygcc0 C libraries.

    2) Then I use mygcc0 + mygcc0 C headers + mygcc0 C libraries
    to compile my second version of my new compiler getting: mygcc1 +
    mygcc1 C libraries.

    3) Eventually I use mygcc1 + mygcc1 C headers (== mygcc0 C headers) +
    mygcc1 C libraries to compile my third version of my new compiler
    getting: mygcc2 + mygcc2 C libraries.

    Q1) Is that correct?
    If so, then point (1) above is a starting point. Point (2) is the
    first necessary step. While mygcc0 has code generated by the host
    compiler using its headers and its library functions, mygcc1 and
    the mygcc1 C libraries have instead been generated by my new
    compiler, mygcc0.
    Q1.2) Is that correct?
    If so, then point (3) is also a necessary step, because while it
    is true that mygcc1 has been generated by mygcc0 (without any
    host compiler), it is also true that I need a new version of my
    compiler to compare mygcc1 with. That is mygcc2.
    Q1.3) Is that correct?
    If so, then my final (and original) question is:
    Given a gcc in a Linux environment with its C headers and
    with its C libraries, and given my compiler with its C headers
    and with its C libraries, as long as my C headers and C libraries
    conform semantically to the C standard, why can't I use them
    in the first step, point (1), in place of the host ones?

    Why do I ask that? Because then I could start debugging and fixing my
    compiler along with my C libraries (and C headers) from the
    beginning, instead of debugging my C libraries at step 2. In other
    words, it is true I can compile my C libraries at step 1, but I
    cannot debug their implementation yet, because at step 1 I'm using
    the C libraries (and headers) from the host environment. I can
    debug my C library implementation only at step 2.


    Thanks all.

    [To your Q1.3, there is a lot of wiggle room in the C standard,
    particularly in internal implementation details. For example, in
    stdio.h there is a definition of FILE as a typedef of an opaque
    structure. There is no reason that your C library's FILE structure
    would be the same as gcc's. I hope you're not planning to write your
    own C library, since there are good quality open source ones you can
    adapt. -John]
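
    The point about FILE can be made concrete with two invented layouts for
    a stream object. Neither struct is taken from a real C library; both
    are hypothetical, and the sketch only shows that two conforming
    libraries may keep the same information at different offsets, so code
    compiled against one library's header cannot safely touch the other
    library's objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream layout of "library A". */
struct file_a {
    int fd;
    unsigned char *buffer;
    size_t buffer_size;
};

/* Hypothetical stream layout of "library B": same job, other fields. */
struct file_b {
    unsigned char *read_ptr;
    unsigned char *write_ptr;
    int flags;
    int fd;
};

/* Each library's own accessor knows its own layout. */
int a_fileno(const struct file_a *stream)
{
    assert(stream != NULL);
    return stream->fd;
}

int b_fileno(const struct file_b *stream)
{
    assert(stream != NULL);
    return stream->fd;
}
```

    Comparing offsetof(struct file_a, fd) with offsetof(struct file_b, fd)
    shows the mismatch: the descriptor sits at the start of one layout and
    after two pointers and an int in the other.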

  • From gah4@u.washington.edu@21:1/5 to cvo on Sat Mar 21 15:32:56 2020
    On Thursday, March 19, 2020 at 2:45:16 PM UTC-7, cvo wrote:

    > I've been going through some compiler sources available online and
    > study them, in particular C compilers.

    One reason for the inability to use such is copyright.
    But for now, I will assume that isn't the question.

    > Since all of them implement the C standard headers, my assumption was
    > that during the development of the compiler I cannot use the standard
    > headers coming from the host environment & C compiler, but instead I
    > have to use my own standard headers I created for my compiler.

    As well as I remember the early days of gcc, though I wasn't so interested
    in it at the time, it was a replacement for an existing compiler,
    such as the one for SunOS.

    For some years, Sun had only one C compiler, distributed with SunOS,
    but at some point they licensed it separately. The compiler that came
    with SunOS was only meant to be enough to compile the kernel modules
    needed for sysgen. (But I believe gcc started before this.)

    In any case, at that time they used the existing SunOS (and maybe
    others, but I was mostly using Sun at the time) library. Many of the
    C header files declare data structures (C structs) that must match
    the library routines.

    I believe that gcc was trying to build a better optimizing
    compiler, and to get people to use it over the SunOS compiler.
    (Personal opinion, so anyone can disagree if they want to.)

    It was only sometime later that GNU wrote their own glibc as a
    replacement for the C library. Some data structures match system
    calls, and so are system dependent, whereas others only need
    to be internally consistent within the library.

    As for cross compilers, many of the data structures are consistent
    across different architectures for the same system version, such
    as 68020 SunOS and Sparc SunOS, such that the header files would
    be the same, or almost the same. (As well as I remember, Sun
    keeps the system dependent ones separate, with a symbolic link
    to the actual version.)

    Writing the compiler first, using the existing library,
    conveniently allows one to get something working earlier.

  • From Hans-Peter Diettrich@21:1/5 to All on Sun Mar 22 04:22:18 2020
    On 20.03.2020 at 23:52, cvo wrote:

    > Why do I ask that? Because I can start debugging and fixing my
    > compiler along with my C libraries (and C headers) since the
    > beginning instead of debugging my C libraries at step 2. In other
    > words, it is true I can compile my C libraries at step 1, but I
    > cannot debug their implementation yet because at step 1 I'm using
    > the C libraries (and headers) from the host environment. I can
    > debug my C libraries implementation only at step 2.

    I don't understand who the "I" in "I'm using..." is here. You can tell
    any *compiler* to use your headers when compiling a test program, and to
    link it using your libraries. Then you can debug your libraries as part
    of that test program.

    A (static runtime) "library" is nothing but a collection of
    previously compiled object files, which need no further compilation
    for any new program. The linker puts together all required (listed)
    object files and replaces all their references to external symbols with
    the symbol definition (address) found in one of these files.
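
    The symbol replacement step can be pictured as a table lookup. The
    sketch below is a toy model, not how any real linker is implemented:
    the struct, the function name resolve_symbol and the addresses used
    with it are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One symbol definition exported by some object file. */
struct symbol_def {
    const char *name;
    unsigned long address;
};

/* Look up an external reference among the collected definitions.
   Returns 1 and stores the address on success, 0 if the symbol is
   undefined (which a real linker would report as an error). */
int resolve_symbol(const struct symbol_def *defs, size_t def_count,
                   const char *name, unsigned long *address_out)
{
    assert(defs != NULL && name != NULL && address_out != NULL);

    for (size_t i = 0; i < def_count; i++) {
        if (strcmp(defs[i].name, name) == 0) {
            *address_out = defs[i].address;
            return 1;
        }
    }
    return 0;
}
```

    Whether the definitions come from the host library or from your own
    library makes no difference to this lookup, which is why a test program
    can be linked against either.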

    DoDi

  • From Christopher F Clark@21:1/5 to All on Sun Mar 22 13:51:55 2020
    cvo asked:

    Your assumptions up to this one seemed correct to me.

    > If so then the final and my first question is:
    > Given a gcc in a linux environment with its C headers and
    > with its C libraries, and given my compiler with its C headers
    > and with its C libraries, as long as my C headers and C libraries
    > conform semantically to the C standard, why cannot I use them
    > in the first step, point (1), in place of host ones?

    If your headers and libraries are written in standard C, and the
    original compiler works properly when you use them with it, you can
    use your headers and libraries in step 1. It won't change the number
    of steps, but that's a different question, which you didn't ask.

    But listen carefully to the comments of John, our moderator, and
    consider not writing your own library when writing your own compiler.
    I can probably count on one hand, with fingers left over, the number of
    times he's been wrong in the years I've been reading here. More
    importantly, don't make unnecessary work for yourself. Writing a
    compiler is plenty to do. Adding more work to that pile just means you
    won't do as good a job on it as you would if it were your only task.

    The reason you might need the host headers and libraries (supplied
    with the compiler) is that they don't need to be written in standard
    C. In theory (and even in practice), neither the standard header files
    nor the library even need to exist as files. The compiler can use
    internal versions of them when it sees the "include" preprocessor
    directives, and the internal versions don't even need to be C code;
    they just need to make your C programs compile with the compiler (and
    run when executed, so the runtime library can also be built into the
    code generated by the compiler and not be a separate library at all).

    And, yes, I think I've seen compilers that do that. Before C, it was
    common, since there were no "header files", just functions that the
    language/compiler defined. We called them "built ins". For hardware
    that had things like sin and cos instructions, you didn't have sin and
    cos functions. On the other hand, if the hardware didn't have a
    "double precision" divide instruction (which many didn't), you called
    a runtime library routine for that, even though the user didn't write
    it as a call, just as an expression.
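
    As an integer analogue of that divide routine, here is a sketch of the
    kind of helper a compiler might call when the hardware has no divide
    instruction. The name rt_udiv32 is hypothetical and not taken from any
    real runtime library; the routine uses only shifts, comparisons and
    subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned 32-bit division by shift-and-subtract, without using the
   '/' or '%' operators. Stores the remainder if remainder_out is not
   NULL and returns the quotient. */
uint32_t rt_udiv32(uint32_t dividend, uint32_t divisor,
                   uint32_t *remainder_out)
{
    assert(divisor != 0);

    uint32_t quotient = 0;
    uint64_t remainder = 0;   /* wider type so the shift cannot overflow */

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit, most significant first. */
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (uint32_t)1 << bit;
        }
    }

    if (remainder_out != NULL) {
        *remainder_out = (uint32_t)remainder;
    }
    return quotient;
}
```

    The user still writes a / b in the source; a compiler for such a
    machine would emit a call to a helper like this in place of a divide
    instruction.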

    In that way C was revolutionary, you were able to write large parts of
    the C runtime in C. Before C, the runtime libraries were almost
    always written in assembly language and didn't necessarily follow the
    standard function call conventions. Burroughs (5000 series) Algol
    might have been the exception as I have heard there wasn't even an
    assembly language for it. Everything was written in Algol.

    So if you can write the functions you need in standard C and you can
    make all the parts work with header files written in standard C, yes
    you can probably use your header files. I wonder how you are going to
    get things like varargs and setjmp to work. That usually involves a
    bit of compiler magic. That's why names starting with two underscores
    are reserved, so that the compiler can use them to describe the
    necessary magic.
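
    To show where the magic hides, here is an ordinary variadic function
    written against <stdarg.h>. The function sum_ints is only an
    illustration. The source is standard C, but va_start, va_arg and va_end
    cannot themselves be written in portable C; in GCC and Clang, for
    instance, the header typically maps them to built-ins such as
    __builtin_va_start.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` arguments of type int passed after `count`. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list args;
    int total = 0;

    va_start(args, count);            /* needs compiler support */
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);   /* so does fetching each argument */
    }
    va_end(args);
    return total;
}
```

    A call such as sum_ints(3, 1, 2, 3) compiles with any conforming
    compiler, but a new compiler has to supply its own <stdarg.h> that
    agrees with its own calling convention.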

    However, I will admit that my knowledge on this topic is more than a
    bit rusty. The last time I implemented a C compiler was at Honeywell,
    when C was just being standardized and C++ was still new.

    Just sharing my ignorance (i.e. what passes for knowledge in my mind)
    and thoughts,
    Chris

    [For an example of library incompatibility problems, look at your
    local version of setjmp.h and the source of setjmp() and longjmp().
    On most systems the header file is written in standard C, but it's
    also totally machine dependent because the size and contents of a
    jmp_buf depend on the machine architecture and calling sequence.

    For the machine independent parts of libc (most of it), take a look at
    https://musl.libc.org/, an open source C library with a flexible MIT
    license. -John]
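
    A small sketch of setjmp() and longjmp() in use makes the dependency
    visible. The names run_guarded and fail_with are made up for
    illustration. The calling code is portable, but the jmp_buf object it
    relies on has whatever size and contents the library's setjmp.h gives
    it for the machine at hand.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recovery_point;        /* layout is machine dependent */
static volatile int last_error;

/* Abandon the current work and jump back to the recovery point. */
static void fail_with(int code)
{
    assert(code != 0);
    last_error = code;
    longjmp(recovery_point, 1);
}

/* Returns 0 if the guarded work completed normally, or the error code
   recorded by fail_with() if it jumped out. */
int run_guarded(int should_fail)
{
    if (setjmp(recovery_point) != 0) {
        return last_error;            /* arrived here via longjmp */
    }
    if (should_fail) {
        fail_with(42);
    }
    return 0;
}
```

    Printing sizeof(jmp_buf) on two different architectures is a quick way
    to see that the header's contents cannot simply be shared between them.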

  • From Kaz Kylheku@21:1/5 to Christopher F Clark on Mon Mar 23 13:49:38 2020
    On 2020-03-22, Christopher F Clark
    <christopher.f.clark@compiler-resources.com> wrote:
    > cvo asked:
    >
    > Your assumptions up to this one seemed correct to me.
    >
    > > If so then the final and my first question is:
    > > Given a gcc in a linux environment with its C headers and
    > > with its C libraries, and given my compiler with its C headers
    > > and with its C libraries, as long as my C headers and C libraries
    > > conform semantically to the C standard, why cannot I use them
    > > in the first step, point (1), in place of host ones?
    >
    > If your headers and libraries are written in standard C, and the
    > original compiler works properly when you use them with it.

    Your replacement C library, if it is to be actually stand-alone (not
    depending on system libraries) can't be written entirely in standard C.
    For example, I/O requires system calls, which use bits of assembly code,
    inline or otherwise.
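
    As a small illustration, even a minimal output routine has to step
    outside ISO C. The sketch below assumes a POSIX system and leans on the
    host's write() wrapper from <unistd.h>; a truly stand-alone library
    would have to provide that wrapper itself, typically with a few lines
    of assembly per platform. The function name my_write_all is
    hypothetical, and retrying on EINTR is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* POSIX, not ISO C: declares write() */

/* Write all `length` bytes of `data` to the descriptor `fd`.
   Returns 0 on success and -1 if the underlying write fails. */
int my_write_all(int fd, const char *data, size_t length)
{
    assert(data != NULL);

    size_t done = 0;
    while (done < length) {
        ssize_t n = write(fd, data + done, length - done);
        if (n <= 0) {
            return -1;         /* error, or no progress was made */
        }
        done += (size_t)n;
    }
    return 0;
}
```

    A stdio-style function such as fputs() could be layered on top of this
    in plain C; it is only the bottom layer that cannot be.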

    If that library has to be bootstrapped with the host compiler, it may
    have to rely on /its/ extensions (like inline assembler). If those
    kinds of extensions look different in your own compiler, and you want
    to be able to rebuild the library with it, /that/ code will have to be
    written twice just due to the compiler differences, never mind all
    the times it has to be written for different machines.

    You're making a lot of extra work for yourself, including future
    portability chores, if you make your own library.

    > But listen carefully, to John, our moderators, comments and consider
    > not writing your own library when writing your own compiler. I can
    > probably count on one hand with fingers left over the number of times
    > he's been wrong in the years I've been reading here. More importantly.
    > Don't make unnecessary work for yourself. Writing a compiler is
    > plenty to do. Adding more work on that pile, just means you won't do
    > as good a job on it as you would if you had it as your only task.

    A C library is not only a language library; it's a platform abstraction
    layer. Platform vendors provide these. If you want a compiler that can
    be ported to many platforms, you can't be replicating the work of all
    those vendors; it is not tractable.

    A compiler may have to have a special run-time library to implement
    certain things that have to do with the language itself, like any math
    operators that can't be implemented in-line.

    A compiler could ship with some interesting library that the compiler
    vendor promotes. Like say that your compiler provides some interesting
    features like dynamic compilation. Of course, you have to provide your
    own API to it which accompanies the compiler.

    Just don't go re-implementing POSIX and ISO C, I think.
