Forum: >>> Magnum BBS <<<

Compiler bootstrapping and the standard header files

From codevisio@gmail.com@21:1/5 to All on Thu Mar 19 05:40:47 2020

Hi,

newbie here.

I've been going through some compiler sources available online and studying
them, in particular C compilers.

Since all of them implement the C standard headers, my assumption was that during the development of the compiler I cannot use the standard headers coming from the host environment & C compiler, but instead I have to use my
own standard headers I created for my compiler.
However, this does not seem to be the case when I look at the makefiles of those online compiler sources.
Am I wrong? Any explanation would be much appreciated.

Thanks

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Hans-Peter Diettrich@21:1/5 to All on Fri Mar 20 01:34:59 2020

Am 19.03.2020 um 13:40 schrieb codevisio@gmail.com:

I've been going through some compiler sources available online and studying them, in particular C compilers.

Since all of them implement the C standard headers, my assumption was that during the development of the compiler I cannot use the standard headers coming from the host environment & C compiler, but instead I have to use my own standard headers I created for my compiler.

For every program you need the standard libraries and headers for the
*target* system, for which you compile and link your program. The
headers are required at compile time, the libraries at link time. Both
come with the development system that you use to build your program.

Now you have everything to build your own compiler, for whatever
language, headers and library files you have designed it for. Your compiler
has incorporated the libraries of the original development system,
which is okay for the compiler itself, but you are not normally allowed to
ship those original libraries with your new compiler, for building new
programs. So you also have to create and build your own headers and
libraries, using the original development system, because your new
compiler is still useless without its own libraries.

Having done that you can use your compiler to build programs. First some
simple tests, like Hello World, but finally you may want your compiler
to compile itself as a proof of fitness. In former times gcc was built
recursively, i.e. every new compiler compiled its own source code again,
until two versions produced the same binaries. I don't know what the
procedure looks like nowadays.

DoDi


From Christian Gollwitzer@21:1/5 to All on Fri Mar 20 10:21:11 2020

Hi codevisio,

Am 19.03.20 um 13:40 schrieb codevisio@gmail.com:

I've been going through some compiler sources available online and studying them, in particular C compilers.

Since all of them implement the C standard headers, my assumption was that during the development of the compiler I cannot use the standard headers coming from the host environment & C compiler, but instead I have to use my own standard headers I created for my compiler.

No, such a rule does not exist and it would not make sense, either. The
easiest way to see this is by considering a cross-compiler.

Assume that your new compiler runs on host A (e.g. an x86 PC) and compiles
programs for target B (e.g. an ARM Raspberry Pi). Then to run the
compiler on the PC, you need the headers for the Raspi on the PC, even
though the resulting binary does not run there.

The compiler itself must have been compiled by a compiler on the PC,
because if you compiled it with itself, it would not run on the PC.
Therefore, to compile the compiler, you need the host compiler of the PC
and the headers and libraries from there.

Actually, a "cross compiler" is the regular case, and a "non-cross
compiler" is a special case, albeit many developers only use one machine
for compilation and testing, and therefore use a "non-cross compiler"
where the target is the same as the host.


From Christopher F Clark@21:1/5 to All on Fri Mar 20 06:21:56 2020

Dodi gets the answer basically right. I'm going to say something
similar in slightly different words. The good news is that you are
doing this for C which was designed to be a relatively simple language
to port, although you may even want to use a restricted dialect of C
to make it even simpler. The simpler the dialect, the less runtime
library you need (to get the bootstrap working).

First, there are 3 interacting parts. They are all interconnected,
but still separate.

1) The compiler itself
2) The header files
3) The supporting runtime library

There are also two bits of terminology you need to learn from cross-compiling.

1) host
2) target

So, the machine you are compiling on and the compiler you are
compiling with are considered the host. The machine that the program
will run on and the runtime library that support it are considered the
target. There are diagrams that illustrate this. They are called
T-diagrams. Here is an ASCII rendition (excuse my drawing skills).

+---------------------------+
| host    headers    target |
+------+             +------+--------------------+
       |  compiler   | host    headers    target |
       +-------------+------+             +------+
                            |  compiler   |
                            +-------------+

Where the target in the first T is the host in the second T.
Everything else can be different. You can nest this diagram as many
times as one likes. The typical bootstrapping process nests 3 Ts. I
will explain why later.

From that diagram you can see that the headers must match both the
host (compiler) and the target (runtime).

Let's now illustrate that with a couple of different scenarios.

The first simplest scenario is you want to run the resulting program
in the same environment (same target machine, same target runtime
library) as the host environment. This is the way you bootstrap a new
version of the compiler using the same runtime library. This "new
version" might be this new compiler you are building from scratch.

So, you take the program you want to compile (this will be the new
version of the compiler), and plug it into the host box of the first
T. The host compiler takes this program and the header files which
match that compiler and target runtime library and compiles it to a
target program that uses the target runtime library. You now have a
new executable program (after linking) that you can run on the target
machine. This new executable program just happens to be your new
[version of the] compiler. So, now you can take the source code of
the program again and compile it with the new compiler (using the
headers which match that new compiler and target runtime library).
If you repeat this process twice (that is 3 T boxes), the code
generated in the last two steps should be roughly the same. There may
be timestamps or similar artifacts that differ, which you have to
filter out, but otherwise any differences are bugs in the compiler.

The scenario gets a bit more complicated if you are building a
cross-compiler (targeting a different machine than the host machine,
or even just a different (and incompatible) runtime library on the
host machine). In that case, your first host and target are the same
machine, but your second target is a different machine (different
runtime library). You may or may not be able to build a native (host)
compiler on that second machine. It is quite common for embedded
machines to lack all the facilities you need (e.g. file systems) to
run a compiler on them. You don't need a compiler to run on the chip
that runs your car engine or toaster. You just need a compiler that
can generate code for that chip. However, if you are building a
native compiler for that chip, then you need the 3 step T diagram.

Hopefully, you can figure out from this, that:

1) When compiling your compiler with some other compiler, you use the
header files from that compiler (the ones that go with its runtime
library). You will note that cross-compilers (e.g. compilers that run
on an x86 but compile code for an ARM machine) may use different
header files than the compiler from the same vendor that targets the
host machine. The header files must match both the compiler and the
target runtime, and target runtimes for different machines (even for
the "same" compiler) can differ due to linker and OS dependencies.

2) When compiling your compiler with your own compiler, you must use the
header files for your compiler. You may even have two different
copies, if you are developing your own runtime library: one that
matches the original compiler's runtime library, so you can use that,
and one that matches your own runtime library, so you have something to
"ship".

-------

Finally, I am going to illustrate this process with one of the first
compilers I worked on. In 1978 I worked at Softech and we had a
contract to build a cross-compiler from Multics to the Interdata 8/32
for the Jovial language (a new dialect called J73/C). We wanted our
compiler to be written in Jovial, but there was no Jovial compiler on
the Honeywell Multics machines. So, Carl Martin, my mentor at that
time, wrote a translator (effectively a macro package) that translated
a subset of Jovial into PL/I, with PL/I semantics, so you could only
use a subset of Jovial where the semantics of it and PL/I aligned.
But, that was ok, because you don't [shouldn't] need a lot of
sophisticated semantics to write a compiler.

So, then we wrote our first compiler in that subset. We then ran it
through the translator (1st T diagram) and got out an equivalent PL/I
program which we could compile with the Multics PL/I compiler. Then
we ran that compiler through the Multics PL/I compiler (2nd T diagram)
and got out a native Multics executable. Now, we had a program that
you could run on the Multics machine that would compile Jovial (and
output Interdata 8/32 code--3rd T diagram). We also had a version
that generated code for the Multics machine. I don't know if we ever
built a native Jovial compiler on the Interdata machine (that would
have been a 4th T diagram). In theory we could have, but the compiler
was targeting embedded applications, so I don't know what OS support
there was.

******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris


From cvo@21:1/5 to All on Fri Mar 20 15:52:02 2020

Hi Hans-Peter, Christian and Christopher.

Thanks for all of your answers.

I'm replying here, but my answer is for all of you.

I'm not going to consider the cross-compiler case now.
My intention was to stick with the simplest scenario.

(
Premise: I'd use the term 'libraries' instead of 'runtime libraries',
since the majority, I think, of the C standard functions are
implemented without OS API calls. At the same time there are two
kinds of libraries, static and dynamic. So in theory I could link
the static library definitions into my executable. That is at least
how it works in the Windows world, but I guess *nix has them too.
)

If I understood you correctly, the steps are as follows.

1) Host compiler + host compiler headers + host compiler libraries
are used to compile my first version of my new compiler getting:
mygcc0 + mygcc0 C libraries.

2) Then I use mygcc0 + mygcc0 C headers + mygcc0 C libraries
to compile my second version of my new compiler getting: mygcc1 +
mygcc1 C libraries.

3) Eventually I use mygcc1 + mygcc1 C headers (== mygcc0 C headers) +
mygcc1 C libraries to compile my third version of my new compiler
getting: mygcc2 + mygcc2 C libraries.

Q1) Is that correct?
If so, then point (1) above is a starting point. Point (2) is the
first necessary step. While mygcc0 has code generated by the host
compiler using its headers and its library functions, mygcc1 and the
mygcc1 C libraries have instead been generated by my new compiler,
mygcc0.
Q1.2) Is that correct?
If so, then point (3) is also a necessary step, because while it
is true that mygcc1 has been generated by mygcc0 (without any
host compiler), it is also true that I need a new version of my
compiler to compare mygcc1 with. That is mygcc2.
Q1.3) Is that correct?
If so, then my final (and original) question is:
Given a gcc in a Linux environment with its C headers and
with its C libraries, and given my compiler with its C headers
and with its C libraries, as long as my C headers and C libraries
conform semantically to the C standard, why can't I use them
in the first step, point (1), in place of the host ones?

Why do I ask that? Because I can start debugging and fixing my
compiler along with my C libraries (and C headers) from the
beginning instead of debugging my C libraries at step 2. In other
words, it is true I can compile my C libraries at step 1, but I
cannot debug their implementation yet because at step 1 I'm using
the C libraries (and headers) from the host environment. I can
debug my C libraries implementation only at step 2.

Thanks all.

[To your Q1.3, there is a lot of wiggle room in the C standard,
particularly in internal implementation details. For example, in
stdio.h there is a definition of FILE as a typedef of an opaque
structure. There is no reason that your C library's FILE structure
would be the same as gcc's. I hope you're not planning to write your
own C library, since there are good quality open source ones you can
adapt. -John]


From gah4@u.washington.edu@21:1/5 to cvo on Sat Mar 21 15:32:56 2020

On Thursday, March 19, 2020 at 2:45:16 PM UTC-7, cvo wrote:

I've been going through some compiler sources available online and studying them, in particular C compilers.

One reason for the inability to use such is copyright.
But for now, I will assume that isn't the question.

Since all of them implement the C standard headers, my assumption was that during the development of the compiler I cannot use the standard headers coming from the host environment & C compiler, but instead I have to use my own standard headers I created for my compiler.

As far as I remember the early days of gcc, though I wasn't so interested
in it at the time, it was a replacement for an existing compiler,
such as the one for SunOS.

For some years, Sun had only one C compiler, distributed with SunOS,
but at some point they licensed it separately. The compiler that came
with SunOS was only meant to be enough to compile the kernel modules
needed for sysgen. (But I believe gcc started before this.)

In any case, at that time they used the existing SunOS library (and
maybe others, but I was mostly using Sun at the time). Many of the
C header files contain data structures (C structs) that match the
library routines.

I believe that gcc was trying to build a better optimizing
compiler, and to get people to use it over the SunOS compiler.
(Personal opinion, so anyone can disagree if they want to.)

It was only some time later that GNU wrote their own glibc as a
replacement for the C library. Some data structures match system
calls, and so are system dependent, whereas others only need
to be internally consistent within the library.

As for cross compilers, many of the data structures are consistent
across different architectures for the same system version, such
as 68020 SunOS and SPARC SunOS, so that the header files would
be the same, or almost the same. (As far as I remember, Sun
keeps the system dependent ones separate, with a symbolic link
to the actual version.)

Writing the compiler first, using the existing library,
conveniently allows one to get something working earlier.


From Hans-Peter Diettrich@21:1/5 to All on Sun Mar 22 04:22:18 2020

Am 20.03.2020 um 23:52 schrieb cvo:

Why do I ask that? Because I can start debugging and fixing my
compiler along with my C libraries (and C headers) from the
beginning instead of debugging my C libraries at step 2. In other
words, it is true I can compile my C libraries at step 1, but I
cannot debug their implementation yet because at step 1 I'm using
the C libraries (and headers) from the host environment. I can
debug my C libraries implementation only at step 2.

I don't understand who "I" is in "I'm using..." here. You can tell any
*compiler* to use your headers when compiling a test program, and to link
it using your libraries. Then you can debug your libraries as part of
that test program.

A (static runtime) "library" is nothing but a collection of
previously compiled object files, which need no further compilation
with any new program. The linker puts together all required (listed)
object files and replaces all their references to external symbols by
the symbol definition (address) found in one of these files.

DoDi


From Christopher F Clark@21:1/5 to All on Sun Mar 22 13:51:55 2020

cvo asked:

Your assumptions up to this one seemed correct to me.

If so, then my final (and original) question is:
Given a gcc in a linux environment with its C headers and
with its C libraries, and given my compiler with its C headers
and with its C libraries, as long as my C headers and C libraries
conform semantically to the C standard, why cannot I use them
in the first step, point (1), in place of host ones?

If your headers and libraries are written in standard C, and the
original compiler works properly when you use them with it, you can
use your headers and libraries in step 1. It won't change the number
of steps, but that's a different question, which you didn't ask.

But listen carefully to the comments of John, our moderator, and consider
not writing your own library when writing your own compiler. I can
probably count on one hand, with fingers left over, the number of times
he's been wrong in the years I've been reading here. More importantly,
don't make unnecessary work for yourself. Writing a compiler is
plenty to do. Adding more work to that pile just means you won't do
as good a job on it as you would if you had it as your only task.

The reason you might need the host headers and libraries (supplied
with the compiler) is that they don't need to be written in standard
C. In theory (and even in practice), neither the standard header files
nor the library even need to exist. The compiler can use internal
versions of them when it sees the "include" preprocessor directives,
and the internal versions don't even need to be C code. They just need
to make your C programs compile with the compiler (and run when
executed, so the runtime library can also be built into the code
generated by the compiler and not be a separate library at all).

And, yes, I think I've seen compilers that do that. Before C, it was
common, since there were no "header files", just functions that the
language/compiler defined. We called them "built ins". For hardware
that had things like sin and cos instructions, you didn't have sin and
cos functions. On the other hand, if the hardware didn't have a
"double precision" divide instruction (which many didn't), you called
a runtime library routine for that, even though the user didn't write
it as a call, just as an expression.

In that way C was revolutionary, you were able to write large parts of
the C runtime in C. Before C, the runtime libraries were almost
always written in assembly language and didn't necessarily follow the
standard function call conventions. Burroughs (5000 series) Algol
might have been the exception as I have heard there wasn't even an
assembly language for it. Everything was written in Algol.

So if you can write the functions you need in standard C and you can
make all the parts work with header files written in standard C, yes
you can probably use your header files. I wonder how you are going to
get things like varargs and setjmp to work. That usually involves a
bit of compiler magic. That's why names starting with two underscores
are reserved, so that the compiler can use them to describe the
necessary magic.

However, I will admit that my knowledge on this topic is more than a
bit rusty. The last time I implemented a C compiler was at Honeywell,
when C was just being standardized and C++ was still new.

Just sharing my ignorance (i.e. what passes for knowledge in my mind)
and thoughts,
Chris

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris
------------------------------------------------------------------------------

[For an example of library incompatibility problems, look at your
local version of setjmp.h and the source of setjmp() and longjmp().
On most systems the header file is written in standard C, but it's
also totally machine dependent because the size and contents of a
jmp_buf depend on the machine architecture and calling sequence.

For the machine independent parts of libc (most of it), take a look at
https://musl.libc.org/, an open source C library with a flexible MIT
license. -John]


From Kaz Kylheku@21:1/5 to Christopher F Clark on Mon Mar 23 13:49:38 2020

On 2020-03-22, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:

cvo asked:

Your assumptions up to this one seemed correct to me.

If so, then my final (and original) question is:
Given a gcc in a linux environment with its C headers and
with its C libraries, and given my compiler with its C headers
and with its C libraries, as long as my C headers and C libraries
conform semantically to the C standard, why cannot I use them
in the first step, point (1), in place of host ones?

If your headers and libraries are written in standard C, and the
original compiler works properly when you use them with it.

Your replacement C library, if it is to be actually stand-alone (not
depending on system libraries) can't be written entirely in standard C.
For example, I/O requires system calls, which use bits of assembly code,
inline or otherwise.

If that library has to be bootstrapped with the host compiler, it may
have to rely on /its/ extensions (like inline assembler). If those
kinds of extensions look different in your own compiler, and you want
to be able to rebuild the library with it, /that/ code will have to be
written twice just due to the compiler differences, never mind all
the times it has to be written for different machines.

You're making a lot of extra work for yourself, including future
portability chores, if you make your own library.

But listen carefully to the comments of John, our moderator, and consider
not writing your own library when writing your own compiler. I can
probably count on one hand, with fingers left over, the number of times
he's been wrong in the years I've been reading here. More importantly,
don't make unnecessary work for yourself. Writing a compiler is
plenty to do. Adding more work to that pile just means you won't do
as good a job on it as you would if you had it as your only task.

A C library is not only a language library; it's a platform abstraction
layer. Platform vendors provide these. If you want a compiler that can
be ported to many platforms, you can't be replicating the work of all
those vendors; it is not tractable.

A compiler may have to have a special run-time library to implement
certain things that have to do with the language itself, like any math operators that can't be implemented in-line.

A compiler could ship with some interesting library that the compiler
vendor promotes. Like say that your compiler provides some interesting
features like dynamic compilation. Of course, you have to provide your
own API to it which accompanies the compiler.

Just don't go re-implementing POSIX and ISO C, I think.
