The Edison language and "personal software system" (1982)
From Lasse =?iso-8859-1?q?Hiller=F8e?= P@21:1/5 to All on Mon Nov 15 12:37:38 2021
One book that has inspired me more than any other books on the topic of programming languages, is _Programming A Personal Computer_ by Per Brinch- Hansen. This book is a gem. In 400 pages, Brinch-Hansen
- documents a complete OS for the PDP-11 (also ported to the IBM PC).
- documents the language Edison, in which the OS is written.
- discusses all phases of the compiler at length, with full source code.
- provides an extensive and very convincing argument for his style of
language design, and the particular choices taken in designing Edison.
- provides a compelling guide to _writing_ such a documentation, with
much good advice from Peter Naur.
I read excerpts of his earlier work on Concurrent Pascal for my second
year (fourth semester) course on concurrent programming, in 1988. And
years before that, his book _On Pascal Compilers_ was the second book I
read about writing compilers (the first was Wirth's _Algorithms+Data Structures=Programs_, which I read when in 9. grade in 1983-84.)
Edison as a language is strongly guided by a number of principles. One is
to design a language by taking a good existing one, and remove as much as possible, then add what you need extra. Edison is based on Pascal, like
so many other languages, but it differs quite a lot (from original
- It has modules.
- It uses bracketed control syntax (Algol68/Modula-2/Ada/...) instead of
begin ... end compound statements. (This fixes Pascal's "dangling else" problem.)
- It has the basic scalar types of Pascal: bool(ean), char, int(eger) and Enumerated (being a systems language, Reals have been omitted),
composable using sets, records and arrays.
- It has statements for concurrent execution and for critical regions.
- It simplifies type equivalency (a confusing part of Pascal) by
excluding integer ranges and requiring all types to be named, resulting
in types being equivalent when having the same name and scope.
- Pointers are deemed unnecessary and omitted! There is no dynamic memory allocation!
- Same goes for unions (he achieves the same with an enumerated field in
a record indicating the actual type, and typecasting record variables),
case statement and for loops.
- Functions can return composite types.
Fun fact (fun pun intended): He argues: "In spite of these refinements, I
would omit functions if I had to design another language", using the same motivation as functional purists, but with the radically opposite
conclusion, keeping the assignment statement instead. (Procedure
parameters are a vital part of the system, but by not having procedures
as first class values that can be assigned or returned, the problem of
upward funargs is avoided.)
Using simple symbolic intermediate languages (ordinal values of
enumerated types IIRC) between all the phases of compilation, a program compiles into "Edison code", which is then interpreted by a threaded interpreter in the kernel, a small piece of PDP-11 code written in Alva,
an alternative assembler notation for PDP-11 instructions.
Edison is definitely minimalistic and probably not very efficient, and
the "operating system" is very primitive, including just a very simple
editor and some tools and utilities for formatting, prettyprinting, and "housekeeping" like disk formatting, backup, file copy, disk catalog etc.
Yet it _is_ a complete system, what Brinch-Hansen refers to as a
"personal software system", which can be studied and fully understood in
all details by one person in a reasonable amount of time, and also ported
to another machine type very quickly.
We talked about declaration endianness in the PLZ thread, curiously, Brinch-Hansen makes an argument about type declarations that Pascal needs
to read two symbols (the type name being declared, and an equals symbol)
before knowing what kind of type is being declared, and opts for a left-
ended notation, not unlike how variables are declared in non-Pascal Algol descendants:
enum fruit (apple, banana, canteloupe);
set fruitsalad (fruit)
array line [1:80] (char)
Why he insists on the completely redundant parentheses around the element
type of an array I donẗ know. It seems a "glitch" somehow, or perhaps is
is to emphasise the notion of containment and make it similar to the
other forms. The reason he keeps Pascal variable declarations using var
is probably because it makes LL(1) recursive descent parsing simpler. The Edison compiler, which uses a separate pass for lexing, obviously cannot
have any interaction between the lexer and the symbol table. He _does_
cheat once though, in the syntax checking phase 2, where he uses the
symbol table to discern between a name (identifier) starting a procedure
call statement or a variable assignment statement... nobody's perfect. :-)
All in all an astounding accomplishment, with well argued design choices,
and a very good read. His style is very dry, but the clarity and
conciseness makes it easy reading.