I'll set out below what to my knowledge is a novel way of looking at
certain aspects of expression parsing. Don't be alarmed, it doesn't
parse Martian. In fact, I think (subject to correction) that it
implements the normal kind of parsing that a programmer would be
familiar with. But AISI it handles some of it in a simpler, more
natural, and more understandable way than I've seen anywhere else.
To explain, since the 1960s it has been traditional to think of some identifiers as resolving to lvalues and others to rvalues. However,
I suggest below that another way of looking at matters is that when
parsing an expression the presence of an identifier name such as
X
/always/ results not in the value but in the address of the named
identifier X. An address is, of course, how it is interpreted in
certain contexts.
But programmers find it natural if in other
contexts X is implicitly and automatically dereferenced to yield a
value. Classically, in the assignment
X = X
even though they look the same the last X is dereferenced while the
first is not.
What matters is semantics but contexts are easiest to discuss in
terms of the syntax so I'll do that. In simple terms one could say
that if an expression (of any sort) is followed by one of
= (assignment)
. (field selection)
( (function invocation)
[ (array lookup)
or is tweaked with increment or decrement operators (as in C's ++ and
--) then the /address/ is used.
In all other contexts, however, an implicit dereference is
automatically inserted by a compiler such that the value at the
designated address is used instead. To illustrate, consider
A[2][4]
Note that after both A *and* the first closing square bracket there
is no dereference. In syntax terms one can consider that that's
because each is followed by one of the aforementioned symbols. IOW
both A and the first closing square bracket are followed by an
opening square bracket so there is no dereference. But there /is/ an
automatic dereference after the final square bracket because it is
not followed by one of the listed symbols. So the key as to whether
an automatic dereference is inserted or not is what comes next after
an expression.
That's very flexible, allowing expressions to work with an arbitrary
number of addresses. For example,
B = A[2][4][6][8][10]
etc. That expression uses addresses all the way through. Each array
lookup results in yet another address. Only after the final square
bracket would there be a dereference.
Of course, it's not just array indexing. Anything which /produces/ an
address can have its output fed into anything which /uses/ an address
and such operators can be combined arbitrarily. For example,
vectors[1](2).data[3] = y
Such an expression may be horrendous but illustrates how a programmer
could combine addresses in any way desired. Only after the y would
there be a dereference.
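
The "what comes next" rule described above can be sketched in C. This is only a toy scanner over the expression text, not a real parser: the names inhibits_deref and count_implicit_derefs are made up for illustration, it assumes every identifier, every ']' and every ')' ends an address-producing subexpression (so ')' is taken to close a call, never a grouping), and it treats numeric literals as plain values.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if the text at p starts with one of the
   symbols that suppress the implicit dereference of the expression that
   ends just before it. */
static int inhibits_deref(const char *p)
{
    while (*p == ' ')
        p++;
    switch (*p) {
    case '=': return p[1] != '=';   /* assignment, but not == */
    case '.':                       /* field selection */
    case '(':                       /* function invocation */
    case '[': return 1;             /* array lookup */
    case '+': return p[1] == '+';   /* postfix ++ */
    case '-': return p[1] == '-';   /* postfix -- */
    default:  return 0;
    }
}

/* Hypothetical helper: counts the points in expr where an implicit
   dereference would be inserted under the rule in the text. */
int count_implicit_derefs(const char *expr)
{
    int count = 0;
    for (const char *p = expr; *p; p++) {
        int ends_address = 0;
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* Skip to the last character of the identifier. */
            while (isalnum((unsigned char)p[1]) || p[1] == '_')
                p++;
            ends_address = 1;
        } else if (*p == ']' || *p == ')') {
            ends_address = 1;
        }
        if (ends_address && !inhibits_deref(p + 1))
            count++;
    }
    return count;
}
```

Under those assumptions each of the examples so far, "A[2][4]", "B = A[2][4][6][8][10]" and "vectors[1](2).data[3] = y", has exactly one point at which a dereference is inserted, while "A + B" has two.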
(Perhaps it's strange that as programmers we accept the inconsistency
that some contexts get implicit dereferences and some don't. But we
would probably not want to write all deref or no-deref points in
code. So we are where we are.)
Importantly, it is always possible to dereference an address to get a
value but there is no way to operate on a value to get its address.
For that reason my precedence table has all the address-consuming
operators first. That's probably true of most other languages as well
but I've not seen that set out as a rationale.
Consider how C uses its 'address of' operator, & as a prefix.
&X gets the address of X
&X[4] gets the address of X[4]
&X.f gets the address of field f
Yet C's & is not a normal operator.
It does not transform its argument.
As stated, it is not possible to get from a value to an
address. So &E cannot evaluate E and then take its address. Therefore
& is not an operator in the normal sense that it manipulates a value. Instead, &E inhibits the automatic dereference that would have been
inserted at the end of E: it prevents emission of the dereference
that the compiler would otherwise have emitted.
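
A small C check of that reading, as a sketch only. The struct S and the function name amp_yields_addresses are invented for the example. It relies on the C rule that &X[4] is evaluated as X + 4, so no element is loaded, and on offsetof for the field case.

```c
#include <stddef.h>

struct S { int e; int f; };   /* hypothetical record type for the example */

/* Returns 1 if &E yields exactly the address that E designates. */
int amp_yields_addresses(void)
{
    int X[8] = {0};
    struct S s = {0, 0};
    int ok = 1;

    /* &X[4] is plain address arithmetic on X. */
    ok &= (&X[4] == X + 4);

    /* &s.f is the address of s plus the offset of field f. */
    ok &= ((char *)&s.f == (char *)&s + offsetof(struct S, f));

    return ok;
}
```

In both cases the result of &E is the address computed on the way to E, with the final load simply not performed.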
There is, perhaps, an additional oddity that an 'operator' at the
beginning of a subexpression really applies at the end of that
subexpression.
It may be more straightforward for & to be placed at the location
where the dereference would otherwise have been.
Assuming for discussion purposes that trailing & and infix & can be distinguished (so we don't need to use another symbol) the above
expressions would become
X& the address of X
X[4]& the address of X[4]
X.f& the address of field f
Then the unary trailing & joins the symbols in the list above and
becomes just another of the operators which, when it appears after an expression, inhibits the automatic dereference that would otherwise
have occurred at that point:
= assign
. field selection
( function call
[ index
& nothing except, like all the others, inhibit dereference
To summarise, there would no longer be the conceptual difference
between lvalues and rvalues.
All identifiers would be considered as
producing their addresses, never their values.
There would instead be
contexts in which automatic dereference takes place, and the
programmer would put & in any of those places where the automatic
dereference was to be inhibited.
AFAIK that's a new way of looking at addresses in expressions but
maybe you know otherwise.
More importantly, as a programmer how easy would you find it to think
in those terms?
I wanted to go on to say more but this post is already overlong. I'll
come back to some of the other points.
Naturally, comments welcome!
On 07/11/2021 23:57, James Harris wrote:
To explain, since the 1960s it has been traditional to think of some
identifiers as resolving to lvalues and others to rvalues. However, I
suggest below that another way of looking at matters is that when
parsing an expression the presence of an identifier name such as
X
/always/ results not in the value but in the address of the named
identifier X. An address is, of course, how it is interpreted in
certain contexts. But programmers find it natural if in other contexts
X is implicitly and automatically dereferenced to yield a value.
Classically, in the assignment
X = X
even though they look the same the last X is dereferenced while the
first is not.
(Haven't we been here before?)
In X = X, both sides are dereferenced, one for reading, one for writing:
mov D0, [x]
mov [x], D0
If you like, emulate a language that doesn't dereference automatically
by writing &X instead of X. Then to do that assignment, you'd need to
write:
*(&X) = *(&X)
Why would you need * on both sides if only one side is dereferenced?
On Sun, 7 Nov 2021 23:57:38 +0000
James Harris <james.harris.1@gmail.com> wrote:
Classically, in the assignment
X = X
even though they look the same the last X is dereferenced while the
first is not.
What? ... Of course, it is. X is dereferenced twice here.
You must get the address of X in both instances.
You need the address on the right to read/access X's value.
You need the address on the left to write/store X's value.
What matters is semantics but contexts are easiest to discuss in
terms of the syntax so I'll do that. In simple terms one could say
that if an expression (of any sort) is followed by one of
= (assignment)
. (field selection)
( (function invocation)
[ (array lookup)
or is tweaked with increment or decrement operators (as in C's ++ and
--) then the /address/ is used.
Personally, I think you're looking at this all the wrong way around.
Treat everything as an address from the get-go.
Let's make it proper with an assignment:
B = A[2][4];
Start with A is an address.
Also, B is an address.
Adjust A's address by [2][4], which depends on the type's size.
Read data from the adjusted address of whatever type A was declared as.
Store data read from A at address B.
In my explanation above, the assignment operator = does the dereference
of adjusted address for A to read and the dereference of B's address to store.
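
The steps listed above can be written out by hand in C. This is a sketch under assumptions: the array shape (3 rows of 5 ints), the fill values and the function name copy_element are all invented for illustration, and memcpy stands in for the read and the store so that each step is explicit.

```c
#include <string.h>

enum { ROWS = 3, COLS = 5 };   /* assumed shape of A */

/* Performs B = A[2][4] using explicit addresses and returns B. */
int copy_element(void)
{
    int A[ROWS][COLS];
    int B = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            A[r][c] = 10 * r + c;

    /* Start with A as an address and B as an address. */
    const char *a_addr = (const char *)A;
    char *b_addr = (char *)&B;

    /* Adjust A's address by [2][4]; the scaling depends on the type's size. */
    const char *elem_addr = a_addr + (2 * COLS + 4) * sizeof(int);

    /* Read data from the adjusted address, then store it at address B. */
    int tmp;
    memcpy(&tmp, elem_addr, sizeof tmp);
    memcpy(b_addr, &tmp, sizeof tmp);

    return B;
}
```

With the fill used here A[2][4] holds 24, so copy_element() returns 24. Nothing is read from memory until the single load at the adjusted address.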
= assign
. field selection
( function call
[ index
& nothing except, like all the others, inhibit dereference
To summarise, there would no longer be the conceptual difference
between lvalues and rvalues.
...
All identifiers would be considered as
producing their addresses, never their values.
As stated previously here and numerous other posts, that is, IMO, the
correct approach.
On Sun, 7 Nov 2021 23:57:38 +0000
James Harris <james.harris.1@gmail.com> wrote:
Of course, it's not just array indexing. Anything which /produces/ an
address can have its output fed into anything which /uses/ an address
and such operators can be combined arbitrarily. For example,
vectors[1](2).data[3] = y
Please, trust me and treat everything as an address. It will make your
life much easier.
On 08/11/2021 00:57, James Harris wrote:
Consider how C uses its 'address of' operator, & as a prefix.
&X gets the address of X
&X[4] gets the address of X[4]
&X.f gets the address of field f
Yet C's & is not a normal operator. It does not transform its argument
I'm not commenting on your main points at the moment - I think it is an interesting view, and worth thinking about.
However, your comment that "C's & is not a normal operator" is somewhat bizarre - it implies there is such a thing as a "normal operator". C
has all sorts of operators - function calls are operators, sizeof and _Alignof are operators (neither of which evaluates their operand, and
the operand can be a type rather than an expression), assignment is an operator (while in many languages, it is a statement). Casts are
operators that may or may not affect the value of the operand. The
comma operator evaluates and then discards its first operand. Structure
and union member access are operators.
I suppose you mean to say that "&" is somewhat different from addition
or multiplication. Alternatively, you could say that most operators in
C are not normal operators!
On 08/11/2021 00:25, Bart wrote:
In X = X, both sides are dereferenced, one for reading, one for writing:
mov D0, [x]
mov [x], D0
Rather than dereferenced do you mean that both sides are /accessed/?
I should explain what I mean by dereferencing. I mean, effectively,
following a pointer.
IOW EAX contains an address and that instruction replaces it with the
value at that address, aka it follows a pointer, aka it 'dereferences'!
lea eax, [X] ;the & operator
But the * operator on the LHS would be suppressed.
On 08/11/2021 08:52, David Brown wrote:
On 08/11/2021 00:57, James Harris wrote:
Consider how C uses its 'address of' operator, & as a prefix.
&X gets the address of X
&X[4] gets the address of X[4]
&X.f gets the address of field f
Yet C's & is not a normal operator. It does not transform its argument
I'm not commenting on your main points at the moment - I think it is an
interesting view, and worth thinking about.
However, your comment that "C's & is not a normal operator" is somewhat
bizarre - it implies there is such a thing as a "normal operator".
Well, you can't implement it with a function! Such as:
int a;
int* p = addressof(a);
Although you can do it if you twist the language around, but in general,
if 'a' normally means its value, you can't turn a value into the address where it's stored.
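
A sketch of that point in C. The names param_is_callers_object, ADDRESS_OF and macro_sees_the_object are made up for the example. A function only ever receives a copy of the value, so the address it can see is that of its own parameter. A macro is one way of "twisting the language around", because it is expanded before 'a' is evaluated.

```c
/* Hypothetical helper: a function gets a copy of the caller's value, so &a
   inside it is the address of the parameter, not of the caller's variable. */
int param_is_callers_object(int a, const int *caller_obj)
{
    return &a == caller_obj;   /* distinct objects, so this is 0 */
}

/* A macro works textually on the operand, so & still sees the variable. */
#define ADDRESS_OF(x) (&(x))

/* Writes through the macro's result and returns the caller-side variable. */
int macro_sees_the_object(void)
{
    int a = 7;
    int *p = ADDRESS_OF(a);
    *p = 8;
    return a;
}
```

Calling param_is_callers_object(v, &v) for any int v gives 0, whereas macro_sees_the_object() returns 8 because the store went to 'a' itself.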
On 08/11/2021 07:53, James Harris wrote:
On 08/11/2021 00:25, Bart wrote:
In X = X, both sides are dereferenced, one for reading, one for writing:
mov D0, [x]
mov [x], D0
Rather than dereferenced do you mean that both sides are /accessed/?
I should explain what I mean by dereferencing. I mean, effectively,
following a pointer.
Following a pointer and then doing what? If you have a chain of derefs
like this:
***p = 0;
The first two will be read, the last used for writing.
IOW EAX contains an address and that instruction replaces it with the
value at that address, aka it follows a pointer, aka it 'dereferences'!
OK, now I understand. If you have a machine with one register which
contains a pointer, and read the address at the pointer:
mov R, [R]
then R is replaced with the target.
But that doesn't happen here:
mov [R], 0 # R is unchanged
It needn't happen here either:
mov R2, [R] # R is unchanged
I see 'dereferencing' as something to do with type system.
If P is a pointer, it might have type T*. If you dereference it, the
value you get has type T. The '*' reference has disappeared! But that
happens whether reading or writing:
*Q = *P
Both P and Q have type T*. During and after the assigning, they will
still have type T*.
To implement the assignment, * is used to dereference P's value of type
T* to get a value X of type T, and * is used to dereference Q's value of
type T*, to store X of type T.
lea eax, [X] ;the & operator
But the * operator on the LHS would be suppressed.
Only because you haven't shown it. But to write to the address in
eax to complete the assignment, you have to use [eax].
You seem to want to distinguish between an address used for reading
([eax]), and an address used for writing ([eax]).
On 08/11/2021 10:36, Bart wrote:
On 08/11/2021 07:53, James Harris wrote:
On 08/11/2021 00:25, Bart wrote:
In X = X, both sides are dereferenced, one for reading, one for
writing:
mov D0, [x]
mov [x], D0
Rather than dereferenced do you mean that both sides are /accessed/?
I should explain what I mean by dereferencing. I mean, effectively,
following a pointer.
Following a pointer and then doing what? If you have a chain of
derefs like this:
***p = 0;
The first two will be read, the last used for writing.
IOW EAX contains an address and that instruction replaces it with the
value at that address, aka it follows a pointer, aka it 'dereferences'!
OK, now I understand. If you have a machine with one register which
contains a pointer, and read the address at the pointer:
mov R, [R]
then R is replaced with the target.
Yes, that's approximately the model. Your R could, in practice, be the
value at the top of the evaluation stack - even if the top word of the evaluation stack is kept in a register, if you see what I mean. But,
yes, what I am calling a dereference would replace TOS with what TOS
points at.
But that doesn't happen here:
mov [R], 0 # R is unchanged
It needn't happen here either:
mov R2, [R] # R is unchanged
For both of those consider a fully generic model of assignment which
strips out any recognition of particular cases:
(expression 0) = (expression 1)
In that, expression 0 can be absolutely anything legal which results in
an address. Similarly, expression 1, type checking permitting, can be absolutely anything legal which results in the value to be stored at the aforementioned address. In register terms you could have the evaluated
result of expression 0 in R0 and the evaluated result of expression 1 in
R1. Then the assignment would be
mov [R0], R1
I see 'dereferencing' as something to do with type system.
Perhaps that's because an explicit dereference does, indeed, always
convert one type to another, as you point out below. But the important
point, here, is that a dereference replaces TOS with what TOS points at.
On 08/11/2021 18:30, James Harris wrote:
On 08/11/2021 10:36, Bart wrote:
OK, now I understand. If you have a machine with one register which
contains a pointer, and read the address at the pointer:
mov R, [R]
then R is replaced with the target.
Yes, that's approximately the model. Your R could, in practice, be the
value at the top of the evaluation stack - even if the top word of the
evaluation stack is kept in a register, if you see what I mean. But,
yes, what I am calling a dereference would replace TOS with what TOS
points at.
So for you, dereferencing can only ever produce an rvalue.
Using an analogy of numbered lockers, if you had a card in your hand
with locker number 37 on it, dereferencing is the process of opening
door 37, and extracting some artefact.
But if you had the card in one hand, and already had an artefact in the other, what would you call the process of opening door 37, and
/inserting/ that object?
To me, acting on that '37' by opening the door to the locker is 'dereferencing' whether you put something in or take something out.
Going back to code, take this example:
*Q += *P
Now, *Q has to be dereferenced to extract a value, modify it with *P,
and put it back.
I see 'dereferencing' as something to do with type system.
Perhaps that's because an explicit dereference does, indeed, always
convert one type to another, as you point out below. But the important
point, here, is that a dereference replaces TOS with what TOS points at.
It depends on the 'instruction' set of the virtual machine.
I normally do A := B with:
push B
pop A
I could also do it like this:
push &B
pushptr # replace TOS with *TOS - your 'deref'
pop A
or doing it both sides:
push &B
pushptr
push &A
popptr
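
For anyone who wants to try the "both sides" sequence, here is a minimal C sketch of such a stack machine. It is not the actual VM being described: the Stack type and the function names are invented, the stack holds intptr_t slots, and it assumes that an object address survives a round trip through intptr_t, which holds on common platforms.

```c
#include <stdint.h>

/* Hypothetical evaluation stack; each slot holds a value or an address. */
typedef struct { intptr_t slot[16]; int top; } Stack;

static void push(Stack *s, intptr_t v) { s->slot[s->top++] = v; }
static intptr_t pop(Stack *s) { return s->slot[--s->top]; }

/* pushptr: replace TOS (an address) with the value stored there. */
static void pushptr(Stack *s)
{
    intptr_t *addr = (intptr_t *)(void *)pop(s);
    push(s, *addr);
}

/* popptr: TOS is an address, the slot below it is a value; store the value. */
static void popptr(Stack *s)
{
    intptr_t *addr = (intptr_t *)(void *)pop(s);
    *addr = pop(s);
}

/* Runs A := B as push &B, pushptr, push &A, popptr and returns A. */
intptr_t assign_via_addresses(intptr_t b_value)
{
    intptr_t A = 0, B = b_value;
    Stack s = { {0}, 0 };

    push(&s, (intptr_t)(void *)&B);
    pushptr(&s);
    push(&s, (intptr_t)(void *)&A);
    popptr(&s);

    return A;
}
```

assign_via_addresses(42) returns 42. Both pushptr and popptr start by popping an address; one then loads through it and the other stores through it.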
On 08/11/2021 19:54, Bart wrote:
I would call that /accessing/.
To me, acting on that '37' by opening the door to the locker is
'dereferencing' whether you put something in or take something out.
Fine but that's not what I was thinking of when I used the term. See
https://en.wikipedia.org/wiki/Dereference_operator
where it speaks about *returning the value at the pointer address*.
I could also do it like this:
push &B
pushptr # replace TOS with *TOS - your 'deref'
pop A
What would that look like if the assignment were
A := B + C
That looks close to what I have been talking about but what if the terms
were expressions such as
A[2] := B[3] + C[4]
On 08/11/2021 20:31, James Harris wrote:
On 08/11/2021 19:54, Bart wrote:
I would call that /accessing/.
OK. We'll have to disagree on that point.
Except, what would you call what happens on the LHS here:
*Q += *P
I could also do it like this:
push &B
pushptr # replace TOS with *TOS - your 'deref'
pop A
What would that look like if the assignment were
A := B + C
If the pushes were done via PUSHPTR, then:
             Stack (grows LTR)
push &B      &B
pushptr      B
push &C      B &C
pushptr      B C
add          B+C
pop A        -
That looks close to what I have been talking about but what if the
terms were expressions such as
A[2] := B[3] + C[4]
That gets complicated to do by hand. The actual IR I generate for that is:
push &b
push 3 i64
pushptroff i64 8 -8
push &c
push 4 i64
pushptroff i64 8 -8
add i64
push &a
push 2 i64
popptroff i64 8 -8
PUSHPTROFF is like PUSHPTR but takes an offset, which can be scaled and
a further constant offset added (the 8 and -8 shown). It's equivalent to
this C:
*(int64_t*)((char*)A+2*8-8) = *(int64_t*)((char*)B+3*8-8) + *(int64_t*)((char*)C+4*8-8)
The -8 is due to my arrays being 1-based. This reduces down to this x64
code:
mov D0, [b+16]
mov D1, [c+24]
add D0, D1
mov [a+8], D0
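
The offset arithmetic can be checked with a small C sketch. The helper names load64 and store64, the array sizes and the contents are invented for illustration. The scale of 8 and the constant -8 are the ones shown in the IR, so index 3 gives byte offset 3*8-8 = 16, index 4 gives 24 and index 2 gives 8, matching the x64 code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: 1-based access to 8-byte elements, computed as
   base + index*8 - 8, mirroring the scale and offset in the IR. */
static int64_t load64(const void *base, long index)
{
    int64_t v;
    memcpy(&v, (const char *)base + index * 8 - 8, sizeof v);
    return v;
}

static void store64(void *base, long index, int64_t v)
{
    memcpy((char *)base + index * 8 - 8, &v, sizeof v);
}

/* Performs A[2] := B[3] + C[4] with 1-based indexing and returns A[2]. */
int64_t one_based_demo(void)
{
    int64_t a[4] = {0, 0, 0, 0};
    int64_t b[4] = {10, 20, 30, 40};
    int64_t c[4] = {1, 2, 3, 4};

    store64(a, 2, load64(b, 3) + load64(c, 4));

    return a[1];   /* the second element, i.e. 1-based A[2] */
}
```

With these contents the 1-based B[3] is 30 and C[4] is 4, so one_based_demo() returns 34.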
On 08/11/2021 11:38, Charles Lindsey wrote:
On 07/11/2021 23:57, James Harris wrote:
To explain, since the 1960s it has been traditional to think of
some identifiers as resolving to lvalues and others to rvalues.
I think you mean the '70s, or perhaps even the '80s? It
didn't become in any way "traditional" until well after C became
popular. Also, I suspect you meant "expression" rather than
"identifier"?
On 07/11/2021 23:57, James Harris wrote:
To explain, since the 1960s it has been traditional to think of some
identifiers as resolving to lvalues and others to rvalues. However, I
suggest below that another way of looking at matters is that when
parsing an expression the presence of an identifier name such as
X
/always/ results not in the value but in the address of the named
identifier X. An address is, of course, how it is interpreted in
certain contexts. But programmers find it natural if in other contexts
X is implicitly and automatically dereferenced to yield a value.
Classically, in the assignment
X = X
even though they look the same the last X is dereferenced while the
first is not.
I think you have just re-invented Algol68.
On 08/11/2021 11:38, Charles Lindsey wrote:
I think you have just re-invented Algol68.
It's funny you should say that. I do think there's a similarity to Algol68 in returns from functions which I hadn't even mentioned but will do so now.
Consider a subexpression such as
A + B
I suggested before that both A and B should initially be taken semantically as
being addresses and that in the context in which they appear both would be dereferenced because, in simple terms, they are not followed by one of the symbols which inhibit dereferences:
. member selection
( function call
[ array indexing
= assignment
* do nothing (other than inhibit dereference)
Now consider
A[1] + B(0)
To be consistent, the subexpression A[1] would also yield the /address/ of the
element rather than its value. Then assignment to an element would happen naturally:
A[1] = A[2]
Because of the = sign the A[1] would not be dereferenced. By contrast, because
there's no inhibiting symbol the A[2] /would/ be dereferenced - exactly as for
simple variables and as expected in most familiar programming languages.
Now, here's the point I wanted to add: To be even more consistent the same would
be true of B(0). It, too, would result in the address of the return value rather
than the value itself. AIUI that is what Algol68 also does but it could lead to
some strange expressions such as
B(0) = 5
To 'tame' that I am thinking that a function would have to explicitly mark any
return as being writeable if it could be assigned to by the caller. Any return
which was not thus marked would only be treatable as a value. I think that offers the best of both worlds: consistent treatment but with the default option
being safe.
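
C can only approximate this, but a rough sketch may help. Here a function returns an address explicitly, and const plays the part of the "not marked writeable" default. The names table, writable_cell, readonly_cell and assign_through_call are invented for illustration, and the explicit * that C requires is exactly the dereference the proposed language would insert or suppress automatically.

```c
static int table[4];   /* hypothetical storage behind the functions */

/* Return marked writeable: the caller may store through the result. */
int *writable_cell(int i) { return &table[i]; }

/* Default, safe form: the result can only be read. */
const int *readonly_cell(int i) { return &table[i]; }

/* The C spelling of B(0) = 5, followed by a read of the same cell. */
int assign_through_call(void)
{
    *writable_cell(0) = 5;
    /* *readonly_cell(0) = 5; would be rejected by the compiler. */
    return *readonly_cell(0);
}
```

assign_through_call() returns 5. The rejected line in the comment corresponds to assigning to a return that was not marked writeable.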
On Sunday, November 7, 2021 at 3:57:40 PM UTC-8, James Harris wrote:
[Joining late, haven't read all of the conversation.]
However, I
suggest below that another way of looking at matters is that when
parsing an expression the presence of an identifier name such as
X
/always/ results not in the value but in the address of the named
identifier X. An address is, of course, how it is interpreted in certain
contexts. But programmers find it natural if in other contexts X is
implicitly and automatically dereferenced to yield a value. Classically,
in the assignment
X = X
even though they look the same the last X is dereferenced while the
first is not.
Um... It may be somewhat confusing because dereferences for the
purpose of reading from memory and dereferences for the purpose of
writing to memory appear somewhat different.
C's assign operators require their left operand to be an lvalue.
What is an lvalue (in, perhaps, a somewhat mechanistic view)?
It's an expression formed by dereferencing an address. And you
naturally need a memory address to both read and write memory.
But your = operator by itself screams in your face "I'm a memory
writing dereference!". Effectively, you may think that the lvalue's
own dereference and the one implied by the = operator are
duplicating one another or are two parts of one thing. Either way,
when you're writing a C compiler, once you've checked the types
in the assignment expression, you end up either eliminating the
lvalue's own dereference or you somehow fuse it with =
because in the end you generate just a single memory store
instruction that represents both the dereference and =.
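A small C sketch of that point, with the addresses written out explicitly. The function name assign_through is hypothetical. The *src on the right is a read; the *dst on the left, together with =, amounts to a single write and never reads the old contents.

```c
/* X = X with the addresses made explicit: one load, one store. */
void assign_through(int *dst, const int *src)
{
    *dst = *src;   /* *src: memory read; "*dst =": memory write only */
}
```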
Given this you may indeed think that the left operand of =
needs to be no more than an address and it's somehow
different from the right operand of =. But you need to consider
both, the dereference and =, together.
In all other contexts, however, an
implicit dereference is automatically inserted by a compiler such that the
value at the designated address is used instead. To illustrate, consider
A[2][4]
Note that after both A *and* the first closing square bracket there is
no dereference.
Strictly speaking, there is one, and its result is, as usual, an element of
the array. That element happens to be another array, which luckily needs
no memory read/write (yet); only pointer arithmetic is needed here.
But with a further dereference you will have to access memory because
that array element is not an array anymore.
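To make that concrete, here is a hedged C sketch. The dimensions ROWS and COLS and the function names are assumptions for the example. The first subscript only moves the address forward by whole rows; memory is read only when the final int element is used as a value.

```c
#include <stdbool.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 5 };   /* assumed dimensions */

/* A[2] is itself an array: using it costs address arithmetic only. */
bool row_is_address_arithmetic(void)
{
    int A[ROWS][COLS] = {{0}};
    return (char *)A[2] == (char *)A + 2 * sizeof A[0];
}

/* The address of A[2][4] is base + (2 * COLS + 4) elements. */
bool element_is_address_arithmetic(void)
{
    int A[ROWS][COLS] = {{0}};
    return (char *)&A[2][4] == (char *)A + (2 * COLS + 4) * sizeof(int);
}

/* Only here, where the int element is used as a value, is memory read. */
int read_element(int (*A)[COLS])
{
    return A[2][4];
}
```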
Of course, it's not just array indexing. Anything which /produces/ an
address can have its output fed into anything which /uses/ an address
and such operators can be combined arbitrarily. For example,
vectors[1](2).data[3] = y
Such an expression may be horrendous but illustrates how a programmer
could combine addresses in any way desired. Only after the y would there
be a dereference.
It's probably important to note that the above expression in C produces
a temporary value (the function return value) that needs to hang around
for a while in order for .data to be accessed off it. Mechanically
it needs to be an lvalue, but it's short-lived and messing with it is
therefore troublesome, hence the standard says modifying the return
value yields undefined behavior. That is, if that data member is a
pointer, the expression may be well formed. If data is an array, you
have UB right there where you attempt to modify its element at index 3.
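A sketch of the two cases, assuming C11 or later (where the returned struct has temporary lifetime, section 6.2.4). All type and function names here are hypothetical. The undefined case is left as a comment rather than executed.

```c
struct WithPtr { int *data; };     /* data member is a pointer */
struct WithArr { int data[8]; };   /* data member is an array */

static int storage[8];             /* outlives any call */

struct WithPtr make_ptr(void)
{
    struct WithPtr r = { storage };
    return r;
}

struct WithArr make_arr(void)
{
    struct WithArr r = {{0}};
    r.data[3] = 42;
    return r;
}

/* Well formed: the store lands in storage[3], not in the temporary. */
void store_via_ptr(int y)
{
    make_ptr().data[3] = y;
}

/* Reading an element of the temporary is fine in C11 and later. */
int read_via_arr(void)
{
    return make_arr().data[3];
}

/* make_arr().data[3] = y;   would modify the temporary itself: UB */
```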
(Perhaps it's strange that as programmers we accept the inconsistency
that some contexts get implicit dereferences and some don't. But we
would probably not want to write all deref or no-deref points in code.
So we are where we are.)
Definitely, you don't have to expose the underlying mechanics when
it creates unnecessary friction (e.g. in the form of verbosity and mental effort).
Consider how C uses its 'address of' operator, & as a prefix.
&X gets the address of X
&X[4] gets the address of X[4]
&X.f gets the address of field f
There is, perhaps, an additional oddity that an 'operator' at the
beginning of a subexpression really applies at the end of that
subexpression.
Well, if it helps to read the code, you could use parens, e.g. &(X[4]),
but they are meaningless here. You could also prohibit large and complex expressions and require them to be broken down into shorter and
simpler ones with e.g. temporary variables at every step, but
that (temporaries and low code density) in itself is problematic.
If you read X[4] into a temporary, taking its (the temporary's) address
wouldn't give you the address within X[], which is kinda bad.
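A short C sketch of that pitfall; the function name is made up. Once X[4] has been copied into tmp, &tmp points at the copy, so a write through it leaves X untouched.

```c
#include <stdbool.h>

bool temp_address_is_not_element_address(void)
{
    int X[8] = {0};
    int tmp = X[4];        /* copy the value out of the array */
    int *via_tmp = &tmp;   /* address of the copy */
    int *direct = &X[4];   /* address inside X */

    *via_tmp = 99;         /* changes tmp only */
    return via_tmp != direct && X[4] == 0 && tmp == 99;
}
```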
I think postfix expressions (and I mean not just postfix ++ and -- but
all of this subscripting, calling, member accessing) in C are more useful than not.
It may be more straightforward for & to be placed at the location where
the dereference would otherwise have been.
Assuming for discussion purposes that trailing & and infix & can be
distinguished (so we don't need to use another symbol) the above
expressions would become
X& the address of X
X[4]& the address of X[4]
X.f& the address of field f
Should we also use numeric negation this way, e.g. X[4]- in place of
-X[4]? That would look pretty awkward to mathy people (not that
they'd find it an insurmountable obstacle, I hope).
To summarise, there would no longer be the conceptual difference between
lvalues and rvalues. All identifiers would be considered as producing
their addresses, never their values. There would instead be contexts in
which automatic dereference takes place, and the programmer would put &
in any of those places where the automatic dereference was to be inhibited.
Then you also need to distinguish pointer arithmetic from non-pointer arithmetic
if you still want to keep both.
If a-b now gives me the distance between a and b in memory instead of the numeric difference of the values stored at addresses a and b, it's kinda bad.
Similarly, I don't always mean a pointer when I write a+1.
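In today's C the two meanings are kept apart by whether the operands are dereferenced, as this sketch shows. The function names are hypothetical, and the pointers are assumed to point into the same array so that the subtraction is defined.

```c
#include <stddef.h>

/* a - b on addresses: distance in elements between two array slots. */
ptrdiff_t address_distance(const int *a, const int *b)
{
    return a - b;
}

/* a - b on values: numeric difference of what is stored there. */
int value_difference(const int *a, const int *b)
{
    return *a - *b;
}
```

If identifiers always denoted addresses, a plain a-b would have to pick one of these two, which is the ambiguity raised above.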
No?