There are some situations in programming where it could be useful to
identify an object not by a single datum such as its address but by a
small tuple which includes the address. For example,
  (address, type) for an object
  (address, method table ptr) for an object
  (address, length) for a string
  (address, length) for a subarray or 'slice' of an array
Each of those is a 2-tuple and I think the key requirement is that the
compiler would need to ensure that both parts are kept together.
For example, if s is a string then
  f(s)
would pass the string's address and length, probably as two separate
arguments even though only one appears in the argument list, something
which I believe could be called an 'unboxed tuple'. Further, if # means
to allow modification then
  f(#s)
would pass the string's address and length in such a way that either or
both could be modified. Similarly,
  return s
would return the address and length (which the caller would accept as a
unit). Both caller and callee would see s as a single object unless they
chose to break down the tuple with something such as
  s._ptr
  s._len
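For readers who think in C, the three cases above (f(s), f(#s) and return s) might be sketched roughly as follows. This is only an illustration of the idea, not the proposed language: the names Str, ptr, len and the three functions are invented here, and C makes the struct explicit where the proposal would have the compiler keep the pair together.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (address, length) pair; all names are illustrative. */
typedef struct {
    const char *ptr;  /* address of the first character */
    size_t      len;  /* number of characters; no terminator is implied */
} Str;

/* f(s): both parts travel together by value. Some ABIs (for example
   the x86-64 System V ABI) pass such a small struct in two registers. */
static size_t count_spaces(Str s) {
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++)
        if (s.ptr[i] == ' ') n++;
    return n;
}

/* f(#s): the callee may change either or both parts of the caller's pair. */
static void drop_first_word(Str *s) {
    while (s->len > 0 && *s->ptr != ' ') { s->ptr++; s->len--; }
    while (s->len > 0 && *s->ptr == ' ') { s->ptr++; s->len--; }
}

/* return s: the pair comes back to the caller as a unit. */
static Str str_from_literal(const char *z) {
    Str s = { z, strlen(z) };
    return s;
}
```

The s._ptr and s._len accessors correspond to plain member access here (s.ptr and s.len).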
This query was prompted by issues raised in the discussion about an
object having a pointer or a tag.
The question is: would that kind of semantics be helpful, or would it
cause more problems than it solves?
On 2021-09-04 17:56, James Harris wrote:
> This query was prompted by issues raised in the discussion about an
> object having a pointer or a tag.
> There are some situations in programming where it could be useful to
> identify an object not by a single datum such as its address but by a
> small tuple which includes the address. For example,
>   (address, type) for an object
>   (address, method table ptr) for an object
>   (address, length) for a string
>   (address, length) for a subarray or 'slice' of an array
Why address, why single constraint? By-value passing is as useful as
by-reference. The general form is:
  (constraint-1, constraint-2, ..., constraint-N, reference)
  (constraint-1, constraint-2, ..., constraint-N, value)
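One possible reading of that general form, sketched in C with two constraints in each case. The interpretation (bounds as the constraints) and every name below are assumptions made for illustration; nothing here comes from the post itself.

```c
#include <assert.h>
#include <stdbool.h>

/* (constraint-1, constraint-2, reference): index bounds plus a pointer. */
typedef struct {
    int  first, last;   /* inclusive index bounds */
    int *ref;           /* points at the element with index 'first' */
} BoundedArrayRef;

/* (constraint-1, constraint-2, value): range bounds plus the value itself. */
typedef struct {
    int low, high;      /* inclusive range bounds */
    int value;          /* carried by value, not by address */
} RangedInt;

static bool index_ok(BoundedArrayRef a, int i) {
    return i >= a.first && i <= a.last;
}

/* Indexing is checked against the constraints that travel with the reference. */
static int get(BoundedArrayRef a, int i) {
    assert(index_ok(a, i));
    return a.ref[i - a.first];
}

/* Assignment succeeds only if the new value satisfies the constraints. */
static bool ranged_set(RangedInt *r, int v) {
    if (v < r->low || v > r->high) return false;
    r->value = v;
    return true;
}
```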
On 04/09/2021 18:40, Dmitry A. Kazakov wrote:
> On 2021-09-04 17:56, James Harris wrote:
>> This query was prompted by issues raised in the discussion about an
>> object having a pointer or a tag.
>> There are some situations in programming where it could be useful to
>> identify an object not by a single datum such as its address but by a
>> small tuple which includes the address. For example,
>>   (address, type) for an object
>>   (address, method table ptr) for an object
>>   (address, length) for a string
>>   (address, length) for a subarray or 'slice' of an array
> Why address, why single constraint? By-value passing is as useful as
> by-reference. The general form is:
>   (constraint-1, constraint-2, ..., constraint-N, reference)
>   (constraint-1, constraint-2, ..., constraint-N, value)
I don't see the elements I was talking about as constraints. To explain,
one could consider the following expression
p := q + 1
as getting the addresses of both p and q and dereferencing q to obtain
its value. It's normal for compiled code to work with an object's
location. I was positing adding one or more other pieces of info.
With a value model an identifier such as
  x
indicates
  1. a type (classically, known at compile time)
  2. whether the location or the value is needed (known from context)
  3. the location of the object (known at run time)
but for slices there also needs to be
  4. the length of the object (known by run time)
It's normal for the runtime to know (3), the location. What I was asking
about was the runtime also maintaining (4), the length. IOW instead of
the runtime knowing only an object's location it would know the pair
(location, length).
One way to treat strings as first-class objects (possibly with
immutable lengths) could be by passing around the 2-tuple (location,
length).
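A minimal C sketch of a string carried as (location, length) with a length that cannot be changed after the pair is built. StrRef, str_ref and str_equal are hypothetical names; the const member is just one way to approximate "immutable length" in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char  *loc;  /* item (3): the location */
    const size_t len;  /* item (4): the length, fixed once the pair exists */
} StrRef;

/* Build the pair; afterwards len can be read but not assigned. */
static StrRef str_ref(const char *loc, size_t len) {
    StrRef s = { loc, len };
    return s;
}

/* Equality needs no terminator because the length travels with the location. */
static bool str_equal(StrRef a, StrRef b) {
    return a.len == b.len && memcmp(a.loc, b.loc, a.len) == 0;
}
```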
Slices
Objects
=======
Consider the object tags we have been talking about in another thread.
They could be stored not with the object but maintained separately by
the compiler. An object would then be known to the compiler as the tuple
(location, tag).
That would allow us to treat objects in store as ADTs without having to
change their storage layout (for example, by prepending a tag to each
object). Keeping the layout unchanged would make some lower-level work
easier and faster.
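As a rough C illustration of a tag kept beside the location rather than inside the object: the types, tags and the measure function below are all invented for the sketch, and in the scheme described the compiler, not the programmer, would maintain the pair.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TAG_POINT, TAG_RECT } Tag;   /* hypothetical tags */

/* Plain layouts: no tag field is prepended to either object. */
typedef struct { int32_t x, y; } Point;
typedef struct { int32_t w, h; } Rect;

/* The (location, tag) pair that would be kept together. */
typedef struct {
    const void *loc;
    Tag         tag;
} ObjRef;

/* Dispatch uses the tag from the pair, never anything stored in the object. */
static int32_t measure(ObjRef o) {
    switch (o.tag) {
    case TAG_POINT: { const Point *p = o.loc; return p->x + p->y; }
    case TAG_RECT:  { const Rect  *r = o.loc; return r->w * r->h; }
    }
    return 0;  /* unknown tag */
}
```

Because Point and Rect carry no header, the same bytes can still be handed to lower-level code that knows nothing about tags.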
For example, it may be better to store the fields of the tuple as a
control block and to pass pointers to those control blocks instead of
the fields from the control block.
Or it might be better to store (location, length) or (location, tag) etc
as just some of many fields in sentient pointers which include other
information.
Or (one more!) it may be better for a programmer to pass the fields of
each tuple explicitly.
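Three of those options can be put side by side in C. This is a sketch under assumed names (Desc and the last_* functions are hypothetical); it only shows how the calling conventions differ, not which is better.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char  *loc;
    size_t len;
} Desc;

/* Option 1: pass a pointer to a control block holding the fields. */
static char last_via_block(const Desc *d) {
    assert(d->len > 0);
    return d->loc[d->len - 1];
}

/* Option 2: pass the fields themselves as a unit (the 2-tuple scheme). */
static char last_via_pair(Desc d) {
    assert(d.len > 0);
    return d.loc[d.len - 1];
}

/* Option 3: the programmer passes each field explicitly. */
static char last_explicit(const char *loc, size_t len) {
    assert(len > 0);
    return loc[len - 1];
}
```

One visible difference: everyone holding a pointer to the control block sees a later change to its len, whereas a pair passed by value is a private copy.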
Bottom line for me at present is that I think this would work very well
for zero-based arrays where the length could not be changed by a callee,
and there are potential performance and compatibility benefits of not
having to alter an object's layout (by adding a tag) but the jury is out
for other uses.
On 04/09/2021 16:56, James Harris wrote:
> For example, if s is a string then
>   f(s)
> would pass the string's address and length
> Both caller and callee would see s as a single object unless they
> chose to break down the tuple with such as
>   s._ptr
>   s._len
Those string examples pretty much match Slices in my language. It's a
feature I haven't really used, so it's partly fallen into disuse, but
the following example works (s as a 'char*' instead of 'char[]' doesn't
work):
  proc testfn(slice[]char s)=
      println s, s.len, ref void(s.sliceptr)
  end

  proc start=
      static []char s=z"one two three"
      testfn("hello")
      testfn(s)
      testfn(s[5..7])
  end
The first two calls to testfn automatically turn the array (which has a
known size, probably why char* didn't work) into a slice object.
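A C analogue of the example above may help readers unfamiliar with that language. It is only an approximation: CharSlice, SLICE_OF and subslice are invented names, the 1-based inclusive reading of s[5..7] is an assumption, and so is leaving the terminating zero out of the length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *sliceptr;
    size_t      len;
} CharSlice;

/* Array-to-slice conversion. It relies on sizeof seeing an array type, so
   it gives a wrong length for a bare char*, which has no size information. */
#define SLICE_OF(arr) ((CharSlice){ (arr), sizeof(arr) - 1 })

/* s[lo..hi], assuming 1-based inclusive bounds. */
static CharSlice subslice(CharSlice s, size_t lo, size_t hi) {
    assert(lo >= 1 && lo <= hi && hi <= s.len);
    return (CharSlice){ s.sliceptr + (lo - 1), hi - lo + 1 };
}

/* Prints the characters, the length and the address, like testfn above. */
static void testfn(CharSlice s) {
    printf("%.*s %zu %p\n", (int)s.len, s.sliceptr, s.len, (void *)s.sliceptr);
}
```

With these definitions, the three calls would be written testfn(SLICE_OF("hello")), testfn(SLICE_OF(s)) and testfn(subslice(SLICE_OF(s), 5, 7)) for an array s.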
On 04/09/2021 18:04, Bart wrote:
> On 04/09/2021 16:56, James Harris wrote:
> ...
>> For example, if s is a string then
>>   f(s)
>> would pass the string's address and length
> ...
>> Both caller and callee would see s as a single object unless they
>> chose to break down the tuple with such as
>>   s._ptr
>>   s._len
> Those string examples pretty much match Slices in my language.
That makes sense. It was largely because of slices that the issue arose.
I was looking into how to support potential copy-in, copy-out for arrays
(or parts thereof) so a pointer to the first element (as in C) would not
be enough. Length would also be required.
And that's handy because, as you showed in your example code, the pair
(location, length) also works well as a slice, and a slice can be easy
for a programmer to specify by appending a range, e.g. with the [...] in
  s[5..7]
So a slice would be a subarray which could also be considered to be an
array in its own right.
I should say that passing arrays as (location, length) and having the
potential for copy-in, copy-out does not mean that arrays would always
literally be copied. Such arrays or slices could often be passed by
reference. However, the potential for copying would be there and the
callee would additionally be informed of the array length.
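To make the distinction concrete, here is a hedged C sketch in which the same callee can be invoked either by reference or with copy-in, copy-out. IntSlice, SliceFn and the function names are hypothetical, and in the scheme described the compiler would choose the mechanism rather than the programmer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int   *loc;
    size_t len;
} IntSlice;

typedef void (*SliceFn)(IntSlice);

/* The callee sees only (location, length), however it was called. */
static void double_all(IntSlice a) {
    for (size_t i = 0; i < a.len; i++) a.loc[i] *= 2;
}

/* By reference: the callee works directly on the caller's elements. */
static void call_by_reference(SliceFn f, IntSlice a) {
    f(a);
}

/* Copy-in, copy-out: the callee works on a temporary, which is copied
   back afterwards. Returns 0 on success, -1 if allocation fails. */
static int call_copy_in_out(SliceFn f, IntSlice a) {
    if (a.len == 0) { f(a); return 0; }
    int *tmp = malloc(a.len * sizeof *tmp);
    if (tmp == NULL) return -1;
    memcpy(tmp, a.loc, a.len * sizeof *tmp);   /* copy in  */
    f((IntSlice){ tmp, a.len });
    memcpy(a.loc, tmp, a.len * sizeof *tmp);   /* copy out */
    free(tmp);
    return 0;
}
```

Either way the length is what makes the copy possible; a bare pointer to the first element would not say how much to copy.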
> It's a feature I haven't really used, so it's partly fallen into
> disuse, but the following example works (s as a 'char*' instead of
> 'char[]' doesn't work):
>
>   proc testfn(slice[]char s)=
>       println s, s.len, ref void(s.sliceptr)
>   end
>
>   proc start=
>       static []char s=z"one two three"
>       testfn("hello")
>       testfn(s)
>       testfn(s[5..7])
>   end
Do you allow the callee to modify s.len or even s.sliceptr, and can a
callee /return/ a slice?
I think you said somewhere that you return a 2-tuple in two registers
but I cannot find that comment ATM. If slices can be returned, should
they be new objects or should they be restricted to referring to parts
of existing objects?
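The two possibilities for a returned slice can be shown side by side in C. This is a sketch with made-up names (Slice, trim_spaces, copy_slice), not a description of how either language under discussion behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *loc;
    size_t      len;
} Slice;

/* Returns a slice that refers to part of the caller's existing object. */
static Slice trim_spaces(Slice s) {
    while (s.len > 0 && s.loc[0] == ' ') { s.loc++; s.len--; }
    while (s.len > 0 && s.loc[s.len - 1] == ' ') s.len--;
    return s;
}

/* Returns a slice over newly allocated storage, which the caller must
   free. On allocation failure the result is (NULL, 0). */
static Slice copy_slice(Slice s) {
    char *p = malloc(s.len ? s.len : 1);
    if (p == NULL) return (Slice){ NULL, 0 };
    memcpy(p, s.loc, s.len);
    return (Slice){ p, s.len };
}
```

The first form costs nothing but ties the result's lifetime to the original object; the second is independent but raises the question of who releases the storage.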
> The first two calls to testfn automatically turn the array (which has
> a known size, probably why char* didn't work) into a slice object.
I guess char* does not and can not know the length.
Thinking out loud, as it were, are there logically at least two string
types in common use in programming: one dynamic and of variable length,
the other an array of char where for some operations the array size is
fixed at or by run time?
A callee which expected a read-only argument to be an array of char
could probably accept either type of string and it might be clearer how
a function was going to use a string for it to be declared as array of
char rather than a string of char.
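A small C sketch of that last point: a dynamic string and a fixed array of char are different types, yet a read-only callee can take either through the same (location, length) view. DynString, CharArrayRef, VIEW_ARRAY and the functions are all hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Dynamic string: variable length, owns growable storage. */
typedef struct {
    char  *loc;
    size_t len;
    size_t cap;
} DynString;

/* Read-only array of char: just (location, length). */
typedef struct {
    const char *loc;
    size_t      len;
} CharArrayRef;

static CharArrayRef view_dyn(const DynString *d) {
    return (CharArrayRef){ d->loc, d->len };
}

/* View of a fixed-size array; valid only where the array type is visible. */
#define VIEW_ARRAY(a) ((CharArrayRef){ (a), sizeof(a) })

/* Appends one character, growing the storage as needed; -1 on failure. */
static int dyn_append(DynString *d, char c) {
    if (d->len == d->cap) {
        size_t ncap = d->cap ? d->cap * 2 : 8;
        char *p = realloc(d->loc, ncap);
        if (p == NULL) return -1;
        d->loc = p;
        d->cap = ncap;
    }
    d->loc[d->len++] = c;
    return 0;
}

/* A read-only callee declared on "array of char" accepts both kinds. */
static size_t count_char(CharArrayRef s, char c) {
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++)
        if (s.loc[i] == c) n++;
    return n;
}
```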
Many options!