Forum: >>> Magnum BBS <<<

Handling small tuples

From James Harris@21:1/5 to All on Sat Sep 4 16:56:14 2021

This query was prompted by issues raised in the discussion about an
object having a pointer or a tag.

There are some situations in programming where it could be useful to
identify an object not by a single datum such as its address but by a
small tuple which includes the address. For example,

(address, type) for an object
(address, method table ptr) for an object
(address, length) for a string
(address, length) for a subarray or 'slice' of an array

Each of those is a 2-tuple and I think the key requirement is that the
compiler would need to ensure that both parts are kept together.

For example, if s is a string then

f(s)

would pass the string's address and length - probably as two separate
arguments even though only one appears in the argument list, something
which I believe could be called an 'unboxed tuple'. Further, if # means
to allow modification then

f(#s)

would pass the string's address and length in such a way that either or
both could be modified. Similarly,

return s

would return address and length (which the caller would accept as a unit).

Both caller and callee would see s as a single object unless they chose
to break down the tuple with such as

s._ptr
s._len

The question is, Would that kind of semantics be helpful or would it
cause more problems than it solves?

--
James Harris

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From James Harris@21:1/5 to James Harris on Sat Sep 4 17:05:27 2021

On 04/09/2021 16:56, James Harris wrote:

...

There are some situations in programming where it could be useful to
identify an object not by a single datum such as its address but by a
small tuple which includes the address. For example,

(address, type)               for an object
(address, method table ptr)   for an object
(address, length)             for a string
(address, length)             for a subarray or 'slice' of an array

Each of those is a 2-tuple and I think the key requirement is that the compiler would need to ensure that both parts are kept together.

...

The question is, Would that kind of semantics be helpful or would it
cause more problems than it solves?

I am not sure whether the advantages of passing tuples could be realised
in a better way. Two alternative options spring to mind:

1. Store the tuple (address, length) as a small control block and pass
pointers to it, though that may run into trouble with

return s

if the caller has allocated a new string, s.

2. Have sentient pointers which include address and length and a whole
load of other info to do with types and handling memory; and have a
mechanism which effectively passes such sentient pointers to and from
routines.

AISI sentient pointers would be useful in all sorts of situation
including, I think, objects identified by multiple pieces of
information. But I wonder if tuple handling, while it would be lighter
weight, would add anything significant.

Thoughts? Opinions?

--
James Harris

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bart@21:1/5 to James Harris on Sat Sep 4 18:04:20 2021

On 04/09/2021 16:56, James Harris wrote:

This query was prompted by issues raised in the discussion about an
object having a pointer or a tag.

There are some situations in programming where it could be useful to
identify an object not by a single datum such as its address but by a
small tuple which includes the address. For example,

(address, type)               for an object
(address, method table ptr)   for an object
(address, length)             for a string
(address, length)             for a subarray or 'slice' of an array

Each of those is a 2-tuple and I think the key requirement is that the compiler would need to ensure that both parts are kept together.

For example, if s is a string then

f(s)

would pass the string's address and length - probably as two separate arguments even though only one appears in the argument list, something
which I believe could be called an 'unboxed tuple'. Further, if # means
to allow modification then

f(#s)

would pass the string's address and length in such a way that either or
both could be modified. Similarly,

return s

would return address and length (which the caller would accept as a unit).

Both caller and callee would see s as a single object unless they chose
to break down the tuple with such as

s._ptr
s._len

Those string examples pretty much match Slices in my language. It's a
feature I haven't really used, so it's partly fallen into disuse, but
the following example works (s as a 'char* instead of char[] doesn't work):

proc testfn(slice[]char s)=
println s, s.len, ref void(s.sliceptr)
end

proc start=
static []char s=z"one two three"

testfn("hello")
testfn(s)
testfn(s[5..7])
end

The first two calls to testfn automatically turn the array (which has a
known size, probably why char* didn't work) into a slice object.

While the s[5..7] expression in the last call create a slice anyway. The
output is:

hello 5 00408A56
one two three 17 00408150
two 3 00408154

A slice is a 16-byte descriptor consisting of a 64-bit pointer, and a
64-bit length. (Perhap half the size on a 32-bit system.)

s.len extracts that length, and s.sliceptr extracts the pointer, just
like your example.

Slices are 128-bit types which in my implementation are passed by value,
in 2 registers if necessary (the ABI says to pass and return by pointer,
but that's messy, so no thanks).

Slices can also be constructed from a pair of values:

testfn(("ABCDEFHI",4)) # displays ABCD 4

Note that slices here are not specific to strings; they are general
slices of any kind of array. But since the element type here is 'char',
it knows to print this slice as a string (but as a counted string, as
there is no terminator).

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dmitry A. Kazakov@21:1/5 to James Harris on Sat Sep 4 19:40:37 2021

On 2021-09-04 17:56, James Harris wrote:

This query was prompted by issues raised in the discussion about an
object having a pointer or a tag.

There are some situations in programming where it could be useful to
identify an object not by a single datum such as its address but by a
small tuple which includes the address. For example,

(address, type)               for an object
(address, method table ptr)   for an object
(address, length)             for a string
(address, length)             for a subarray or 'slice' of an array

Why address, why single constraint? By-value passing is as useful as by-reference. The general form is:

(constraint-1, constraint-2, ..., constraint-N, reference)
(constraint-1, constraint-2, ..., constraint-N, value)

Types of constraints you listed exist in Ada:

type = type tag
ptr = access discriminant
length = bounds

But constraints could be any. For example a variant record:

type Variant (Kind_Of : Integer) is record
case Kind_Of is
when 0 =>
null;
when 1 =>
I : Integer;
when others =>
X : Float;
end case;
end record;

Or consider a dimensioned value. The constraint would be its measurement
unit, the value would be the nominal value, and you would like pass it
rather by-value.

The question is, Would that kind of semantics be helpful or would it
cause more problems than it solves?

This is the most important mechanism used in all forms of polymorphism.

Regarding keeping parts together, you must always be ready to strip the constraint(s) or add them.

In general terms, let you pass an indefinite object to f() and then call
g() with a definite constraint. f() expects the argument with its
constraint. g() knows the constraint already.

Example in Ada:

subtype Punched_Card is String (1..80); -- Definite, 80 characters
procedure G (X : Punched_Card); -- Accept only 80 characters

procedure F (X : String) is -- Accept any string = indefinite
begin
G (X); -- Check the actual constraint, strip it, pass the rest
end F;

And conversely:

Y : Punched_Card := (others => ' ');
begin
F (Y); -- Add the constraint 1..80, before passing Y to F

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Sep 5 12:28:26 2021

On 04/09/2021 18:40, Dmitry A. Kazakov wrote:

On 2021-09-04 17:56, James Harris wrote:

This query was prompted by issues raised in the discussion about an
object having a pointer or a tag.

There are some situations in programming where it could be useful to
identify an object not by a single datum such as its address but by a
small tuple which includes the address. For example,

   (address, type)               for an object
   (address, method table ptr)   for an object
   (address, length)             for a string
   (address, length)             for a subarray or 'slice' of an array

Why address, why single constraint? By-value passing is as useful as by-reference. The general form is:

   (constraint-1, constraint-2, ..., constraint-N, reference)
   (constraint-1, constraint-2, ..., constraint-N, value)

I don't see the elements I was talking about as constraints. To explain,
one could consider the following expression

p := q + 1

as getting the addresses of both p and q and dereferencing q to obtain
its value. It's normal for compiled code to work with an object's
location. I was positing adding one or more other pieces of info.

With a value model an identifier such as

x

indicates

1. a type (classically, known at compile time)
2. whether the location or the value is needed (known from context))
3. the location of the object (known at run time)

but for slices there also needs to be

4. the length of the object (known by run time)

It's normal for the runtime to know (3) the location. What I was asking
was about the runtime also maintaining (4) the length. IOW instead of
the runtime knowing an object's location it would know both of
(location, length).

...

I take your point about subtypes and length constraints (now snipped)
but I was thinking more of having objects which would be identified by a
single identifier, such as x, but which would have two or more meta
attributes which are known and manipulated at run time.

Consider

function f
(
in String s0
inout String s1
returns String s2
)

where all three strings would be passed as (location, length). Function
f could read the 'in' parameter s0, modify the inout parameter s1 and
return the s2.

One way to treat the strings as first-class objects (possibly with
immutable lengths) could be by passing around the 2-tuple (location,
length).

Slices
======

That arrangement should work well for zero-based array slices. In fact,
with zero basing (i.e. arrays which are considered to be indexed from
zero to len - 1) arrays and slices of arrays would be passed in the same
way (location, length) and a callee would be able to process either as

array of T

The callee would know the type, T, as well as the location and the
length of any parameter it was called with.

Objects
=======

Consider the object tags we have been talking about in another thread.
They could be stored not with the object but maintained separately by
the compiler. An object would then be known to a compiler as the tuple (location, tag).

That would allow us to treat objects in store as ADTs without having to
change the storage layout such as prepending those objects with a tag -
which would make some things when working at a lower level easier and
faster.

Alternatives
============

While there are temptations, is passing tuples around a good idea?

For example, it may be better to store the fields of the tuple as a
control block and to pass pointers to those control blocks instead of
the fields from the control block.

Or it might be better to store (location, length) or (location, tag) etc
as just of many fields in sentient pointers which include other
information.

Or, (one more!) it may be better for a programmer to pass the fields of
each tuple explicitly.

Bottom line for me at present is that I think this would work very well
for zero-based arrays where the length could not be changed by a callee,
and there are potential performance and compatibility benefits of not
having to alter an object's layout (by adding a tag) but the jury is out
for other uses.

--
James Harris

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Sep 5 14:10:59 2021

On 2021-09-05 13:28, James Harris wrote:

On 04/09/2021 18:40, Dmitry A. Kazakov wrote:

On 2021-09-04 17:56, James Harris wrote:

This query was prompted by issues raised in the discussion about an
object having a pointer or a tag.

There are some situations in programming where it could be useful to
identify an object not by a single datum such as its address but by a
small tuple which includes the address. For example,

   (address, type)               for an object
   (address, method table ptr)   for an object
   (address, length)             for a string
   (address, length)             for a subarray or 'slice' of an array

Why address, why single constraint? By-value passing is as useful as
by-reference. The general form is:

    (constraint-1, constraint-2, ..., constraint-N, reference)
    (constraint-1, constraint-2, ..., constraint-N, value)

I don't see the elements I was talking about as constraints. To explain,
one could consider the following expression

p := q + 1

as getting the addresses of both p and q and dereferencing q to obtain
its value. It's normal for compiled code to work with an object's
location. I was positing adding one or more other pieces of info.

With a value model an identifier such as

x

indicates

1. a type (classically, known at compile time)
2. whether the location or the value is needed (known from context))
3. the location of the object (known at run time)

but for slices there also needs to be

4. the length of the object (known by run time)

No, that is just an indefinite string. Its value consists of the body
and the actual constraint, the length in your case, bounds in more
advanced languages.

It's normal for the runtime to know (3) the location. What I was asking
was about the runtime also maintaining (4) the length. IOW instead of
the runtime knowing an object's location it would know both of
(location, length).

That is just a value which internal representation is a tuple.

One way to treat the strings as first-class objects (possibly with
immutable lengths) could be by passing around the 2-tuple (location,
length).

The indefinite strings.

Definite strings have only bodies, the constraint (length) is known
statically from the subtype.

Slices

Slice is just a string.

It is about memory management, slice is a referential object that refers
to some other string object and shall not outlive it.

Objects
=======

Consider the object tags we have been talking about in another thread.
They could be stored not with the object but maintained separately by
the compiler. An object would then be known to a compiler as the tuple (location, tag).

This is a class-wide object.

That would allow us to treat objects in store as ADTs without having to change the storage layout such as prepending those objects with a tag -
which would make some things when working at a lower level easier and
faster.

Embedded tags are used with relatively large objects passed by
reference. Outbound tags are used with small scalar objects.

To have the latter you must distinguish specific and class-wide types.
The former allows conflating both types, as C++ does.

For example, it may be better to store the fields of the tuple as a
control block and to pass pointers to those control blocks instead of
the fields from the control block.

This is merely passing by reference rather than by value.

Or it might be better to store (location, length) or (location, tag) etc
as just of many fields in sentient pointers which include other
information.

No difference, just by reference passing. Unless you introduce a new
type for the pointer which would an awful idea.

Or, (one more!) it may be better for a programmer to pass the fields of
each tuple explicitly.

Leave that to Bart! (:-))

Bottom line for me at present is that I think this would work very well
for zero-based arrays where the length could not be changed by a callee,
and there are potential performance and compatibility benefits of not
having to alter an object's layout (by adding a tag) but the jury is out
for other uses.

All depends on two factors:

1. the parameter passing method
2. embedding the constraint

A good language must allow all combinations of these.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From James Harris@21:1/5 to Bart on Sun Sep 5 12:31:49 2021

On 04/09/2021 18:04, Bart wrote:

On 04/09/2021 16:56, James Harris wrote:

...

For example, if s is a string then

   f(s)

would pass the string's address and length

...

Both caller and callee would see s as a single object unless they
chose to break down the tuple with such as

   s._ptr
   s._len

Those string examples pretty much match Slices in my language.

That makes sense. It was largely because of slices that the issue arose.
I was looking into how to support potential copy-in, copy-out for arrays
(or parts thereof) so a pointer to the first element (as in C) would not
be enough. Length would also be required.

And that's handy because as you showed in your example code the pair
(location, length) also works well as a slice, and a slice can be easy
for a programmer to append - e.g. with the [...] in

s[5..7]

So a slice would be a subarray which could also be considered to be an
array in its own right.

I should say that passing arrays as (location, length) and having the
potential for copy-in, copy-out does not mean that arrays would always literally be copied. Such arrays or slices could often be passed by
reference. However, the potential for copying would be there and the
callee would additionally be informed of the array length.

It's a
feature I haven't really used, so it's partly fallen into disuse, but
the following example works (s as a 'char* instead of char[] doesn't work):

    proc testfn(slice[]char s)=
        println s, s.len, ref void(s.sliceptr)
    end

    proc start=
        static []char s=z"one two three"

        testfn("hello")
        testfn(s)
        testfn(s[5..7])
    end

Do you allow the callee to modify s.len or even s.sliceptr and can a
callee /return/ a slice?

I think you said somewhere that you return a 2-tuple in two registers
but I cannot find that comment ATM. If slices can be returned should
they be new objects or should they be restricted to referring to parts
of existing objects?

The first two calls to testfn automatically turn the array (which has a
known size, probably why char* didn't work) into a slice object.

I guess char* does not and can not know the length.

Thinking out loud, as it were, are there logically at least two string
types in common use in programming: one dynamic and of variable-length,
the other an array of char where for some operations the array size is
fixed at or by run time?

A callee which expected a read-only argument to be an array of char
could probably accept either type of string and it might be clearer how
a function was going to use a string for it to be declared as array of
char rather than a string of char.

Many options!

--
James Harris

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bart@21:1/5 to James Harris on Sun Sep 5 14:53:28 2021

On 05/09/2021 12:31, James Harris wrote:

On 04/09/2021 18:04, Bart wrote:

On 04/09/2021 16:56, James Harris wrote:

...

For example, if s is a string then

   f(s)

would pass the string's address and length

...

Both caller and callee would see s as a single object unless they
chose to break down the tuple with such as

   s._ptr
   s._len

Those string examples pretty much match Slices in my language.

That makes sense. It was largely because of slices that the issue arose.
I was looking into how to support potential copy-in, copy-out for arrays
(or parts thereof) so a pointer to the first element (as in C) would not
be enough. Length would also be required.

And that's handy because as you showed in your example code the pair (location, length) also works well as a slice, and a slice can be easy
for a programmer to append - e.g. with the [...] in

s[5..7]

So a slice would be a subarray which could also be considered to be an
array in its own right.

I should say that passing arrays as (location, length) and having the potential for copy-in, copy-out does not mean that arrays would always literally be copied. Such arrays or slices could often be passed by reference. However, the potential for copying would be there and the
callee would additionally be informed of the array length.

I should say that a slice is an explcit user-declated type in my language.

Regular arrays can be passed, nominally by value (it's normally done
with pointers) but they are fixed size so caller and callee both know
the length.

The language doesn't have a built-in way of representing dynamic arrays;
slices were one alternative to passing separate pointers and lengths.

(The dynamic language handles those of course, but by fairly heavyweight descriptors.)

It's a feature I haven't really used, so it's partly fallen into
disuse, but the following example works (s as a 'char* instead of
char[] doesn't work):

     proc testfn(slice[]char s)=
         println s, s.len, ref void(s.sliceptr)
     end

     proc start=
         static []char s=z"one two three"

         testfn("hello")
         testfn(s)
         testfn(s[5..7])
     end

Do you allow the callee to modify s.len or even s.sliceptr and can a
callee /return/ a slice?

I don't think I allow that. But you can extract those components, modify
them and construct a new slice. To change the caller's copy of slice, it
needs to be a reference parameter to an actual slice object (not a
constructed one).

I think you said somewhere that you return a 2-tuple in two registers
but I cannot find that comment ATM.

They use the same handling as 128-bit numbers. A 64-bit value is
returned in register D0 (rax), and 128-bit ones in registers D1:D0
(don't know what D1 is in Intel terms, but it'll be one of the volatile
ones).

This program:

function fn(string s,t)string u= # string defined below
return u
end

proc start=
string a,b,c
c:=fn(a,b)
end

Generates this ASM (note that my register optimiser will only put or
keep 64-bit variables in registers; 128-bit ones are a little too
rarified for that):

t.fn:
fn.s = 16
fn.t = 32
fn.u = -16
push Dframe
mov Dframe, Dstack
sub Dstack, 16
mov [Dframe+16], D10
mov [Dframe+24], D11
mov [Dframe+32], D12
mov [Dframe+40], D13 ;-------------------------------------------------
mov D0, [Dframe+fn.u]
mov D1, [Dframe+fn.u+8] ;-------------------------------------------------
add Dstack, 16
pop Dframe
ret

t.start:
....
mov D10, [Dframe+start.a]
mov D11, [Dframe+start.a+8]
mov D12, [Dframe+start.b]
mov D13, [Dframe+start.b+8]
call t.fn
mov [Dframe+start.c], D0
mov [Dframe+start.c+8], D1
....

If slices can be returned should
they be new objects or should they be restricted to referring to parts
of existing objects?

Since slices are an explicit type of fixed size (16 bytes) they are
passed and returned by value:

type string = slice[]char # save some typing

function widen(string s)string t=
t:=(s.sliceptr, s.len+1)
return t
end

proc start=
string a:=("ABCDEFGH",3)

println a
println widen(a)
end

This displays ABC then ABCD. It's up to your code to ensure the string
they point to still exists.

I could have used:

string a:="ABC"

too (it will treat that as a:=("ABC", 3)) but my widening example
wouldn't have worked too well! Here's another way of returning a slice:

function left3(string s)string t=
return s[1..3]
end

This could be called as left3("ABCDEFGHI"), and returns "ABC" as a
slice. A bit too trivial though, as you can just do "ABCDEFGHI"[1..3].

A returned slice value could refer to an existing string or slice or
slice of slice etc, but it can refer to a newly allocated string.

At level I implement this, user-code needs to keep track of that.

(My 2-element slices can be used for allocated arrays that do not change
in size. Arrays that grow are more challenging. It could be done by
splitting the 64-bit length into a 32-bit length, and 32-capacity, but
then it also needs language support to check accesses are within bounds,
and to grow the arrays as needed.

A bit like the C++ vector type, but that uses a 196-bit descriptor,
which really needs handling by reference. But I decided that was a
little beyond the remit of my static language.)

The first two calls to testfn automatically turn the array (which has
a known size, probably why char* didn't work) into a slice object.

I guess char* does not and can not know the length.

It could assume a terminated string (when there are separate char and u8 types), and call strlen to determine the length for the slice.

Thinking out loud, as it were, are there logically at least two string
types in common use in programming: one dynamic and of variable-length,
the other an array of char where for some operations the array size is
fixed at or by run time?

Yes, there's three (at least):

* Fixed at compile-time

* Set at runtime, but fixed size, once allocated

* Set at runtime, and the size can change

Fixed ones including declarations like this:

[100]int A
[50]int B

Here, either A or B can be passed to a function taking a slice[]int
parameter.

Fixed ones also include short strings forming their own type; I mainly
used these for struct member types.

A callee which expected a read-only argument to be an array of char
could probably accept either type of string and it might be clearer how
a function was going to use a string for it to be declared as array of
char rather than a string of char.

Many options!

Yeah, too many.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Bob Worm
  Thu Apr 25 22:17:10 2024
  from Wales, Uk via Telnet
- Bob Worm
  Fri Apr 26 10:09:36 2024
  from Wales, Uk via Telnet
- Bob Worm
  Fri Apr 26 08:24:20 2024
  from Wales, Uk via Telnet
- Bob Worm
  Fri Apr 26 06:40:30 2024
  from Wales, Uk via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	296
Nodes:	16 (2 / 14)
Uptime:	73:22:53
Calls:	6,657
Calls today:	3
Files:	12,203
Messages:	5,332,384
Posted today:	1

Handling small tuples

Who's Online

Recent Visitors

System Info