• Handling small tuples

    From James Harris@21:1/5 to All on Sat Sep 4 16:56:14 2021
    This query was prompted by issues raised in the discussion about an
    object having a pointer or a tag.

    There are some situations in programming where it could be useful to
    identify an object not by a single datum such as its address but by a
    small tuple which includes the address. For example,

    (address, type) for an object
    (address, method table ptr) for an object
    (address, length) for a string
    (address, length) for a subarray or 'slice' of an array

    Each of those is a 2-tuple and I think the key requirement is that the
    compiler would need to ensure that both parts are kept together.

    For example, if s is a string then

    f(s)

    would pass the string's address and length - probably as two separate
    arguments even though only one appears in the argument list, something
    which I believe could be called an 'unboxed tuple'. Further, if # means
    to allow modification then

    f(#s)

    would pass the string's address and length in such a way that either or
    both could be modified. Similarly,

    return s

    would return address and length (which the caller would accept as a unit).

    Both caller and callee would see s as a single object unless they chose
    to break down the tuple with such as

    s._ptr
    s._len


    The question is, Would that kind of semantics be helpful or would it
    cause more problems than it solves?


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to James Harris on Sat Sep 4 17:05:27 2021
    On 04/09/2021 16:56, James Harris wrote:

    ...

    There are some situations in programming where it could be useful to
    identify an object not by a single datum such as its address but by a
    small tuple which includes the address. For example,

      (address, type)               for an object
      (address, method table ptr)   for an object
      (address, length)             for a string
      (address, length)             for a subarray or 'slice' of an array

    Each of those is a 2-tuple and I think the key requirement is that the compiler would need to ensure that both parts are kept together.

    ...

    The question is, Would that kind of semantics be helpful or would it
    cause more problems than it solves?

    I am not sure whether the advantages of passing tuples could be realised
    in a better way. Two alternative options spring to mind:

    1. Store the tuple (address, length) as a small control block and pass
    pointers to it, though that may run into trouble with

    return s

    if the caller has allocated a new string, s.

    2. Have sentient pointers which include address and length and a whole
    load of other info to do with types and handling memory; and have a
    mechanism which effectively passes such sentient pointers to and from
    routines.

    AISI sentient pointers would be useful in all sorts of situation
    including, I think, objects identified by multiple pieces of
    information. But I wonder if tuple handling, while it would be lighter
    weight, would add anything significant.

    Thoughts? Opinions?


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sat Sep 4 18:04:20 2021
    On 04/09/2021 16:56, James Harris wrote:
    This query was prompted by issues raised in the discussion about an
    object having a pointer or a tag.

    There are some situations in programming where it could be useful to
    identify an object not by a single datum such as its address but by a
    small tuple which includes the address. For example,

      (address, type)               for an object
      (address, method table ptr)   for an object
      (address, length)             for a string
      (address, length)             for a subarray or 'slice' of an array

    Each of those is a 2-tuple and I think the key requirement is that the compiler would need to ensure that both parts are kept together.

    For example, if s is a string then

      f(s)

    would pass the string's address and length - probably as two separate arguments even though only one appears in the argument list, something
    which I believe could be called an 'unboxed tuple'. Further, if # means
    to allow modification then

      f(#s)

    would pass the string's address and length in such a way that either or
    both could be modified. Similarly,

      return s

    would return address and length (which the caller would accept as a unit).

    Both caller and callee would see s as a single object unless they chose
    to break down the tuple with such as

      s._ptr
      s._len


    Those string examples pretty much match Slices in my language. It's a
    feature I haven't really used, so it's partly fallen into disuse, but
    the following example works (s as a 'char* instead of char[] doesn't work):

    proc testfn(slice[]char s)=
    println s, s.len, ref void(s.sliceptr)
    end

    proc start=
    static []char s=z"one two three"

    testfn("hello")
    testfn(s)
    testfn(s[5..7])
    end

    The first two calls to testfn automatically turn the array (which has a
    known size, probably why char* didn't work) into a slice object.

    While the s[5..7] expression in the last call create a slice anyway. The
    output is:

    hello 5 00408A56
    one two three 17 00408150
    two 3 00408154

    A slice is a 16-byte descriptor consisting of a 64-bit pointer, and a
    64-bit length. (Perhap half the size on a 32-bit system.)

    s.len extracts that length, and s.sliceptr extracts the pointer, just
    like your example.

    Slices are 128-bit types which in my implementation are passed by value,
    in 2 registers if necessary (the ABI says to pass and return by pointer,
    but that's messy, so no thanks).

    Slices can also be constructed from a pair of values:

    testfn(("ABCDEFHI",4)) # displays ABCD 4

    Note that slices here are not specific to strings; they are general
    slices of any kind of array. But since the element type here is 'char',
    it knows to print this slice as a string (but as a counted string, as
    there is no terminator).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Sat Sep 4 19:40:37 2021
    On 2021-09-04 17:56, James Harris wrote:
    This query was prompted by issues raised in the discussion about an
    object having a pointer or a tag.

    There are some situations in programming where it could be useful to
    identify an object not by a single datum such as its address but by a
    small tuple which includes the address. For example,

      (address, type)               for an object
      (address, method table ptr)   for an object
      (address, length)             for a string
      (address, length)             for a subarray or 'slice' of an array

    Why address, why single constraint? By-value passing is as useful as by-reference. The general form is:

    (constraint-1, constraint-2, ..., constraint-N, reference)
    (constraint-1, constraint-2, ..., constraint-N, value)

    Types of constraints you listed exist in Ada:

    type = type tag
    ptr = access discriminant
    length = bounds

    But constraints could be any. For example a variant record:

    type Variant (Kind_Of : Integer) is record
    case Kind_Of is
    when 0 =>
    null;
    when 1 =>
    I : Integer;
    when others =>
    X : Float;
    end case;
    end record;

    Or consider a dimensioned value. The constraint would be its measurement
    unit, the value would be the nominal value, and you would like pass it
    rather by-value.

    The question is, Would that kind of semantics be helpful or would it
    cause more problems than it solves?

    This is the most important mechanism used in all forms of polymorphism.

    Regarding keeping parts together, you must always be ready to strip the constraint(s) or add them.

    In general terms, let you pass an indefinite object to f() and then call
    g() with a definite constraint. f() expects the argument with its
    constraint. g() knows the constraint already.

    Example in Ada:

    subtype Punched_Card is String (1..80); -- Definite, 80 characters
    procedure G (X : Punched_Card); -- Accept only 80 characters

    procedure F (X : String) is -- Accept any string = indefinite
    begin
    G (X); -- Check the actual constraint, strip it, pass the rest
    end F;

    And conversely:

    Y : Punched_Card := (others => ' ');
    begin
    F (Y); -- Add the constraint 1..80, before passing Y to F

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Sun Sep 5 12:28:26 2021
    On 04/09/2021 18:40, Dmitry A. Kazakov wrote:
    On 2021-09-04 17:56, James Harris wrote:
    This query was prompted by issues raised in the discussion about an
    object having a pointer or a tag.

    There are some situations in programming where it could be useful to
    identify an object not by a single datum such as its address but by a
    small tuple which includes the address. For example,

       (address, type)               for an object
       (address, method table ptr)   for an object
       (address, length)             for a string
       (address, length)             for a subarray or 'slice' of an array

    Why address, why single constraint? By-value passing is as useful as by-reference. The general form is:

       (constraint-1, constraint-2, ..., constraint-N, reference)
       (constraint-1, constraint-2, ..., constraint-N, value)

    I don't see the elements I was talking about as constraints. To explain,
    one could consider the following expression

    p := q + 1

    as getting the addresses of both p and q and dereferencing q to obtain
    its value. It's normal for compiled code to work with an object's
    location. I was positing adding one or more other pieces of info.

    With a value model an identifier such as

    x

    indicates

    1. a type (classically, known at compile time)
    2. whether the location or the value is needed (known from context))
    3. the location of the object (known at run time)

    but for slices there also needs to be

    4. the length of the object (known by run time)

    It's normal for the runtime to know (3) the location. What I was asking
    was about the runtime also maintaining (4) the length. IOW instead of
    the runtime knowing an object's location it would know both of
    (location, length).

    ...

    I take your point about subtypes and length constraints (now snipped)
    but I was thinking more of having objects which would be identified by a
    single identifier, such as x, but which would have two or more meta
    attributes which are known and manipulated at run time.

    Consider

    function f
    (
    in String s0
    inout String s1
    returns String s2
    )

    where all three strings would be passed as (location, length). Function
    f could read the 'in' parameter s0, modify the inout parameter s1 and
    return the s2.

    One way to treat the strings as first-class objects (possibly with
    immutable lengths) could be by passing around the 2-tuple (location,
    length).


    Slices
    ======

    That arrangement should work well for zero-based array slices. In fact,
    with zero basing (i.e. arrays which are considered to be indexed from
    zero to len - 1) arrays and slices of arrays would be passed in the same
    way (location, length) and a callee would be able to process either as

    array of T

    The callee would know the type, T, as well as the location and the
    length of any parameter it was called with.


    Objects
    =======

    Consider the object tags we have been talking about in another thread.
    They could be stored not with the object but maintained separately by
    the compiler. An object would then be known to a compiler as the tuple (location, tag).

    That would allow us to treat objects in store as ADTs without having to
    change the storage layout such as prepending those objects with a tag -
    which would make some things when working at a lower level easier and
    faster.


    Alternatives
    ============

    While there are temptations, is passing tuples around a good idea?

    For example, it may be better to store the fields of the tuple as a
    control block and to pass pointers to those control blocks instead of
    the fields from the control block.

    Or it might be better to store (location, length) or (location, tag) etc
    as just of many fields in sentient pointers which include other
    information.

    Or, (one more!) it may be better for a programmer to pass the fields of
    each tuple explicitly.

    Bottom line for me at present is that I think this would work very well
    for zero-based arrays where the length could not be changed by a callee,
    and there are potential performance and compatibility benefits of not
    having to alter an object's layout (by adding a tag) but the jury is out
    for other uses.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Sun Sep 5 14:10:59 2021
    On 2021-09-05 13:28, James Harris wrote:
    On 04/09/2021 18:40, Dmitry A. Kazakov wrote:
    On 2021-09-04 17:56, James Harris wrote:
    This query was prompted by issues raised in the discussion about an
    object having a pointer or a tag.

    There are some situations in programming where it could be useful to
    identify an object not by a single datum such as its address but by a
    small tuple which includes the address. For example,

       (address, type)               for an object
       (address, method table ptr)   for an object
       (address, length)             for a string
       (address, length)             for a subarray or 'slice' of an array

    Why address, why single constraint? By-value passing is as useful as
    by-reference. The general form is:

        (constraint-1, constraint-2, ..., constraint-N, reference)
        (constraint-1, constraint-2, ..., constraint-N, value)

    I don't see the elements I was talking about as constraints. To explain,
    one could consider the following expression

      p := q + 1

    as getting the addresses of both p and q and dereferencing q to obtain
    its value. It's normal for compiled code to work with an object's
    location. I was positing adding one or more other pieces of info.

    With a value model an identifier such as

      x

    indicates

    1. a type (classically, known at compile time)
    2. whether the location or the value is needed (known from context))
    3. the location of the object (known at run time)

    but for slices there also needs to be

    4. the length of the object (known by run time)

    No, that is just an indefinite string. Its value consists of the body
    and the actual constraint, the length in your case, bounds in more
    advanced languages.

    It's normal for the runtime to know (3) the location. What I was asking
    was about the runtime also maintaining (4) the length. IOW instead of
    the runtime knowing an object's location it would know both of
    (location, length).

    That is just a value which internal representation is a tuple.

    One way to treat the strings as first-class objects (possibly with
    immutable lengths) could be by passing around the 2-tuple (location,
    length).

    The indefinite strings.

    Definite strings have only bodies, the constraint (length) is known
    statically from the subtype.

    Slices

    Slice is just a string.

    It is about memory management, slice is a referential object that refers
    to some other string object and shall not outlive it.

    Objects
    =======

    Consider the object tags we have been talking about in another thread.
    They could be stored not with the object but maintained separately by
    the compiler. An object would then be known to a compiler as the tuple (location, tag).

    This is a class-wide object.

    That would allow us to treat objects in store as ADTs without having to change the storage layout such as prepending those objects with a tag -
    which would make some things when working at a lower level easier and
    faster.

    Embedded tags are used with relatively large objects passed by
    reference. Outbound tags are used with small scalar objects.

    To have the latter you must distinguish specific and class-wide types.
    The former allows conflating both types, as C++ does.

    For example, it may be better to store the fields of the tuple as a
    control block and to pass pointers to those control blocks instead of
    the fields from the control block.

    This is merely passing by reference rather than by value.

    Or it might be better to store (location, length) or (location, tag) etc
    as just of many fields in sentient pointers which include other
    information.

    No difference, just by reference passing. Unless you introduce a new
    type for the pointer which would an awful idea.

    Or, (one more!) it may be better for a programmer to pass the fields of
    each tuple explicitly.

    Leave that to Bart! (:-))

    Bottom line for me at present is that I think this would work very well
    for zero-based arrays where the length could not be changed by a callee,
    and there are potential performance and compatibility benefits of not
    having to alter an object's layout (by adding a tag) but the jury is out
    for other uses.

    All depends on two factors:

    1. the parameter passing method
    2. embedding the constraint

    A good language must allow all combinations of these.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sun Sep 5 12:31:49 2021
    On 04/09/2021 18:04, Bart wrote:
    On 04/09/2021 16:56, James Harris wrote:

    ...

    For example, if s is a string then

       f(s)

    would pass the string's address and length

    ...

    Both caller and callee would see s as a single object unless they
    chose to break down the tuple with such as

       s._ptr
       s._len


    Those string examples pretty much match Slices in my language.

    That makes sense. It was largely because of slices that the issue arose.
    I was looking into how to support potential copy-in, copy-out for arrays
    (or parts thereof) so a pointer to the first element (as in C) would not
    be enough. Length would also be required.

    And that's handy because as you showed in your example code the pair
    (location, length) also works well as a slice, and a slice can be easy
    for a programmer to append - e.g. with the [...] in

    s[5..7]

    So a slice would be a subarray which could also be considered to be an
    array in its own right.

    I should say that passing arrays as (location, length) and having the
    potential for copy-in, copy-out does not mean that arrays would always literally be copied. Such arrays or slices could often be passed by
    reference. However, the potential for copying would be there and the
    callee would additionally be informed of the array length.

    It's a
    feature I haven't really used, so it's partly fallen into disuse, but
    the following example works (s as a 'char* instead of char[] doesn't work):

        proc testfn(slice[]char s)=
            println s, s.len, ref void(s.sliceptr)
        end

        proc start=
            static []char s=z"one two three"

            testfn("hello")
            testfn(s)
            testfn(s[5..7])
        end

    Do you allow the callee to modify s.len or even s.sliceptr and can a
    callee /return/ a slice?

    I think you said somewhere that you return a 2-tuple in two registers
    but I cannot find that comment ATM. If slices can be returned should
    they be new objects or should they be restricted to referring to parts
    of existing objects?


    The first two calls to testfn automatically turn the array (which has a
    known size, probably why char* didn't work) into a slice object.

    I guess char* does not and can not know the length.

    Thinking out loud, as it were, are there logically at least two string
    types in common use in programming: one dynamic and of variable-length,
    the other an array of char where for some operations the array size is
    fixed at or by run time?

    A callee which expected a read-only argument to be an array of char
    could probably accept either type of string and it might be clearer how
    a function was going to use a string for it to be declared as array of
    char rather than a string of char.

    Many options!


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sun Sep 5 14:53:28 2021
    On 05/09/2021 12:31, James Harris wrote:
    On 04/09/2021 18:04, Bart wrote:
    On 04/09/2021 16:56, James Harris wrote:

    ...

    For example, if s is a string then

       f(s)

    would pass the string's address and length

    ...

    Both caller and callee would see s as a single object unless they
    chose to break down the tuple with such as

       s._ptr
       s._len


    Those string examples pretty much match Slices in my language.

    That makes sense. It was largely because of slices that the issue arose.
    I was looking into how to support potential copy-in, copy-out for arrays
    (or parts thereof) so a pointer to the first element (as in C) would not
    be enough. Length would also be required.

    And that's handy because as you showed in your example code the pair (location, length) also works well as a slice, and a slice can be easy
    for a programmer to append - e.g. with the [...] in

      s[5..7]

    So a slice would be a subarray which could also be considered to be an
    array in its own right.

    I should say that passing arrays as (location, length) and having the potential for copy-in, copy-out does not mean that arrays would always literally be copied. Such arrays or slices could often be passed by reference. However, the potential for copying would be there and the
    callee would additionally be informed of the array length.

    I should say that a slice is an explcit user-declated type in my language.

    Regular arrays can be passed, nominally by value (it's normally done
    with pointers) but they are fixed size so caller and callee both know
    the length.

    The language doesn't have a built-in way of representing dynamic arrays;
    slices were one alternative to passing separate pointers and lengths.

    (The dynamic language handles those of course, but by fairly heavyweight descriptors.)


    It's a feature I haven't really used, so it's partly fallen into
    disuse, but the following example works (s as a 'char* instead of
    char[] doesn't work):

         proc testfn(slice[]char s)=
             println s, s.len, ref void(s.sliceptr)
         end

         proc start=
             static []char s=z"one two three"

             testfn("hello")
             testfn(s)
             testfn(s[5..7])
         end

    Do you allow the callee to modify s.len or even s.sliceptr and can a
    callee /return/ a slice?

    I don't think I allow that. But you can extract those components, modify
    them and construct a new slice. To change the caller's copy of slice, it
    needs to be a reference parameter to an actual slice object (not a
    constructed one).


    I think you said somewhere that you return a 2-tuple in two registers
    but I cannot find that comment ATM.

    They use the same handling as 128-bit numbers. A 64-bit value is
    returned in register D0 (rax), and 128-bit ones in registers D1:D0
    (don't know what D1 is in Intel terms, but it'll be one of the volatile
    ones).

    This program:

    function fn(string s,t)string u= # string defined below
    return u
    end

    proc start=
    string a,b,c
    c:=fn(a,b)
    end

    Generates this ASM (note that my register optimiser will only put or
    keep 64-bit variables in registers; 128-bit ones are a little too
    rarified for that):

    t.fn:
    fn.s = 16
    fn.t = 32
    fn.u = -16
    push Dframe
    mov Dframe, Dstack
    sub Dstack, 16
    mov [Dframe+16], D10
    mov [Dframe+24], D11
    mov [Dframe+32], D12
    mov [Dframe+40], D13 ;-------------------------------------------------
    mov D0, [Dframe+fn.u]
    mov D1, [Dframe+fn.u+8] ;-------------------------------------------------
    add Dstack, 16
    pop Dframe
    ret

    t.start:
    ....
    mov D10, [Dframe+start.a]
    mov D11, [Dframe+start.a+8]
    mov D12, [Dframe+start.b]
    mov D13, [Dframe+start.b+8]
    call t.fn
    mov [Dframe+start.c], D0
    mov [Dframe+start.c+8], D1
    ....


    If slices can be returned should
    they be new objects or should they be restricted to referring to parts
    of existing objects?

    Since slices are an explicit type of fixed size (16 bytes) they are
    passed and returned by value:

    type string = slice[]char # save some typing

    function widen(string s)string t=
    t:=(s.sliceptr, s.len+1)
    return t
    end

    proc start=
    string a:=("ABCDEFGH",3)

    println a
    println widen(a)
    end

    This displays ABC then ABCD. It's up to your code to ensure the string
    they point to still exists.

    I could have used:

    string a:="ABC"

    too (it will treat that as a:=("ABC", 3)) but my widening example
    wouldn't have worked too well! Here's another way of returning a slice:

    function left3(string s)string t=
    return s[1..3]
    end

    This could be called as left3("ABCDEFGHI"), and returns "ABC" as a
    slice. A bit too trivial though, as you can just do "ABCDEFGHI"[1..3].

    A returned slice value could refer to an existing string or slice or
    slice of slice etc, but it can refer to a newly allocated string.

    At level I implement this, user-code needs to keep track of that.

    (My 2-element slices can be used for allocated arrays that do not change
    in size. Arrays that grow are more challenging. It could be done by
    splitting the 64-bit length into a 32-bit length, and 32-capacity, but
    then it also needs language support to check accesses are within bounds,
    and to grow the arrays as needed.

    A bit like the C++ vector type, but that uses a 196-bit descriptor,
    which really needs handling by reference. But I decided that was a
    little beyond the remit of my static language.)


    The first two calls to testfn automatically turn the array (which has
    a known size, probably why char* didn't work) into a slice object.

    I guess char* does not and can not know the length.

    It could assume a terminated string (when there are separate char and u8 types), and call strlen to determine the length for the slice.


    Thinking out loud, as it were, are there logically at least two string
    types in common use in programming: one dynamic and of variable-length,
    the other an array of char where for some operations the array size is
    fixed at or by run time?

    Yes, there's three (at least):

    * Fixed at compile-time

    * Set at runtime, but fixed size, once allocated

    * Set at runtime, and the size can change

    Fixed ones including declarations like this:

    [100]int A
    [50]int B

    Here, either A or B can be passed to a function taking a slice[]int
    parameter.

    Fixed ones also include short strings forming their own type; I mainly
    used these for struct member types.

    A callee which expected a read-only argument to be an array of char
    could probably accept either type of string and it might be clearer how
    a function was going to use a string for it to be declared as array of
    char rather than a string of char.

    Many options!

    Yeah, too many.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)