• String Buffer

    From Kevin Chadwick@21:1/5 to All on Thu Dec 2 10:17:37 2021
    In this thread bounded and unbounded get quite a bashing.

    "https://groups.google.com/g/comp.lang.ada/c/NINmFln-YS4/m/5De5DeUAAAAJ"

    I thought bounded looked useful but then I realised that it allocates the max immediately anyway. It may be useful in constrained environments but then I do not use Strings in constrained environments.

    Unbounded is said to be inefficient because it re-allocates.

    In Go they have strings.Builder. I assume that is what Text_Buffer is aimed to be. (Actually Go seems to have copied a lot from Ada such as AWS API, unless they both are similar to something else like JAVA).

    Is Text_Buffer usable today with GCC 11?

    strings.Builder in Go behaves similarly to unbounded in that it doubles the allocation as required but it only returns a string when needed and does not have string operations. You can Grow the builder to avoid re-allocations.

    "https://pkg.go.dev/strings#Builder"

    If possible without breaking all of the string functions (length and separate capacity) and Unbounded Strings had a Grow function, then wouldn't that relieve the efficiency issue?

    In any case avoiding unbounded strings is almost certainly in the realm of premature optimisation most of the alleged 10% of the time that it useful, but it would be nice to know of and use something akin to strings.Builder, preferably from the standard
    library, if it is available?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeffrey R.Carter@21:1/5 to Kevin Chadwick on Thu Dec 2 20:56:31 2021
    On 2021-12-02 19:17, Kevin Chadwick wrote:

    Unbounded is said to be inefficient because it re-allocates.

    In any case avoiding unbounded strings is almost certainly in the realm of premature optimisation
    "Efficiency" is only meaningful in the context of a project's quantitative timing requirements: an efficient implementation allows the project to meet those requirements, while an inefficient implementation does not.

    In the absence of such context, any claims of "inefficiency" (and especially blanket claims such as "Unbounded_String is inefficient") simply demonstrate the
    speaker's incompetence.

    For an example of context, a decade ago I worked on a soft-real-time system that
    made extensive use of Unbounded_String and had no problem meeting its timing requirements. Unbounded_String was clearly efficient for that project. Since most projects that would use Unbounded_String have even less restrictive timing requirements than that system, it seems likely that Unbounded_String will be efficient for them, too.

    --
    Jeff Carter
    "After fifteen minutes I wanted to marry her, and
    after a half hour I completely gave up the idea of
    snatching her purse."
    Take the Money and Run
    136

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Jeffrey R.Carter on Thu Dec 2 21:15:11 2021
    On 2021-12-02 20:56, Jeffrey R.Carter wrote:
    On 2021-12-02 19:17, Kevin Chadwick wrote:

    Unbounded is said to be inefficient because it re-allocates.

    In any case avoiding unbounded strings is almost certainly in the
    realm of premature optimisation

    I see it as a design question. Code that modifies a string as a whole is
    most likely broken. Unbounded_String property of varying length is not
    needed except for very special cases, like passing parameters where no
    result is allowed, e.g. in the task entries. Most, if not all such cases
    are Ada language design deficiencies. Normally there should be no need
    for Unbounded_String.

    In the absence of such context, any claims of "inefficiency" (and
    especially blanket claims such as "Unbounded_String is inefficient")
    simply demonstrate the speaker's incompetence.

    It is true only to a certain degree. When comparing algorithms it is
    valid to claim inefficiency without any context if computational
    complexity sufficiently differs.

    For example, an O(N) algorithm is unquestionably inefficient comparing
    to O(log N) one. I leave marginal cases of very small N.

    Unbounded_String uses storage pool and that is a qualitative difference
    that requires in any context. For very large text buffers they are
    unusable either. Again, it is a design question, any time I see Unbounded_String alarm bells start ringing.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Simon Wright@21:1/5 to Kevin Chadwick on Thu Dec 2 20:51:30 2021
    Kevin Chadwick <kevc3no4@gmail.com> writes:

    In Go they have strings.Builder. I assume that is what Text_Buffer is
    aimed to be. (Actually Go seems to have copied a lot from Ada such as
    AWS API, unless they both are similar to something else like JAVA).

    Is Text_Buffer usable today with GCC 11?

    If you mean Universal Text Buffers, no. There is
    Ada.Strings.Text_Output, but it looks as though that's actually an
    internal package to support T'Put_Image - so probably best avoided.

    GCC 12 has universal text buffers.

    But I don't see that Ada.Strings.Text_Buffers.Unbounded is going to be
    any more or less efficient than Unbounded_Strings?

    I seem to remember that AdaCore made considerable performance
    improvements to Unbounded_Strings, for example copy-on-write sharing.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeffrey R.Carter@21:1/5 to Dmitry A. Kazakov on Thu Dec 2 22:06:09 2021
    On 2021-12-02 21:15, Dmitry A. Kazakov wrote:

    It is true only to a certain degree. When comparing algorithms it is valid to claim inefficiency without any context if computational complexity sufficiently
    differs.

    For example, an O(N) algorithm is unquestionably inefficient comparing to O(log
    N) one. I leave marginal cases of very small N.

    This is false. If you have the O(N) algorithm and it meets your requirements, then using it is more efficient than implementing the O(log N) algorithm. If you
    have both and the O(N) algorithm meets your requirements, the effort to get your
    data into a form where you can apply the O(log N) algorithm may still outweigh the time difference.

    As for "very small N", I have worked on systems where linear search was efficient for sequences of 100,000 items.

    The only thing you can meaningfully say about such algorithms is that one is O(N) and the other O(log N). Both have their uses.

    Unbounded_String is needed far less than many people think, but there are application domains where it is much easier to achieve correctness and clarity with a variable-length string abstraction than without. Blanket statements about
    "efficiency" are dangerous for those working in such domains.

    --
    Jeff Carter
    "After fifteen minutes I wanted to marry her, and
    after a half hour I completely gave up the idea of
    snatching her purse."
    Take the Money and Run
    136

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Jeffrey R.Carter on Thu Dec 2 22:45:32 2021
    On 2021-12-02 22:06, Jeffrey R.Carter wrote:
    On 2021-12-02 21:15, Dmitry A. Kazakov wrote:

    It is true only to a certain degree. When comparing algorithms it is
    valid to claim inefficiency without any context if computational
    complexity sufficiently differs.

    For example, an O(N) algorithm is unquestionably inefficient comparing
    to O(log N) one. I leave marginal cases of very small N.

    This is false. If you have the O(N) algorithm and it meets your
    requirements, then using it is more efficient than implementing the
    O(log N) algorithm.

    You have to show that it meets these requirements. The burden of proof
    is on you as you select an objectively inferior algorithm. Selecting a
    better algorithm is a safe choice under ill-defined conditions.

    Unbounded_String is needed far less than many people think, but there
    are application domains where it is much easier to achieve correctness
    and clarity with a variable-length string abstraction than without.
    Blanket statements about "efficiency" are dangerous for those working in
    such domains.

    It is a good argument because introducing String requires more initial
    efforts in Ada than a thoughtless application Unbounded_String.

    And note, that in most cases it is really thoughtless as the choice is
    made on the basis of how easy it is to declare a string component of a
    record type and then rewrite it.

    Later on throughout the rest of the program the user of Unbounded_String
    will be consistently punished for that poor choice because normal string operations are very uncomfortable with Unbounded_String. But that
    happens later. Right now and here, let us save a couple of code lines.

    So the simplest and most persuasive blanket statement is OK to dissuade
    people from poor choices.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kevin Chadwick@21:1/5 to All on Thu Dec 2 16:49:24 2021
    And note, that in most cases it is really thoughtless as the choice is
    made on the basis of how easy it is to declare a string component of a record type and then rewrite it.

    Later on throughout the rest of the program the user of Unbounded_String will be consistently punished for that poor choice because normal string operations are very uncomfortable with Unbounded_String. But that
    happens later. Right now and here, let us save a couple of code lines.

    So the simplest and most persuasive blanket statement is OK to dissuade people from poor choices.

    I think I am glad that I am understanding this point about preferring strings early on in my Ada usage. Of course it is very easy to convert to a String as needed and whilst Randy mentioned yuck on "use" use, which I never use. I find package renames
    work well. I was thinking maybe you should never propagate an unbounded but then I am sure there will be the occasional scenario, where you expect the caller most likely wants to append and the code reads better without declares. As always...it depends..
    I guess?

    With regard to a String buffer. It just occurred to me that the reason bounded is more efficient than unbounded is because, you could consider the bound max as a one time equivalent to strings.Builders grow(). In Go, I would try to only call grow, once
    anyway.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Randy Brukardt@21:1/5 to Jeffrey R.Carter on Thu Dec 2 23:25:52 2021
    "Jeffrey R.Carter" <spam.jrcarter.not@spam.acm.org.not> wrote in message news:sobcg4$tc4$1@dont-email.me...
    On 2021-12-02 21:15, Dmitry A. Kazakov wrote:

    It is true only to a certain degree. When comparing algorithms it is
    valid to claim inefficiency without any context if computational
    complexity sufficiently differs.

    For example, an O(N) algorithm is unquestionably inefficient comparing to
    O(log N) one. I leave marginal cases of very small N.

    This is false. If you have the O(N) algorithm and it meets your
    requirements, then using it is more efficient than implementing the O(log
    N) algorithm. If you have both and the O(N) algorithm meets your requirements, the effort to get your data into a form where you can apply
    the O(log N) algorithm may still outweigh the time difference.

    As for "very small N", I have worked on systems where linear search was efficient for sequences of 100,000 items.

    And it isn't unusual that you *are* working mostly with small N. Case in
    point: The Janus/Ada parser table lookup routine. At one point, I tried to speed it up by implementing a binary search algorithm. (The table is
    logically a 2-dimensional array with the action being the element, but the majority of the elements are error markers [which have no information]. The usual storage is to omit those and store pairs of terminal/action. That's
    many times smaller for a language like Ada.) The binary search seemed to
    *slow down* parsing! I spent a week building instrumented versions, and determined that the average number of pairs was about 8, and the binary
    search (which necessarily was much more complex than a linear search) was slower until the number of pairs was about 20. So it only helped the largest states.

    I ended up doing two other things instead: coding the lookup routine in assembler, which got the lookup loop down to about 5 instructions (it might
    be possible for a compiler to write that loop with fancy optimizations, but I've never seen one that could, and Janus/Ada certainly isn't able), and
    then I restructured the parse table to optimize lookups (terminals don't
    appear with a uniform frequency in code). The net result was that average lookup takes 1.6 iterations (of a 5 instruction loop); the binary version (which executed a lot more code) took somewhere around 3.5 on average.

    The point is, simpler can often be better (especially as it is more likely
    to work the first time). And if it proves necessary to improve it, nothing really substitutes for instrumenting the actual application.

    Randy.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Vadim Godunko@21:1/5 to kevc...@gmail.com on Fri Dec 3 00:11:56 2021
    On Thursday, December 2, 2021 at 9:17:38 PM UTC+3, kevc...@gmail.com wrote:
    In this thread bounded and unbounded get quite a bashing.

    "https://groups.google.com/g/comp.lang.ada/c/NINmFln-YS4/m/5De5DeUAAAAJ"

    I thought bounded looked useful but then I realised that it allocates the max immediately anyway. It may be useful in constrained environments but then I do not use Strings in constrained environments.

    Unbounded is said to be inefficient because it re-allocates.

    In Go they have strings.Builder. I assume that is what Text_Buffer is aimed to be. (Actually Go seems to have copied a lot from Ada such as AWS API, unless they both are similar to something else like JAVA).

    Is Text_Buffer usable today with GCC 11?

    strings.Builder in Go behaves similarly to unbounded in that it doubles the allocation as required but it only returns a string when needed and does not have string operations. You can Grow the builder to avoid re-allocations.

    "https://pkg.go.dev/strings#Builder"

    If possible without breaking all of the string functions (length and separate capacity) and Unbounded Strings had a Grow function, then wouldn't that relieve the efficiency issue?

    In any case avoiding unbounded strings is almost certainly in the realm of premature optimisation most of the alleged 10% of the time that it useful, but it would be nice to know of and use something akin to strings.Builder, preferably from the
    standard library, if it is available?

    For VSS.Strings.Virtual_String we used two kinds of optimization:

    1. Short strings are stored without any memory allocation. It saves a lot of time. This is not very visible in multithread applications due to runtime cost of controlled objects; however it is very visible on manycore due to less amount of involved
    atomic operations. How "short" string should be depends from underlying encoding, content and machine architecture, on modern 64bit systems when UTF-8 encoding is used it is 17 ASCII characters, or 8 Cyrillic characters, or 4 math characters. In context
    of Ada Language Server most of cases are such small strings.

    2. It is possible to set capacity for the particular string object. Memory will be reallocated on next modification operation of the string object. This may be useful for large strings, when approximate size of the string is known and allows to save few
    allocate/move memory cycles on append. I don't know any real use cases of this feature right now.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From ldries46@21:1/5 to All on Fri Dec 3 09:31:08 2021
    Op 2-12-2021 om 20:56 schreef Jeffrey R.Carter:
    On 2021-12-02 19:17, Kevin Chadwick wrote:

    Unbounded is said to be inefficient because it re-allocates.

    In any case avoiding unbounded strings is almost certainly in the
    realm of premature optimisation
    "Efficiency" is only meaningful in the context of a project's
    quantitative timing requirements: an efficient implementation allows
    the project to meet those requirements, while an inefficient
    implementation does not.

    In the absence of such context, any claims of "inefficiency" (and
    especially blanket claims such as "Unbounded_String is inefficient")
    simply demonstrate the speaker's incompetence.

    For an example of context, a decade ago I worked on a soft-real-time
    system that made extensive use of Unbounded_String and had no problem
    meeting its timing requirements. Unbounded_String was clearly
    efficient for that project. Since most projects that would use Unbounded_String have even less restrictive timing requirements than
    that system, it seems likely that Unbounded_String will be efficient
    for them, too.

    I don't like the word "Efficiency". because what doe it mean. In the
    time that I started programming 1966/1967 computers were relative slow
    and small. You did not like any kind of strings because they toke lots
    of memory and lots of power. Gradually computers started to become
    faster and bigger (in memory, smaller in size) that meant that string operations also became faster and so use of string operations became
    more normal. In my opinion programs that need a lot of string
    manipulation do not need to be very fast because in general there the
    output is the limiting factor. In most other programs string
    manipulation is only a small part of the program still becoming faster
    along with new developments. My statement is that string manipulation
    does no have to become more efficient because other factors do limit the efficciency of the program much more.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)