• mapLast, mapFirst, and just general iterator questions

    From Travis Griggs@21:1/5 to All on Tue Jun 14 11:05:52 2022
    I want to be able to apply different transformations to the first and last elements of an arbitrarily sized finite iterator in Python 3. It's a custom iterator, so it does not have __reversed__. If the first and last elements are the same (e.g. size 1), it should apply both transforms to the same element. I'm doing this because I have an iterator of time span tuples, and I want to clamp the first and last elements, but know any/all of the middle values are inherently in range.

    A silly example might be a process that, given an iterator of strings, chops the outer characters off of the first value and uppercases the final value. For example:


    def iterEmpty():
        return iter([])

    def iter1():
        yield "howdy"

    def iter2():
        yield "howdy"
        yield "byebye"

    def iterMany():
        yield "howdy"
        yield "hope"
        yield "your"
        yield "day"
        yield "is"
        yield "swell"
        yield "byebye"

    def mapFirst(stream, transform):
        try:
            first = next(stream)
        except StopIteration:
            return
        yield transform(first)
        yield from stream

    def mapLast(stream, transform):
        try:
            previous = next(stream)
        except StopIteration:
            return
        for item in stream:
            yield previous
            previous = item
        yield transform(previous)

    def main():
        for each in (iterEmpty, iter1, iter2, iterMany):
            baseIterator = each()
            chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
            andCapLast = mapLast(chopFirst, lambda x: x.upper())
            print(repr(" ".join(andCapLast)))


    This outputs:

    ''
    'OWD'
    'owd BYEBYE'
    'owd hope your day is swell BYEBYE'

    Is this idiomatic? Especially my implementations of mapFirst and mapLast there in the middle? Or is there some way to pull this off that is more elegant?

    I've been doing more with iterators and stacking them (probably because I've been playing with Elixir elsewhere), and I am generally curious what the performance tradeoffs of heavy use of iterators and yield functions in Python are. I know the argument for avoiding big list copies when moving between stages. Is it one of those things where there's also some overhead with them, where for small stuff you'd be better off list-ifying the first iterator and then working with lists (where, for example, I could do the first/last clamp operation with just indexing operations)?
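
For comparison, the list-based alternative described above might look like the following. This is only a sketch: `clamp_ends` is a hypothetical name, and it assumes the whole input fits comfortably in memory.

```python
def clamp_ends(iterable, first_transform, last_transform):
    """Materialize the input, then transform the ends by index."""
    items = list(iterable)
    if items:
        # For a single element, index 0 and index -1 are the same slot,
        # so both transforms are applied to that one element.
        items[0] = first_transform(items[0])
        items[-1] = last_transform(items[-1])
    return items


print(clamp_ends(["howdy", "byebye"], lambda x: x[1:-1], lambda x: x.upper()))
```

The trade-off is that nothing is produced until the entire input has been consumed.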

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to Travis Griggs on Wed Jun 15 04:47:31 2022
    On Wed, 15 Jun 2022 at 04:07, Travis Griggs <travisgriggs@gmail.com> wrote:
    def mapFirst(stream, transform):
        try:
            first = next(stream)
        except StopIteration:
            return
        yield transform(first)
        yield from stream

    Small suggestion: Begin with this:

    stream = iter(stream)

    That way, you don't need to worry about whether you're given an
    iterator or some other iterable (for instance, you can't call next()
    on a list, but it would make good sense to be able to use your
    function on a list).
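
A minimal sketch of that suggestion, using the snake_case name `map_first` purely for illustration. Calling `iter()` on something that is already an iterator returns it unchanged, so generators keep working as before.

```python
def map_first(iterable, transform):
    # Accept any iterable (list, tuple, generator, ...), not only iterators.
    stream = iter(iterable)
    try:
        first = next(stream)
    except StopIteration:
        return
    yield transform(first)
    yield from stream


# A plain list now works; next() on a list directly would raise TypeError.
print(list(map_first(["howdy", "byebye"], lambda x: x[1:-1])))
```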

    (BTW, Python's convention would be to call this "map_first" rather
    than "mapFirst". But that's up to you.)

    def mapLast(stream, transform):
        try:
            previous = next(stream)
        except StopIteration:
            return
        for item in stream:
            yield previous
            previous = item
        yield transform(previous)

    Hmm. This might be a place to use multiple assignment, but what you
    have is probably fine too.
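
The post does not spell out what the multiple-assignment version would be. One possible reading, offered only as a guess, swaps the held-back element and the new one in a single statement:

```python
def map_last(iterable, transform):
    stream = iter(iterable)
    try:
        previous = next(stream)
    except StopIteration:
        return
    for item in stream:
        # Swap: hold on to the new element, release the one held back.
        previous, item = item, previous
        yield item
    # Whatever is still held back is the final element.
    yield transform(previous)


print(list(map_last(["howdy", "hope", "byebye"], str.upper)))
```

Whether this reads better than the original two-line loop body is a matter of taste.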

    def main():
        for each in (iterEmpty, iter1, iter2, iterMany):
            baseIterator = each()
            chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
            andCapLast = mapLast(chopFirst, lambda x: x.upper())
            print(repr(" ".join(andCapLast)))

    Don't bother with a main() function unless you actually need to be
    able to use it as a function. Most of the time, it's simplest to just
    have the code you want, right there in the file. :) Python isn't C or
    Java, and code doesn't have to get wrapped up in functions in order to
    exist.

    Is this idiomatic? Especially my implementations of mapFirst and mapLast there in the middle? Or is there some way to pull this off that is more elegant?


    Broadly so. Even with the comments I've made above, I wouldn't say
    there's anything particularly *wrong* with your code. There are, of
    course, many ways to do things, and what's "best" depends on what your
    code is doing, whether it makes sense in context.

    I've been doing more with iterators and stacking them (probably because I've been playing with Elixir elsewhere), and I am generally curious what the performance tradeoffs of heavy use of iterators and yield functions in Python are. I know the argument for avoiding big list copies when moving between stages. Is it one of those things where there's also some overhead with them, where for small stuff you'd be better off list-ifying the first iterator and then working with lists (where, for example, I could do the first/last clamp operation with just indexing operations)?


    That's mostly right, but more importantly: Don't worry about
    performance. Worry instead about whether the code is expressing your
    intent. If that means using a list instead of an iterator, go for it!
    If that means using an iterator instead of a list, go for it! Python
    won't judge you. :)

    But if you really want to know which one is faster, figure out a
    reasonable benchmark, and then start playing around with the timeit
    module. Just remember, it's very very easy to spend hours trying to
    make the benchmark numbers look better, only to discover that it has
    negligible impact on your code's actual performance - or, in some
    cases, it's *worse* than before (because the benchmark wasn't truly
    representative). So if you want to spend some enjoyable time exploring
    different options, go for it! And we'd be happy to help out. Just
    don't force yourself to write bad code "because it's faster".
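
A rough starting point for such a benchmark might look like this. It is a sketch only: the helper names, the input size, and the repetition count are arbitrary assumptions, and the numbers it prints say nothing about any real workload.

```python
import timeit


def map_first(iterable, transform):
    stream = iter(iterable)
    try:
        first = next(stream)
    except StopIteration:
        return
    yield transform(first)
    yield from stream


def map_last(iterable, transform):
    stream = iter(iterable)
    try:
        previous = next(stream)
    except StopIteration:
        return
    for item in stream:
        yield previous
        previous = item
    yield transform(previous)


def clamp_lazily(items, first_fn, last_fn):
    # Stacked generators, consumed into a list at the very end.
    return list(map_last(map_first(items, first_fn), last_fn))


def clamp_in_list(items, first_fn, last_fn):
    # Copy into a list up front, then fix the ends by index.
    result = list(items)
    if result:
        result[0] = first_fn(result[0])
        result[-1] = last_fn(result[-1])
    return result


def chop(s):
    return s[1:-1]


words = ["howdy"] + ["middle"] * 1000 + ["byebye"]  # arbitrary test data

lazy_time = timeit.timeit(lambda: clamp_lazily(words, chop, str.upper), number=200)
list_time = timeit.timeit(lambda: clamp_in_list(words, chop, str.upper), number=200)
print(f"generators: {lazy_time:.4f}s  list: {list_time:.4f}s")
```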

    ChrisA

  • From Chris Angelico@21:1/5 to Roel Schroeven on Wed Jun 15 05:49:26 2022
    On Wed, 15 Jun 2022 at 05:45, Roel Schroeven <roel@roelschroeven.net> wrote:

    Chris Angelico wrote on 14/06/2022 at 20:47:
    def main():
        for each in (iterEmpty, iter1, iter2, iterMany):
            baseIterator = each()
            chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
            andCapLast = mapLast(chopFirst, lambda x: x.upper())
            print(repr(" ".join(andCapLast)))

    Don't bother with a main() function unless you actually need to be
    able to use it as a function. Most of the time, it's simplest to just
    have the code you want, right there in the file. :) Python isn't C or
    Java, and code doesn't have to get wrapped up in functions in order to
    exist.

    Not (necessarily) a main function, but these days the general
    recommendation seems to be to use the "if __name__ == '__main__':"
    construct, so that the file can be used as a module as well as as a
    script. Even for short simple things that can be helpful when doing
    things like running tests or extracting docstrings.

    If it does need to be used as a module as well as a script, sure. But
    (a) not everything does, and (b) even then, you don't need a main()
    function; what you need is the name-is-main check. The main function
    is only necessary when you need to be able to invoke your main entry
    point externally, AND this main entry point doesn't have a better
    name. That's fairly rare in my experience.

    My recommendation is to write the code you need, and only add
    boilerplate when you actually need it. Don't just start every script
    with an if-name-is-main block at the bottom just for the sake of doing
    it.
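
To illustrate point (b), a file can be importable and runnable with only the name-is-main check and no `main()` function. The `clamp_span` helper below is hypothetical, loosely modelled on the time-span clamping mentioned at the start of the thread.

```python
def clamp_span(span, low, high):
    """Clamp a (start, end) tuple into the range [low, high]."""
    start, end = span
    return (max(start, low), min(end, high))


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    print(clamp_span((3, 12), 5, 10))
```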

    ChrisA

  • From Roel Schroeven@21:1/5 to Chris Angelico on Tue Jun 14 21:44:26 2022
    Chris Angelico wrote on 14/06/2022 at 20:47:
    def main():
        for each in (iterEmpty, iter1, iter2, iterMany):
            baseIterator = each()
            chopFirst = mapFirst(baseIterator, lambda x: x[1:-1])
            andCapLast = mapLast(chopFirst, lambda x: x.upper())
            print(repr(" ".join(andCapLast)))

    Don't bother with a main() function unless you actually need to be
    able to use it as a function. Most of the time, it's simplest to just
    have the code you want, right there in the file. :) Python isn't C or
    Java, and code doesn't have to get wrapped up in functions in order to
    exist.

    Not (necessarily) a main function, but these days the general
    recommendation seems to be to use the "if __name__ == '__main__':"
    construct, so that the file can be used as a module as well as as a
    script. Even for short simple things that can be helpful when doing
    things like running tests or extracting docstrings.

    --
    "This planet has - or rather had - a problem, which was this: most of the people living on it were unhappy for pretty much of the time. Many solutions were suggested for this problem, but most of these were largely concerned with the movement of small green pieces of paper, which was odd because on the whole it wasn't the small green pieces of paper that were unhappy."
    -- Douglas Adams

  • From Greg Ewing@21:1/5 to Chris Angelico on Wed Jun 15 11:45:32 2022
    On 15/06/22 7:49 am, Chris Angelico wrote:
    If it does need to be used as a module as well as a script, sure. But
    (a) not everything does, and (b) even then, you don't need a main()

    I think this is very much a matter of taste. Personally I find it tidier
    to put the top level code in a function, because it ties it together
    visually and lets me have locals that are properly local.

    If the file is only ever used as a script, I just put an unconditional
    call to the main function at the bottom.

    --
    Greg

  • From Cameron Simpson@21:1/5 to Chris Angelico on Wed Jun 15 12:05:37 2022
    On 15Jun2022 05:49, Chris Angelico <rosuav@gmail.com> wrote:
    On Wed, 15 Jun 2022 at 05:45, Roel Schroeven <roel@roelschroeven.net> wrote:
    Not (necessarily) a main function, but these days the general
    recommendation seems to be to use the "if __name__ == '__main__':"
    construct, so that the file can be used as a module as well as as a
    script. Even for short simple things that can be helpful when doing
    things like running tests or extracting docstrings.

    If it does need to be used as a module as well as a script, sure. But
    (a) not everything does, and (b) even then, you don't need a main()
    function; what you need is the name-is-main check. The main function
    is only necessary when you need to be able to invoke your main entry
    point externally, AND this main entry point doesn't have a better
    name. That's fairly rare in my experience.

    While I will lazily not-use-a-function in dev, using a function has the
    benefit of avoiding accidental global variable use, because assignments
    within the function will always make local variables. That is a big plus
    for me all on its own. I've used this practice as far back as Pascal,
    which also let you write outside-a-function code, and consider it a
    great avoider of a common potential bug situation.
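
A small sketch of the kind of bug this guards against. The function and variable names are made up for illustration; the typo in `total_length` is deliberate.

```python
def total_length(words):
    total = 0
    for w in words:
        total += len(word)  # deliberate typo: should be len(w)
    return total


def main():
    # 'word' is local to main(), so total_length() cannot see it.
    for word in ["a", "bb"]:
        pass
    return total_length(["xxx", "yyyy"])


try:
    main()
    outcome = "ran without error"
except NameError:
    outcome = "NameError"  # the typo is caught immediately
print("inside main():", outcome)

# The same loop at module level leaves a global 'word' behind ...
for word in ["a", "bb"]:
    pass

# ... so the typo now silently reads that global and gives a wrong answer.
print("at top level:", total_length(["xxx", "yyyy"]))
```

With the loop inside `main()` the mistake fails loudly; at top level it quietly produces a wrong total.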

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

  • From Leo@21:1/5 to Chris Angelico on Sun Jun 19 12:08:09 2022
    On Wed, 15 Jun 2022 04:47:31 +1000, Chris Angelico wrote:

    Don't bother with a main() function unless you actually need to be
    able to use it as a function. Most of the time, it's simplest to
    just have the code you want, right there in the file. :) Python
    isn't C or Java, and code doesn't have to get wrapped up in
    functions in order to exist.

    Actually a main() function in Python is pretty useful, because Python
    code on the top level executes a lot slower. I believe this is due to
    global variable lookups instead of local.
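
One way to see the difference on CPython is to compare the bytecode for the same statements at module level and inside a function. This is a sketch for inspection only: exact opcode names vary between CPython versions, and other implementations may behave differently.

```python
import dis

# The same two statements, once as module-level source ...
MODULE_SOURCE = "x = 1\ny = x\n"


# ... and once inside a function body.
def inside_function():
    x = 1
    y = x
    return y


module_code = compile(MODULE_SOURCE, "<module-level>", "exec")
module_ops = [ins.opname for ins in dis.get_instructions(module_code)]
function_ops = [ins.opname for ins in dis.get_instructions(inside_function)]

# Module-level names use name-based (dictionary) lookups;
# function locals use indexed "fast" slots.
print("module level:", module_ops)
print("in function: ", function_ops)
```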

    Here is benchmark output from a small test.

    ```
    Benchmark 1: python3 test1.py
    Time (mean ± σ): 662.0 ms ± 44.7 ms
    Range (min … max): 569.4 ms … 754.1 ms

    Benchmark 2: python3 test2.py
    Time (mean ± σ): 432.1 ms ± 14.4 ms
    Range (min … max): 411.4 ms … 455.1 ms

    Summary
    'python3 test2.py' ran
    1.53 ± 0.12 times faster than 'python3 test1.py'
    ```

    Contents of test1.py:

    ```
    l1 = list(range(5_000_000))
    l2 = []

    while l1:
        l2.append(l1.pop())

    print(len(l1), len(l2))
    ```

    Contents of test2.py:

    ```
    def main():
        l1 = list(range(5_000_000))
        l2 = []

        while l1:
            l2.append(l1.pop())

        print(len(l1), len(l2))

    main()
    ```

    --
    Leo

  • From Stefan Ram@21:1/5 to Leo on Sun Jun 19 13:24:04 2022
    Leo <usenet@gkbrk.com> writes:
    Actually a main() function in Python is pretty useful, because Python
    code on the top level executes a lot slower. I believe this is due to
    global variable lookups instead of local.

    Whether this runs faster or slower might also depend on the
    Python implementation used.

    I have found that I need something like "main" often sooner
    or later, perhaps because I want to write a little "wrapper"
    around it, or for "tracer.run( 'main()' )". So I don't see it
    as wrong to write it like that right from the beginning.
    But, it's also not bad not to do this, because you can still
    change it to a function definition later with ease if needed.

  • From Chris Angelico@21:1/5 to Leo on Tue Jun 21 07:01:59 2022
    On Tue, 21 Jun 2022 at 06:16, Leo <usenet@gkbrk.com> wrote:

    On Wed, 15 Jun 2022 04:47:31 +1000, Chris Angelico wrote:

    Don't bother with a main() function unless you actually need to be
    able to use it as a function. Most of the time, it's simplest to
    just have the code you want, right there in the file. :) Python
    isn't C or Java, and code doesn't have to get wrapped up in
    functions in order to exist.

    Actually a main() function in Python is pretty useful, because Python
    code on the top level executes a lot slower. I believe this is due to
    global variable lookups instead of local.

    Here is benchmark output from a small test.

    ```
    Benchmark 1: python3 test1.py
    Time (mean ± σ): 662.0 ms ± 44.7 ms
    Range (min … max): 569.4 ms … 754.1 ms

    Benchmark 2: python3 test2.py
    Time (mean ± σ): 432.1 ms ± 14.4 ms
    Range (min … max): 411.4 ms … 455.1 ms

    Summary
    'python3 test2.py' ran
    1.53 ± 0.12 times faster than 'python3 test1.py'
    ```

    Contents of test1.py:

    ```
    l1 = list(range(5_000_000))
    l2 = []

    while l1:
        l2.append(l1.pop())

    print(len(l1), len(l2))
    ```

    Contents of test2.py:

    ```
    def main():
        l1 = list(range(5_000_000))
        l2 = []

        while l1:
            l2.append(l1.pop())

        print(len(l1), len(l2))

    main()
    ```


    To be quite honest, I have never once in my life had a time when the
    execution time of a script is dominated by global variable lookups in
    what would be the main function, AND it takes long enough to care
    about it. Yes, technically it might be faster, but I've probably spent
    more time reading your post than I'll ever save by putting stuff into
    a function :)

    Also, often at least some of those *need* to be global in order to be
    useful, so you'd lose any advantage you gain.

    ChrisA
