• Re: Recommendations in terms of threading, multi-threading and/or async

    From Peter J. Holzer@21:1/5 to jacob kruger on Fri Jan 6 19:19:30 2023
    On 2023-01-06 10:18:24 +0200, jacob kruger wrote:
    > I am just trying to make up my mind with regards to what I should look
    > into working with/making use of in terms of what have put in subject line?
    >
    > As in, if want to be able to trigger multiple/various threads/processes
    > to run in the background, possibly monitoring their states, either via
    > interface, or via global variables, but, possibly while processing other
    > forms of user interaction via the normal/main process, what would be
    > recommended?

    This depends very much on what you want to do and what the constraints
    and requirements are and is completely impossible to answer in the
    abstract.

    hp

    --
    _ | Peter J. Holzer | Story must make more sense than reality.
    |_|_) | |
    | | | hjp@hjp.at | -- Charles Stross, "Creative writing
    __/ | http://www.hjp.at/ | challenge!"

  • From Chris Angelico@21:1/5 to jacob kruger on Sat Jan 7 06:19:33 2023
    On Sat, 7 Jan 2023 at 04:54, jacob kruger <jacob.kruger.work@gmail.com> wrote:

    > I am just trying to make up my mind with regards to what I should look
    > into working with/making use of in terms of what have put in subject line?
    >
    > As in, if want to be able to trigger multiple/various threads/processes
    > to run in the background, possibly monitoring their states, either via
    > interface, or via global variables, but, possibly while processing other
    > forms of user interaction via the normal/main process, what would be
    > recommended?


    Any. All. Whatever suits your purpose.

    They all have different goals and different tradeoffs. Threads are great
    for I/O-bound operations; they're easy to work with (especially in
    Python), behave pretty much like having multiple things running
    concurrently, and are generally the easiest to use. But you'll run
    into limits as your thread count climbs (with a simple test, I started
    seeing delays at about 10,000 threads, with more serious problems at
    100,000), so threading is not well-suited for huge scaling. Also, only
    one thread at a time can run Python code (in CPython, because of the
    global interpreter lock), which limits threads to I/O-bound tasks like
    networking.
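
    As a rough illustration of the threading case, here is a minimal sketch
    that assumes the work is I/O-bound. The `fetch` function is hypothetical:
    its sleep stands in for a network wait, and the pool size of 8 is an
    arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item: int) -> int:
    """Hypothetical I/O-bound task: the sleep stands in for a network wait."""
    time.sleep(0.1)
    return item * 2


def fetch_all(items, max_workers=8):
    # The threads overlap their waiting, so the total time is close to a
    # single wait rather than the sum of all waits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))


start = time.perf_counter()
results = fetch_all(range(8))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

    Run one after another, the eight calls would take about 0.8 seconds;
    with the pool they should finish in roughly the time of one call.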

    Multiple processes take a lot more management. You have to carefully
    define your communication channels (for instance, a
    multiprocessing.Queue() to collect results), but they can do CPU-bound
    tasks in parallel. So multiprocessing is a good way to saturate all of
    your CPU cores. Big downsides include it being much harder to share
    information between the processes, and much MUCH higher resource usage
    than threads (with the same test as the above, I ran into limitations
    at just over 500 processes - way fewer than the 10,000 threads!).

    Asynchronous I/O runs a single thread in a single process. So like
    multithreading, it's only good for I/O-bound tasks like networking.
    It's harder to work with, though, since you have to be very careful to
    include proper await points, and you can stall out the entire event
    loop with one mistake (common culprits being synchronous disk I/O and
    gethostbyname). But the upside is that you get near-infinite tasks,
    basically just limited by available memory (or other resources).
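
    One common way to avoid stalling the event loop is to push a blocking
    call into a worker thread. The sketch below assumes Python 3.9 or later
    (for `asyncio.to_thread`); `blocking_lookup` is a hypothetical stand-in
    for any synchronous call, not a real resolver.

```python
import asyncio
import time


def blocking_lookup(name: str) -> str:
    """Hypothetical synchronous call; the sleep stands in for blocking I/O."""
    time.sleep(0.1)
    return name.upper()


async def lookup(name: str) -> str:
    # Calling blocking_lookup() directly here would freeze every other task.
    # Running it in a worker thread keeps the event loop free.
    return await asyncio.to_thread(blocking_lookup, name)


async def main() -> list:
    names = ["alpha", "beta", "gamma"]
    # gather() returns the results in the same order as the awaitables.
    return await asyncio.gather(*(lookup(n) for n in names))


results = asyncio.run(main())
print(results)
```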

    Use whichever one is right for your needs.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jacob kruger@21:1/5 to Chris Angelico on Sun Jan 8 13:49:38 2023
    Ok, the specific use case right now is that I need to set up a process
    that pulls the contents of e-mail messages from an IMAP mail server and
    populates them into a PostgreSQL database. Since this is the inbox of a
    relatively large-scale CRM/support system, there are currently over
    2.5 million e-mails in the inbox, and it can grow by over 50,000 per day.


    I already have the basic process operating, using imap_tools, but I
    want to be able to query the process at run time, without needing to
    either check logs or query the database itself while it is running.
    This may only matter for the initial population period, since later on
    I will just set up the code to run as a form of cron job, or have it
    handle time-based repeats itself on a separate machine.


    I also want to offer the ability to either pause or terminate the
    process while it's busy batch-processing large chunks of e-mail
    messages: either send a message to the thread, or set a global variable
    to tell it to end the run after the current item has finished, just in
    case.
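
    A minimal sketch of that pause/stop idea, using `threading.Event`
    objects rather than a bare global variable. The `BatchWorker` class and
    its method names are hypothetical, and the short sleep stands in for
    handling one message.

```python
import threading
import time


class BatchWorker(threading.Thread):
    """Hypothetical worker that handles items and can be paused or stopped."""

    def __init__(self, items):
        super().__init__(daemon=True)
        self._items = list(items)
        self._stop_event = threading.Event()
        self._resume_event = threading.Event()
        self._resume_event.set()        # start in the running state
        self.processed = 0              # read by the main thread for status

    def run(self):
        for item in self._items:
            self._resume_event.wait()   # blocks here while paused
            if self._stop_event.is_set():
                break                   # stop only between items
            time.sleep(0.01)            # stand-in for handling one message
            self.processed += 1

    def pause(self):
        self._resume_event.clear()

    def resume(self):
        self._resume_event.set()

    def stop(self):
        self._stop_event.set()
        self._resume_event.set()        # wake the worker if it is paused


worker = BatchWorker(range(1000))
worker.start()
time.sleep(0.05)                        # the main thread is free meanwhile
worker.stop()
worker.join()
print(f"processed {worker.processed} of 1000 items before stopping")
```

    The worker checks the events only between items, so the current item
    always finishes before a pause or stop takes effect.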


    So, I think that for now, threading is probably the simplest to look into.


    Later on, I was also considering forms of low-level monitoring for UI
    elements. This is not really related to the initial task, but could
    relate to forms of non-visual gaming interfaces for blind/VI
    individuals. I am myself 100% blind, but that's not really relevant in
    this context.


    Stay well


    Jacob Kruger
    +2782 413 4791
    "Resistance is futile...but, acceptance is versatile..."


  • From Peter J. Holzer@21:1/5 to jacob kruger on Sun Jan 8 18:11:40 2023
    On 2023-01-08 13:49:38 +0200, jacob kruger wrote:
    > Ok, the specific usage case right now is that I need to set up a process
    > pulling contents of e-mail messages from an IMAP protocol mail server,
    > which I then populate into a postgresql database, and, since this is the
    > inbox of a relatively large-scale CRM/support system, there are currently
    > over 2.5 million e-mails in the inbox, but, it can grow by over 50000 per
    > day.

    This is probably I/O-bound. You will likely spend much more time waiting
    for the IMAP server or the database than parsing the messages. So you
    probably don't need multi-processing just to utilize all your cores.
    On the other hand, you have some nicely separated tasks which can be
    parallelized, so multi-threading should help (async would probably work
    just as well or as badly as multi-threading, but I find that harder to
    understand, so I would discard it at this point).

    I might be mistaken, though: Depending on how much processing you need
    to do on these messages, it might be worth it to split the work across
    multiple processes. Check the CPU usage of your process: If it's close
    to 100%, you will probably gain significantly from multi-processing.
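
    Besides watching the process in a system monitor, one rough way to make
    that check from inside Python is to compare CPU time with wall-clock
    time. This is only a sketch: `cpu_fraction`, `io_like` and `cpu_like`
    are hypothetical names, and `time.process_time()` counts CPU time for
    the whole process, across all of its threads.

```python
import time


def cpu_fraction(func, *args):
    """Return (result, CPU time / wall time) for one call to func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)


def io_like():
    time.sleep(0.2)     # waiting uses almost no CPU


def cpu_like():
    return sum(i * i for i in range(2_000_000))


_, io_ratio = cpu_fraction(io_like)
_, cpu_ratio = cpu_fraction(cpu_like)
print(f"I/O-like: {io_ratio:.0%}, CPU-like: {cpu_ratio:.0%}")
```

    A ratio near 100% suggests CPU-bound work that could benefit from
    multiple processes; a ratio near 0% suggests the time is spent waiting.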


    > I already have the basic process operating, using imap_tools, but,
    > wanted to enable you to query the process during run-time, without
    > needing to either check logs, or query the database itself while it is
    > on-the-go
    [...]
    > Also wanted to offer the ability to either pause, or terminate processes
    > while it's busy batch processing large chunks of e-mail messages

    So that would be an HTTP (or other socket-based) interface? It should
    also be possible to add that as an additional thread (or process).
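
    A minimal sketch of such a status interface running as an additional
    thread, using only the standard library. The `status` dictionary, the
    `StatusHandler` class and the JSON reply format are all assumptions made
    for illustration; binding to port 0 lets the OS pick a free port.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

status = {"processed": 0, "state": "running"}   # hypothetical shared status
status_lock = threading.Lock()


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with status_lock:
            body = json.dumps(status).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass    # keep the console quiet


# Serve status requests from a background thread on a local-only address.
server = ThreadingHTTPServer(("127.0.0.1", 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with status_lock:
    status["processed"] = 42    # a worker thread would update this

# Query the interface the way an external monitoring client might.
port = server.server_address[1]
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
reply = json.loads(conn.getresponse().read())
conn.close()
print(reply)

server.shutdown()
server.server_close()
```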


    > So, I think that for now, threading is probably the simplest to look into.

    I agree with that assessment.

    hp

    --
    _ | Peter J. Holzer | Story must make more sense than reality.
    |_|_) | |
    | | | hjp@hjp.at | -- Charles Stross, "Creative writing
    __/ | http://www.hjp.at/ | challenge!"
