I am just trying to make up my mind about which of the options in the subject line I should look into working with.
That is, if I want to be able to trigger multiple threads or processes to run in the background, possibly monitoring their state either via an interface or via global variables, while still processing other forms of user interaction in the normal/main process, what would be recommended?
On Sat, 7 Jan 2023 at 04:54, jacob kruger <jacob.kruger.work@gmail.com> wrote:
> I am just trying to make up my mind about which of the options in the
> subject line I should look into working with.
Any. All. Whatever suits your purpose.
> That is, if I want to be able to trigger multiple threads or processes
> to run in the background, possibly monitoring their state either via an
> interface or via global variables, while still processing other forms
> of user interaction in the normal/main process, what would be
> recommended?
They all have different goals and different tradeoffs. Threads are great for I/O-bound operations; they're easy to work with (especially in Python), since they behave pretty much like having multiple things running concurrently. But you'll run into limits as your thread count climbs (in a simple test, I started seeing delays at about 10,000 threads, with more serious problems at 100,000), so they're not well suited to huge scaling. Also, only one thread at a time can run Python code (in CPython, because of the global interpreter lock), which limits them to I/O-bound tasks like networking.
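A minimal sketch of the threaded approach described above, assuming the background jobs are I/O-bound (simulated here with time.sleep, which, like a blocking socket read, lets other threads run). The main thread starts several workers and then polls a shared status dictionary, guarded by a lock, so it stays free to do other work between checks. The names run_jobs, worker and status are illustrative only.

```python
import threading
import time

# Shared state that the main thread can inspect at any time.
status = {}                     # job name -> "pending" | "running" | "done"
status_lock = threading.Lock()  # guards every read and write of `status`


def set_status(name, state):
    with status_lock:
        status[name] = state


def worker(name, delay):
    """Simulated I/O-bound job; time.sleep releases the GIL like a blocking read."""
    set_status(name, "running")
    time.sleep(delay)
    set_status(name, "done")


def run_jobs(delays):
    threads = []
    for i, delay in enumerate(delays):
        name = f"job-{i}"
        set_status(name, "pending")
        # daemon=True: these threads won't keep the program alive on exit
        t = threading.Thread(target=worker, args=(name, delay), daemon=True)
        t.start()
        threads.append(t)
    # The main thread stays free; here it only polls, but it could just as
    # well handle user input between checks.
    while any(t.is_alive() for t in threads):
        with status_lock:
            snapshot = dict(status)
        print(snapshot)
        time.sleep(0.05)
    with status_lock:
        return dict(status)


if __name__ == "__main__":
    print(run_jobs([0.1, 0.2, 0.15]))
```

Copying the dictionary under the lock gives the main thread a consistent snapshot even while workers are updating it.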
Multiple processes take a lot more management. You have to carefully define your communication channels (for instance, a multiprocessing.Queue() to collect results), but they can do CPU-bound tasks in parallel, so multiprocessing is a good way to saturate all of your CPU cores. Big downsides include it being much harder to share information between the processes, and much, MUCH higher resource usage than threads (with the same test as above, I ran into limitations at just over 500 processes, way fewer than the 10,000 threads!).
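A minimal sketch of the multiprocessing.Queue() pattern mentioned above, assuming a CPU-bound job (a pure-Python sum of squares stands in for real work). Each worker process puts a (task_id, result) tuple on the queue and the parent collects them; the function names are illustrative.

```python
import multiprocessing as mp


def sum_of_squares(n):
    """CPU-bound work: pure Python arithmetic that would hold the GIL in a thread."""
    return sum(i * i for i in range(n))


def worker(task_id, n, results):
    # Each child process reports back through the shared queue.
    results.put((task_id, sum_of_squares(n)))


def run_parallel(sizes):
    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, n, results))
             for i, n in enumerate(sizes)]
    for p in procs:
        p.start()
    # Drain the queue before joining: a child with data still buffered for
    # the queue may not exit until that data has been consumed.
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    # The guard matters with the "spawn" start method (the default on
    # Windows and macOS), where each child re-imports this module.
    print(run_parallel([10, 100, 1000]))
```

Collecting before join() follows the multiprocessing documentation's warning that joining a process that has put items on a queue before they are consumed can deadlock.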
Asynchronous I/O runs a single thread in a single process, so, like multithreading, it's only good for I/O-bound tasks like networking. It's harder to work with, though, since you have to be very careful to include proper await points, and you can stall the entire event loop with one mistake (common culprits being synchronous disk I/O and gethostbyname). But the upside is that you can have a practically unlimited number of tasks, limited basically by available memory (or other resources).
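A minimal sketch of both points above, assuming Python 3.9+ for asyncio.to_thread: thousands of concurrent tasks awaited with asyncio.gather, plus one blocking call moved off the event loop so it cannot stall the other tasks. asyncio.sleep and time.sleep stand in for real network and disk I/O; the function names are illustrative.

```python
import asyncio
import time


async def fetch(i):
    """Stand-in for a network call: awaiting hands control back to the event loop."""
    await asyncio.sleep(0.01)
    return i * 2


def blocking_io(i):
    """Stand-in for synchronous work such as disk I/O or a DNS lookup."""
    time.sleep(0.01)  # called directly inside a coroutine, this would freeze every task
    return i


async def main(n):
    # Thousands of concurrent tasks are cheap: each is a lightweight Task,
    # not an OS thread.
    doubled = await asyncio.gather(*(fetch(i) for i in range(n)))
    # Run the blocking call in a worker thread so the loop keeps running.
    offloaded = await asyncio.to_thread(blocking_io, n)
    return sum(doubled), offloaded


if __name__ == "__main__":
    print(asyncio.run(main(10_000)))
```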
Use whichever one is right for your needs.
ChrisA
OK, the specific use case right now is that I need to set up a process that pulls the contents of e-mail messages from an IMAP mail server and then stores them in a PostgreSQL database. Since this is the inbox of a relatively large-scale CRM/support system, there are currently over 2.5 million e-mails in it, and it can grow by over 50,000 per day.
I already have the basic process working, using imap_tools, but I wanted to be able to query the process at run time, without needing to either check logs or query the database itself while it is on the go[...]
I also wanted to offer the ability to pause or terminate the process while it's busy batch-processing large chunks of e-mail messages.
So I think that, for now, threading is probably the simplest option to look into.
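A minimal sketch of how the query/pause/terminate requirements might map onto a single background thread, assuming the IMAP fetch and the PostgreSQL insert are wrapped in two callables. BatchWorker, fetch_batch and store_batch are hypothetical names, not part of imap_tools or any database driver; the demo's stand-ins just slice a list. Pausing takes effect between batches, since a batch already in progress is allowed to finish.

```python
import threading
import time


class BatchWorker:
    """Runs a fetch/store loop in a background thread that the main thread
    can pause, resume, stop and query while it runs."""

    def __init__(self, fetch_batch, store_batch):
        # Hypothetical callables: fetch_batch() returns a list of items
        # (empty when nothing is left); store_batch(items) persists them.
        self._fetch_batch = fetch_batch
        self._store_batch = store_batch
        self._unpaused = threading.Event()
        self._unpaused.set()                  # start in the running state
        self._stop_requested = threading.Event()
        self._status_lock = threading.Lock()
        self._processed = 0
        self._state = "idle"
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def pause(self):
        self._unpaused.clear()                # honoured at the next batch boundary

    def resume(self):
        self._unpaused.set()

    def stop(self, timeout=None):
        self._stop_requested.set()
        self._unpaused.set()                  # wake the thread if it is paused
        self.join(timeout)

    def status(self):
        with self._status_lock:
            return {"state": self._state, "processed": self._processed}

    def _set_state(self, state):
        with self._status_lock:
            self._state = state

    def _run(self):
        while not self._stop_requested.is_set():
            if not self._unpaused.is_set():
                self._set_state("paused")
                self._unpaused.wait()         # block until resume() or stop()
                continue
            self._set_state("running")
            batch = self._fetch_batch()
            if not batch:
                break                         # source exhausted
            self._store_batch(batch)
            with self._status_lock:
                self._processed += len(batch)
        self._set_state("stopped" if self._stop_requested.is_set() else "finished")


if __name__ == "__main__":
    pending = list(range(25))                 # stand-in for message UIDs

    def fetch_batch():
        batch = pending[:10]
        del pending[:10]
        return batch

    worker = BatchWorker(fetch_batch, lambda batch: time.sleep(0.1))
    worker.start()
    time.sleep(0.15)
    print(worker.status())
    worker.join()
    print(worker.status())
```

In the real job, fetch_batch might wrap the imap_tools fetch and store_batch the database insert, while the main thread (or a small command loop reading user input) calls status(), pause(), resume() and stop(). Keeping the counters behind one lock means status() always returns a consistent snapshot.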