• Parallel(?) programming with python

    From Andreas Croci@21:1/5 to All on Mon Aug 8 12:47:26 2022
    I would like to write a program that reads from the network a fixed
    amount of bytes and appends them to a list. This should happen once a
    second.

    Another part of the program should take the list, as it has been filled
    so far, every 6 hours or so, and do some computations on the data (a FFT).

    Every so often (say once a week) the list should be saved to a file,
    shortened in the front by so many items, and filled further with the
    data coming from the network. After the first saving of the whole list,
    only the new part (the data that have come since the last saving) should
    be appended to the file. A timestamp is in the data, so it's easy to say
    what is new and what was already there.

    I'm not sure how to do this properly: can I write a part of a program
    that keeps doing its job (appending data to the list once every second)
    while another part computes something on the data of the same list,
    ignoring the new data being written?

    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do
    this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Andreas Croci on Mon Aug 8 11:20:43 2022
    Andreas Croci <andrea.croci@gmx.de> writes:
    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do
    this.

    Yes, but this is difficult. If you ask this question here,
    you might not be ready for this.

    I haven't learned it yet myself, but nevertheless tried to
    write a small example program quickly, which might still
    contain errors because of my lack of education.

    import threading
    import time

    def write_to_list( list, lock, event ):
        for i in range( 10 ):
            lock.acquire()
            try:
                list.append( i )
            finally:
                lock.release()
            event.set()
            time.sleep( 3 )

    def read_from_list( list, lock, event ):
        while True:
            event.wait()
            print( "Waking up." )
            event.clear()
            if len( list ):
                print( "List contains " + str( list[ 0 ]) + "." )
                lock.acquire()
                try:
                    del list[ 0 ]
                finally:
                    lock.release()
            else:
                print( "List is empty." )

    list = []
    lock = threading.Lock()
    event = threading.Event()
    threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
    threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

    In basketball, first you must learn to dribble and pass,
    before you can begin to shoot.

    With certain reservations, texts that can be considered
    to learn Python are:

    "Object-Oriented Programming in Python Documentation" - a PDF file,
    Introduction to Programming Using Python - Y Daniel Liang (2013),
    How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
    The Coder's Apprentice - Pieter Spronck (2016-09-21), and
    Python Programming - John Zelle (2009).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Stefan Ram on Mon Aug 8 11:54:33 2022
    ram@zedat.fu-berlin.de (Stefan Ram) writes:
    if len( list ):
        print( "List contains " + str( list[ 0 ]) + "." )
        lock.acquire()

    PS: It might be better to acquire the lock even before reading.
    As I wrote, I haven't learned this yet!

    There are some forms of "parallelism" one does not need threads for.

    def f_():
        for i in range( 10 ):
            print( i )
            yield

    def g_():
        for i in range( 65, 75 ):
            print( chr( i ))
            yield

    print( "starting" )
    f = f_()
    g = g_()
    try:
        while True:
            next( f )
            next( g )
    except StopIteration:
        pass

    And there might be other times where one wants to use
    processes instead of threads. Then, there is the dreaded
    "Python Global Interpreter Lock" or "GIL".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Croci@21:1/5 to Stefan Ram on Mon Aug 8 13:53:20 2022
    Thanks for your reply.

    On 08.08.22 13:20, Stefan Ram wrote:

    Yes, but this is difficult. If you ask this question here,
    you might not be ready for this.

    Indeed.


    I haven't learned it yet myself, but nevertheless tried to
    write a small example program quickly, which might still
    contain errors because of my lack of education.

    import threading
    import time

    def write_to_list( list, lock, event ):
        for i in range( 10 ):
            lock.acquire()
            try:
                list.append( i )
            finally:
                lock.release()
            event.set()
            time.sleep( 3 )

    def read_from_list( list, lock, event ):
        while True:
            event.wait()
            print( "Waking up." )
            event.clear()
            if len( list ):
                print( "List contains " + str( list[ 0 ]) + "." )
                lock.acquire()
                try:
                    del list[ 0 ]
                finally:
                    lock.release()
            else:
                print( "List is empty." )

    list = []
    lock = threading.Lock()
    event = threading.Event()
    threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
    threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

    If I understand some things correctly, a "lock" would be something that,
    as the name says, locks, meaning prevents parts of the program from
    executing on the locked resource until other parts have finished doing
    their things and have released the lock. If this is correct, it's not
    exactly what I wanted, because this way "parts of the program" would not
    "keep doing their things, while other parts do other things on the same
    data".

    I'm in principle ok with locks, if it must be. What I fear is that the
    lock could last long and prevent the function that writes into the list
    from doing so every second. With an FFT on a list that contains a few
    bytes taken every second over one week's time (604,800 samples), I believe
    it's very likely that the FFT function takes longer than a second to return.

    Then I would have to import all the data I have missed since the lock
    was acquired, which is doable, but I would like to avoid it if possible.


    In basketball, first you must learn to dribble and pass,
    before you can begin to shoot.

    Sure.


    With certain reservations, texts that can be considered
    to learn Python are:

    "Object-Oriented Programming in Python Documentation" - a PDF file,
    Introduction to Programming Using Python - Y Daniel Liang (2013),
    How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
    The Coder's Apprentice - Pieter Spronck (2016-09-21), and
    Python Programming - John Zelle (2009).


    Thank you for the list. I am currently taking a Udemy course and at the
    same time reading the tutorials on python.org. I hope I will some day
    come to any of the books you suggest (I'm doing this only in my spare
    time and it will take forever).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julio Di Egidio@21:1/5 to Andreas Croci on Mon Aug 8 05:55:03 2022
    On Monday, 8 August 2022 at 13:53:42 UTC+2, Andreas Croci wrote:

    I'm in principle ok with locks, if it must be.

    Concurrent programming is quite difficult, plus you better think
    in terms of queues than shared data... But, an easier and often
    better option for concurrent data access is use a (relational)
    database, then the appropriate transaction isolation levels
    when reading and/or writing.

    Julio

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Croci@21:1/5 to Julio Di Egidio on Mon Aug 8 19:39:27 2022
    Thank you for your reply.

    On 08.08.22 14:55, Julio Di Egidio wrote:

    Concurrent programming is quite difficult, plus you better think
    in terms of queues than shared data...

    Do you mean queues in the sense of deque (the data structure)? I ask
    because I can see the advantage there when I try to pop data from the
    front of it, but I don't see the sense of the following statement ("than
    shared data"). I mean, I called my structure a list, but it may well be
    a queue instead. That wouldn't prevent it from being shared in the idea
    I described: one function would still append data to it while the other
    is reading what is there up to a certain point and calculating the FFT of it.

    But, an easier and often
    better option for concurrent data access is use a (relational)
    database, then the appropriate transaction isolation levels
    when reading and/or writing.


    That would obviously save some coding (but would introduce the need to
    code the interaction with the database), but I'm not sure it would speed
    up the thing. Would the RDBMS allow reading a table while something else
    is writing to it? I doubt it, and I'm not sure it doesn't flush the cache
    before letting you read, which would include a normally slow disk access.

    Andreas

    Julio

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Louis Krupp@21:1/5 to Andreas Croci on Mon Aug 8 12:02:19 2022
    On 8/8/2022 4:47 AM, Andreas Croci wrote:
    I would like to write a program that reads from the network a fixed
    amount of bytes and appends them to a list. This should happen once a
    second.

    Another part of the program should take the list, as it has been
    filled so far, every 6 hours or so, and do some computations on the
    data (a FFT).

    Every so often (say once a week) the list should be saved to a file,
    shortened in the front by so many items, and filled further with the
    data coming from the network. After the first saving of the whole list,
    only the new part (the data that have come since the last saving)
    should be appended to the file. A timestamp is in the data, so it's
    easy to say what is new and what was already there.

    I'm not sure how to do this properly: can I write a part of a program
    that keeps doing its job (appending data to the list once every
    second) while another part computes something on the data of the same
    list, ignoring the new data being written?

    Basically the question boils down to whether it is possible to have
    parts of a program (could be functions) that keep doing their job
    while other parts do something else on the same data, and what is the
    best way to do this.

    You might be able to do what you need by making the file system work for
    you:

    Use numbered files, something like DATA/0001, DATA/0002, etc.

    Start by initializing a file number variable to 1 and creating an empty
    file, DATA/0001. The current time will be your start time.

    In an infinite loop, just as in Stefan's example:

    Read from the network and append to the current data file. This
    shouldn't take long unless the file is on a remote system.

    If six hours have gone by (compare the current time to the start time),
    close the current date file, create a thread (see Stefan's example) to
    call your FFT with the name of the current file, increment the file
    number, and open a new empty data file.

    If you want to, you can consolidate files every week or so. The Python
    library has functions that will let you get a list files in a directory.
    If you're on a Linux or UNIX system, you can use shell commands to
    append, copy or rename files.
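
    A rough sketch of that file-rotation scheme (the network read and the FFT
    are stubbed out as placeholders, and the loop is parameterised instead of
    infinite, purely so the sketch is self-contained):

    ```python
    import os
    import threading
    import time

    DATA_DIR = "DATA"

    def read_from_network():
        # Placeholder for the real once-a-second network read.
        return b"\x00" * 8

    def run_fft(path):
        # Placeholder: a worker thread would load the finished file
        # and compute the FFT over its contents.
        print("FFT over", path)

    def data_file(number):
        return os.path.join(DATA_DIR, "%04d" % number)

    def main(iterations, interval=1.0, rotate_seconds=6 * 60 * 60):
        os.makedirs(DATA_DIR, exist_ok=True)
        file_number = 1
        start = time.monotonic()
        current = open(data_file(file_number), "ab")
        for _ in range(iterations):          # in real use: while True
            current.write(read_from_network())
            current.flush()
            time.sleep(interval)
            if time.monotonic() - start >= rotate_seconds:
                current.close()
                # Hand the finished file to a thread; keep reading meanwhile.
                threading.Thread(target=run_fft, args=[current.name]).start()
                file_number += 1
                start = time.monotonic()
                current = open(data_file(file_number), "ab")
        current.close()
    ```

    In real use the loop would run forever and `rotate_seconds` would stay at
    six hours; a crashed process then loses at most one second of data, since
    everything else is already on disk.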

    Have fun.

    Louis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dennis Lee Bieber@21:1/5 to All on Mon Aug 8 13:56:02 2022
    On Mon, 8 Aug 2022 12:47:26 +0200, Andreas Croci <andrea.croci@gmx.de> declaimed the following:

    I would like to write a program that reads from the network a fixed
    amount of bytes and appends them to a list. This should happen once a
    second.


    Ignoring leap seconds, there are 86400 seconds in a day -- how many bytes are you planning to read each second?

    Maybe more important: is this a constant network connection feeding you
    bytes (in which case the bytes available to read will be controlled by the
    sender -- which may be sending continuously and building up a backlog if
    you don't empty the stream)? Or are you planning to make a socket
    connection, read n bytes, and close the socket?

    Another part of the program should take the list, as it has been filled
    so far, every 6 hours or so, and do some computations on the data (a FFT).


    "6 hours or so"? That leaves one open to all sorts of variable timing. In either event, a 6 hour interval is more suited to a process started by a cron job (Linux/Unix) or Task Scheduler (Windows). Having a thread sleep
    for 6 hours means no safeguard if the parent process should die at some
    point (and if you are keeping the data in an internal list, you lose all
    that data too).

    Every so often (say once a week) the list should be saved to a file,

    This REQUIRES that the process not fail at any point, nor any system
    restarts, etc. And (see prior paragraphs) how much data are you
    accumulating? In one week you have 604800 "reads". If you are reading 10
    bytes each time, that makes 6MB of data you could potentially lose (on most
    modern hardware, 6MB is not a memory concern... even a 32-bit OS should be
    able to find space for 600MB of data...).

    Much better would be to write the file as you read each chunk. If the
    file is configured right, a separate process should be able to do read-only
    processing of the file even while the write process is ongoing. Or you
    could attempt an open/write/close cycle, which could be blocked while your
    FFT is processing -- you'd have to detect that situation and buffer the
    read data until you get a subsequent successful open, at which time you'd
    write all the backlog data.

    Or you could even have your FFT process copy the data to the long term file, while the write process just starts a new file when it finds itself blocked (and the FFT deletes the file it was reading).

    shorthened in the front by so many items, and filled further with the
    data coming fom the network. After the first saving of the whole list,
    only the new part (the data that have come since the last saving) should
    be appended to the file. A timestamp is in the data, so it's easy to say
    what is new and what was already there.


    Personally, this sounds more suited for something like SQLite3... Insert new records as the data is read, with timestamps. FFT process
    selects records based upon last data ID (that it processed previously) to
    end of new data. SQLite3 database IS the long-term storage. Might need a
    second table to hold the FFT process "last data ID" so on start up it can determine where to begin.
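
    Sketching that with the stdlib sqlite3 module (the table layout and names
    are invented for illustration; an in-memory database stands in for the
    real on-disk file):

    ```python
    import sqlite3
    import time

    conn = sqlite3.connect(":memory:")  # a file path in real use

    conn.execute("""CREATE TABLE IF NOT EXISTS samples (
                        id INTEGER PRIMARY KEY AUTOINCREMENT,
                        ts REAL NOT NULL,
                        data BLOB NOT NULL)""")
    # Second table remembering how far the FFT job has read.
    conn.execute("CREATE TABLE IF NOT EXISTS fft_state (last_id INTEGER NOT NULL)")
    conn.execute("INSERT INTO fft_state (last_id) VALUES (0)")

    def store_sample(data):
        # Called once a second by the network reader.
        conn.execute("INSERT INTO samples (ts, data) VALUES (?, ?)",
                     (time.time(), data))
        conn.commit()

    def fetch_new_samples():
        # Called by the FFT job: everything after the last processed id.
        (last_id,) = conn.execute("SELECT last_id FROM fft_state").fetchone()
        rows = conn.execute(
            "SELECT id, data FROM samples WHERE id > ? ORDER BY id",
            (last_id,)).fetchall()
        if rows:
            conn.execute("UPDATE fft_state SET last_id = ?", (rows[-1][0],))
            conn.commit()
        return [data for _, data in rows]
    ```

    The database itself is the long-term storage, so a restart only has to
    read `fft_state` to know where to resume.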

    I'm not sure how to do this properly: can I write a part of a program
    that keeps doing its job (appending data to the list once every second)
    while another part computes something on the data of the same list,
    ignoring the new data being written?

    Well, if you really want ONE program -- you'll likely be looking at the Threading module (I don't do "async", and your task doesn't seem suited for async-type callbacks -- one thread that does the fetching of data, and a second that does the FFT processing, which will be sleeping most of the
    time).

    But either way, I'd suggest not keeping the data in an internal list; use some RDBMS to keep the long-term data, accumulating it as you fetch it,
    and letting the FFT read from the database for its processing.


    --
    Wulfraed Dennis Lee Bieber AF6VN
    wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julio Di Egidio@21:1/5 to Andreas Croci on Mon Aug 8 11:07:28 2022
    On Monday, 8 August 2022 at 19:39:48 UTC+2, Andreas Croci wrote:
    Thank you for your reply.
    On 08.08.22 14:55, Julio Di Egidio wrote:

    Concurrent programming is quite difficult, plus you better think
    in terms of queues than shared data...

    Do you mean queues in the sense of deque (the data structure)?
    <snip>

    No, I mean a "message queue", i.e. a synchronized queue, so that
    independent components can begin/end async invocations. It
    still has enqueue/dequeue, but these queues encapsulate the
    needed synchronization. IOW, back to the overall picture, this is
    about thinking "pipelines".

    Here is just the first link I have found, but there is quite some
    literature on the subject of concurrent programming and primitives:
    <https://duckduckgo.com/?q=async+message+queue>

    But, an easier and often
    better option for concurrent data access is use a (relational)
    database, then the appropriate transaction isolation levels
    when reading and/or writing.

    That would obviously save some coding (but would introduce the need to
    code the interaction with the database), but I'm not sure it would speed
    up the thing. Would the RDBMS allow reading a table while something else
    is writing to it? I doubt it, and I'm not sure it doesn't flush the cache
    before letting you read, which would include a normally slow disk access.

    An RDBMS not only does all that quite well (look up "ACID" and "transaction isolation levels"), it also does it with an efficiency and reliability that you would hardly be able to match in custom code, plus most RDBMS also
    provide message queues and job schedulers out of the box and more.

    That said, in simple scenarios all of the above might be overkill and a bit
    of locking over shared data might indeed do the trick for you, just not if
    you do want the reliability and performance...

    Julio

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julio Di Egidio@21:1/5 to Julio Di Egidio on Mon Aug 8 11:11:46 2022
    On Monday, 8 August 2022 at 20:07:39 UTC+2, Julio Di Egidio wrote:
    On Monday, 8 August 2022 at 19:39:48 UTC+2, Andreas Croci wrote:
    <snip>

    Here is just the first link I have found, but there is quite some
    literature on the subject of concurrent programming and primitives:
    <https://duckduckgo.com/?q=async+message+queue>

    P.S. Sorry, I have messed up with the link, anyway that should work
    as a quick starting point.

    Julio

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Stefan Ram on Mon Aug 8 18:59:46 2022
    ram@zedat.fu-berlin.de (Stefan Ram) writes:
    Yes, but this is difficult.

    Here is an excerpt from a text by Edward A. Lee
    about a programming project with threads:

    |A part of the Ptolemy Project experiment was to see
    |whether effective software engineering practices could be
    |developed for an academic research setting. We developed a
    |process that included a code maturity rating system (with
    |four levels, red, yellow, green, and blue), design
    |reviews, code reviews, nightly builds, regression tests,
    |and automated code coverage metrics [43]. The portion of
    |the kernel that ensured a consistent view of the program
    |structure was written in early 2000, design reviewed to
    |yellow, and code reviewed to green. The reviewers included
    |concurrency experts, not just inexperienced graduate
    |students (Christopher Hylands (now Brooks), Bart Kienhuis,
    |John Reekie, and myself were all reviewers). We wrote
    |regression tests that achieved 100 percent code coverage.
    |The nightly build and regression tests ran on a two
    |processor SMP machine, which exhibited different thread
    |behavior than the development machines, which all had a
    |single processor. The Ptolemy II system itself began to be
    |widely used, and every use of the system exercised this
    |code. No problems were observed until the code deadlocked
    |on April 26, 2004, four years later.
    Edward A. Lee (2006-01-10).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Raymond@21:1/5 to All on Mon Aug 8 18:46:28 2022
    But, an easier and often
    better option for concurrent data access is use a (relational)
    database, then the appropriate transaction isolation levels
    when reading and/or writing.

    That would obviously save some coding (but would introduce the need to
    code the interaction with the database), but I'm not sure it would speed
    up the thing. Would the RDBMS allow reading a table while something else
    is writing to it? I doubt it, and I'm not sure it doesn't flush the cache
    before letting you read, which would include a normally slow disk access.

    SQLite, for example, allows only 1 write transaction at a time, but in
    WAL mode you can have as many read transactions as you want, all going
    along at the same time as that 1 writer. It also allows you to specify
    how thorough it is in flushing data to disk, including not forcing a
    sync to disk at all and just leaving that to the OS to do on its own
    time.
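
    SQLite's WAL mode (one writer, many concurrent readers) and its flush
    policy are both single pragmas in Python's sqlite3 module; a minimal
    sketch against a throwaway database file:

    ```python
    import os
    import sqlite3
    import tempfile

    # WAL needs a file-backed database (an in-memory one has no WAL file).
    db_path = os.path.join(tempfile.mkdtemp(), "samples.db")
    conn = sqlite3.connect(db_path)

    # One writer plus any number of concurrent readers:
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

    # Don't force an fsync on every commit; leave flushing to the OS:
    conn.execute("PRAGMA synchronous=OFF")
    ```

    The journal_mode pragma returns the mode actually in effect, so it is
    easy to verify the switch took.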

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MRAB@21:1/5 to Stefan Ram on Mon Aug 8 20:18:16 2022
    On 2022-08-08 12:20, Stefan Ram wrote:
    Andreas Croci <andrea.croci@gmx.de> writes:
    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do
    this.

    Yes, but this is difficult. If you ask this question here,
    you might not be ready for this.

    I haven't learned it yet myself, but nevertheless tried to
    write a small example program quickly, which might still
    contain errors because of my lack of education.

    import threading
    import time

    def write_to_list( list, lock, event ):
        for i in range( 10 ):
            lock.acquire()
            try:
                list.append( i )
            finally:
                lock.release()
            event.set()
            time.sleep( 3 )

    def read_from_list( list, lock, event ):
        while True:
            event.wait()
            print( "Waking up." )
            event.clear()
            if len( list ):
                print( "List contains " + str( list[ 0 ]) + "." )
                lock.acquire()
                try:
                    del list[ 0 ]
                finally:
                    lock.release()
            else:
                print( "List is empty." )

    list = []
    lock = threading.Lock()
    event = threading.Event()
    threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
    threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

    In basketball, first you must learn to dribble and pass,
    before you can begin to shoot.

    With certain reservations, texts that can be considered
    to learn Python are:

    "Object-Oriented Programming in Python Documentation" - a PDF file,
    Introduction to Programming Using Python - Y Daniel Liang (2013),
    How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
    The Coder's Apprentice - Pieter Spronck (2016-09-21), and
    Python Programming - John Zelle (2009).

    When working with threads, you should use queues, not lists, because
    queues do their own locking and can wait for items to arrive, with a
    timeout, if desired:


    import queue
    import threading
    import time

    def write_to_item_queue(item_queue):
        for i in range(10):
            print("Put", i, "in queue.", flush=True)
            item_queue.put(i)
            time.sleep(3)

        # Using None to indicate that there's no more to come.
        item_queue.put(None)

    def read_from_item_queue(item_queue):
        while True:
            try:
                item = item_queue.get()
            except queue.Empty:
                print("Queue is empty; shouldn't have got here!", flush=True)
            else:
                print("Queue contains " + str(item) + ".", flush=True)

                if item is None:
                    # Using None to indicate that there's no more to come.
                    break

    item_queue = queue.Queue()

    write_thread = threading.Thread(target=write_to_item_queue,
                                    args=[item_queue])
    write_thread.start()

    read_thread = threading.Thread(target=read_from_item_queue,
                                   args=[item_queue])
    read_thread.start()

    # Wait for the threads to finish.
    write_thread.join()
    read_thread.join()

    print("Finished.")

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Barry@21:1/5 to All on Mon Aug 8 21:43:13 2022
    On 8 Aug 2022, at 20:24, MRAB <python@mrabarnett.plus.com> wrote:

    On 2022-08-08 12:20, Stefan Ram wrote:
    Andreas Croci <andrea.croci@gmx.de> writes:
    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do
    this.
    Yes, but this is difficult. If you ask this question here,
    you might not be ready for this.
    I haven't learned it yet myself, but nevertheless tried to
    write a small example program quickly, which might still
    contain errors because of my lack of education.
    import threading
    import time

    def write_to_list( list, lock, event ):
        for i in range( 10 ):
            lock.acquire()
            try:
                list.append( i )
            finally:
                lock.release()
            event.set()
            time.sleep( 3 )

    def read_from_list( list, lock, event ):
        while True:
            event.wait()
            print( "Waking up." )
            event.clear()
            if len( list ):
                print( "List contains " + str( list[ 0 ]) + "." )
                lock.acquire()
                try:
                    del list[ 0 ]
                finally:
                    lock.release()
            else:
                print( "List is empty." )

    list = []
    lock = threading.Lock()
    event = threading.Event()
    threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
    threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

    In basketball, first you must learn to dribble and pass,
    before you can begin to shoot.
    With certain reservations, texts that can be considered
    to learn Python are:
    "Object-Oriented Programming in Python Documentation" - a PDF file,
    Introduction to Programming Using Python - Y Daniel Liang (2013),
    How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
    The Coder's Apprentice - Pieter Spronck (2016-09-21), and
    Python Programming - John Zelle (2009).
    When working with threads, you should use queues, not lists, because queues do their own locking and can wait for items to arrive, with a timeout, if desired:

    Lists do not need to be locked in Python because of the GIL.
    However you need locks to synchronise between threads.
    And as you say a queue has all that locking built in.

    Barry



    import queue
    import threading
    import time

    def write_to_item_queue(item_queue):
        for i in range(10):
            print("Put", i, "in queue.", flush=True)
            item_queue.put(i)
            time.sleep(3)

        # Using None to indicate that there's no more to come.
        item_queue.put(None)

    def read_from_item_queue(item_queue):
        while True:
            try:
                item = item_queue.get()
            except queue.Empty:
                print("Queue is empty; shouldn't have got here!", flush=True)
            else:
                print("Queue contains " + str(item) + ".", flush=True)

                if item is None:
                    # Using None to indicate that there's no more to come.
                    break

    item_queue = queue.Queue()

    write_thread = threading.Thread(target=write_to_item_queue, args=[item_queue])
    write_thread.start()

    read_thread = threading.Thread(target=read_from_item_queue, args=[item_queue])
    read_thread.start()

    # Wait for the threads to finish.
    write_thread.join()
    read_thread.join()

    print("Finished.")


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter J. Holzer@21:1/5 to Andreas Croci on Mon Aug 8 23:09:27 2022
    On 2022-08-08 13:53:20 +0200, Andreas Croci wrote:
    I'm in principle ok with locks, if it must be. What I fear is that the lock could last long and prevent the function that writes into the list from
    doing so every second. With an FFT on a list that contains a few bytes taken every second over one week time (604.800 samples), I believe it's very
    likely that the FFT function takes longer than a second to return.

    You wouldn't lock the part performing the FFT, of course, only the part

    That said, CPython (the reference implementation of Python) has what is
    called the Global Interpreter Lock (GIL) which locks every single Python instruction. So you can't have two threads actually computing anything
    at the same time - at least not if the computation is written in Python.
    Math packages like Numpy may or may not release the lock while they are
    busy.
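
    Concretely, the consumer would hold the lock only long enough to copy the
    shared list, then run the long FFT on the private snapshot (a minimal
    sketch with illustrative names; the FFT call itself is left as a comment):

    ```python
    import threading

    samples = []                  # the shared list, appended to once a second
    lock = threading.Lock()

    def add_sample(value):
        # Producer side: the critical section is just one append.
        with lock:
            samples.append(value)

    def analyse():
        # Consumer side: copy the list inside a short critical section,
        # then do the expensive work on the private copy, lock-free.
        with lock:
            snapshot = list(samples)
        # ... numpy.fft.fft(snapshot) or similar would run here ...
        return len(snapshot)
    ```

    The producer is therefore blocked only for the duration of the copy, not
    for the duration of the FFT.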

    hp

    PS: I also agree with what others have said about the perils of
    multi-threaded programming.

    --
    _ | Peter J. Holzer | Story must make more sense than reality.
    |_|_) | |
    | | | hjp@hjp.at | -- Charles Stross, "Creative writing
    __/ | http://www.hjp.at/ | challenge!"

    -----BEGIN PGP SIGNATURE-----

    iQIzBAABCgAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAmLxe34ACgkQ8g5IURL+ KF26mhAAo/GyEiIXiZtgTl8Ck/1dbUlIGkU1AnU/0KwPUAy2OucCWjedRbqK5NRv L9XBw1DtFxXxag2ntMpuUBrRoRI0VkJ752dqE/T8j0xdxRfbDiSwxVbVgp03nxFZ wMVr5+WYZW+UkH8Nl2yr+Yc1L7j/vNjm1PdhdfUiWwQtE9nXxLyuORErCxAlOjg0 pJS3hulMu7mAsu3Cjnva2zy7PeQE7ddJHQlZsAdXxROhiismGGRQfW5SM0T2mabk i8qNmhOGCjweQXXlr1J4NJXHzqmREGrKRZ3rAKE9WsJnmCDHvHDlEmTmDFNWG9cE VIikl90FDdSc39zxWQl/GdNgSaURDVIgnj8r+lAH/4edtVk6utiOJB2Bc8qIkuNZ hSyMzPdztauzHg9A55JEa1Gf9YR0WgOuLR4X5+xbiS81aFdLOZ8YWTnEwz4dgGYK 2aflph0Pg3tyy9yMYpq+3CUQPIwCGdzLTgNpcjRwDIzerjvQPXthjnL4b5psL14a +rWOTl7RRCBAdggMrmpv6ZYZGOb+jz6bWtgaO5yEG70BmN9jcYNxsj9YRwAiLrcz mu63oT7/J4/ccBZA+QQFDydw+Qh47+ReojP27vkiuQT6oCE8ytRfYXqskeoch4JZ NXHzeHHnjobVlXfobPKnsFbyJ5B0VZJkY8dq3Lf
  • From Cameron Simpson@21:1/5 to Stefan Ram on Tue Aug 9 08:37:34 2022
    On 08Aug2022 11:20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    Andreas Croci <andrea.croci@gmx.de> writes:
    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do
    this.

    Yes, but this is difficult. If you ask this question here,
    you might not be ready for this.

    This is a very standard requirement for any concurrent activity and the
    typical approach is a mutex (mutual exclusion). You've already hit on
    the "standard" approach: a `threading.Lock` object.

    lock.acquire()
    try:
        list.append( i )
    finally:
        lock.release()

    Small note, which makes writing this much clearer. Lock objects are
    context managers. So:

    with lock:
        list.append(i)

    is all you need.

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From avi.e.gross@gmail.com@21:1/5 to Andreas Croci on Mon Aug 8 19:50:13 2022
    Stefan,

    You are correct that the goal of a lock is to do something rather quickly
    and atomically, so your design should not do something complex or long
    before releasing the lock.

    In your example, you have a producer adding data as regularly as every
    second and another part that wakes up rarely and processes all the data
    gathered since the last time. So you may want to augment your code to do
    something fast, like pointing another variable at the data gathered so far
    and rebinding the original variable to an empty list. Then you release the
    lock within a fraction of a second and let the regular job keep appending
    to the initially empty list while the other part of the code processes the
    old data without holding the lock.
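    That "move the data aside under the lock" idea can be sketched roughly as
    follows; the names (`samples`, `grab_batch`) are mine for illustration,
    not from any code posted in this thread:

    ```python
    import threading

    samples = []                    # filled once a second by the producer
    samples_lock = threading.Lock()

    def append_sample(value):
        # Producer side: hold the lock only for the brief append.
        with samples_lock:
            samples.append(value)

    def grab_batch():
        # Consumer side (every 6 hours or so): swap the shared list for a
        # fresh empty one, then process the old list with no lock held.
        global samples
        with samples_lock:
            batch, samples = samples, []
        return batch    # safe to FFT at leisure; producer keeps appending
    ```

    The lock is held only for an append or a pointer swap, never for the
    long-running computation.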

    A design like the above has the busy worker constantly checking the lock.
    An alternative, if you are sure the other process will only show up almost
    exactly at the 6-hour mark, is to have the busy one check the time
    instead, but that may be more expensive.

    Still other architectures are possible, such as writing not to a single
    list for six hours but to a data structure with multiple sub-lists,
    switching to a new sub-list every minute or so. The second process notes
    how many sub-lists there are at the moment, processes all but the last,
    and records the position so that the next time it starts there. This works
    if you do not need every last bit of data, as the two parts do not
    interfere with each other. And no real locks are needed, because the only
    thing the two parts share is the position or identity of the current last
    fragment, which only one process actually touches.

    Just some ideas. Lots of other variations are very possible.



    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Stefan Ram
    Sent: Monday, August 8, 2022 7:21 AM
    To: python-list@python.org
    Subject: Re: Parallel(?) programming with python

    Andreas Croci <andrea.croci@gmx.de> writes:
    Basically the question boils down to whether it is possible to have
    parts of a program (could be functions) that keep doing their job while
    other parts do something else on the same data, and what is the best
    way to do this.

    Yes, but this is difficult. If you ask this question here,
    you might not be ready for this.

    I haven't learned it yet myself, but nevertheless tried to
    write a small example program quickly, which might still
    contain errors because of my lack of education.

    import threading
    import time

    def write_to_list( list, lock, event ):
        for i in range( 10 ):
            lock.acquire()
            try:
                list.append( i )
            finally:
                lock.release()
            event.set()
            time.sleep( 3 )

    def read_from_list( list, lock, event ):
        while True:
            event.wait()
            print( "Waking up." )
            event.clear()
            if len( list ):
                print( "List contains " + str( list[ 0 ]) + "." )
                lock.acquire()
                try:
                    del list[ 0 ]
                finally:
                    lock.release()
            else:
                print( "List is empty." )

    list = []
    lock = threading.Lock()
    event = threading.Event()
    threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
    threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

    In basketball, first you must learn to dribble and pass,
    before you can begin to shoot.

    With certain reservations, texts that can be considered
    to learn Python are:

    "Object-Oriented Programming in Python Documentation" - a PDF file,
    Introduction to Programming Using Python - Y Daniel Liang (2013),
    How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
    The Coder's Apprentice - Pieter Spronck (2016-09-21), and
    Python Programming - John Zelle (2009).


    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Oscar Benjamin@21:1/5 to Andreas Croci on Tue Aug 9 00:22:12 2022
    On Mon, 8 Aug 2022 at 19:01, Andreas Croci <andrea.croci@gmx.de> wrote:

    tI would like to write a program, that reads from the network a fixed
    amount of bytes and appends them to a list. This should happen once a
    second.

    Another part of the program should take the list, as it has been filled
    so far, every 6 hours or so, and do some computations on the data (a FFT).

    Every so often (say once a week) the list should be saved to a file,
    shortened in the front by so many items, and filled further with the
    data coming from the network. After the first saving of the whole list,
    only the new part (the data that have come since the last saving) should
    be appended to the file. A timestamp is in the data, so it's easy to say
    what is new and what was already there.

    I'm not sure how to do this properly: can I write a part of a program
    that keeps doing its job (appending data to the list once every second)
    while another part computes something on the data of the same list,
    ignoring the new data being written?

    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do this.

    Why do these "parts of a program" need to be part of the *same*
    program. I would write this as just two separate programs. One
    collects the data and writes it to a file. The other periodically
    reads the file and computes the DFT.

    Note that a lot of the complexity discussed in other posts to do with
    threads and locks etc comes from the supposed constraint that this
    needs to be done with threads or something else that can work in
    parallel *within the same program*. If you relax that constraint the
    problem becomes a lot simpler.

    --
    Oscar

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dennis Lee Bieber@21:1/5 to All on Mon Aug 8 20:50:17 2022
    On Mon, 8 Aug 2022 19:39:27 +0200, Andreas Croci <andrea.croci@gmx.de> declaimed the following:


    Do you mean queues in the sense of deque (the data structure)? I ask
    because I can see the advantage there when I try to pop data from the
    front of it, but I don't see the sense of the following statement ("than

    Most likely this was a reference to the Queue module -- which is used to pass data from one thread to another. Your "fetch" thread would package
    up the "new" data to be processed by the FFT thread. The FFT thread is
    blocked waiting for data to appear on the queue -- when it appears, the FFT thread reads the entire packet of data and proceeds to process it.

    Note that in this scheme, the FFT thread is NOT on a timer -- the fetch thread controls the timing by when it puts data into the queue.

    cf:
    https://docs.python.org/3/library/threading.html
    https://docs.python.org/3/library/queue.html
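    A rough sketch of that queue hand-off, with trivial stand-ins for the
    network read and the FFT (the names, batch sizes and the `None` sentinel
    are illustrative choices of mine, not from the post above):

    ```python
    import queue
    import threading

    batches = queue.Queue()          # thread-safe; no explicit lock needed

    def fetch_thread(n_batches, batch_size):
        # Stand-in for the once-a-second network reader: it packages up
        # the "new" data and hands it to the FFT thread via the queue.
        for b in range(n_batches):
            batch = [b * batch_size + i for i in range(batch_size)]
            batches.put(batch)
        batches.put(None)            # sentinel: no more data

    def fft_thread(results):
        # Not on a timer: blocks in get() until the fetch thread
        # supplies a complete batch, then processes it.
        while True:
            batch = batches.get()
            if batch is None:
                break
            results.append(sum(batch))   # stand-in for the real FFT

    results = []
    t1 = threading.Thread(target=fetch_thread, args=(3, 4))
    t2 = threading.Thread(target=fft_thread, args=(results,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    ```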


    That would obviously save some coding (but would introduce the need to
    code the interaction with the database), but I'm not sure it would speed
    up the thing. Would the RDBMS allow to read a table while something else
    is writing to it? I doubt it, and I'm not sure it doesn't flush the cache
    before letting you read, which would include a normally slow disk access.


    Depends upon the RDBMS. Some are "multi-version concurrency" -- they
    snapshot the data at the time of the read, while letting new writes
    proceed. But if one is doing read/modify/write, this can cause a problem,
    as the RDBMS will detect that a record was modified by someone else and
    prevent you from changing it -- you have to reselect the data to get the
    current version.

    You will want to treat each of your network fetches as a transaction -- and close the transaction fast. Your FFT process would need to select all
    data in the range to be processed, and load it into memory so you can free
    that transaction.

    https://www.sqlite.org/lockingv3.html See section 3.0 and section 5.0



    --
    Wulfraed Dennis Lee Bieber AF6VN
    wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Cameron Simpson@21:1/5 to Oscar Benjamin on Tue Aug 9 12:30:53 2022
    On 09Aug2022 00:22, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
    On Mon, 8 Aug 2022 at 19:01, Andreas Croci <andrea.croci@gmx.de> wrote:
    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do
    this.

    Which is of course feasible, as others have outlined.

    Why do these "parts of a program" need to be part of the *same*
    program. I would write this as just two separate programs. One
    collects the data and writes it to a file. The other periodically
    reads the file and computes the DFT.

    I would also write these as separate programmes, or at least as distinct
    modes of the same programme (eg "myprog poll" and "myprog archive" etc),
    largely because you might run the "poll" regularly and briefly, and run
    the processing phase separately and less frequently. You don't need to
    keep a single programme lurking around forever - fire it up as required.

    However, I want to point out that this _in no way_ removes the need for
    access contol and mutexes. It will change the mechanism (because your
    two programmes are now operating separately) and makes it more concrete
    in your mind what _actually and precisely_ needs protection.

    For example, you probably want to avoid _processing_ a data file at the
    same time as _updating_ that file. Depending on what you're doing this
    can be as simple as keeping "to be updated" files with distinct names
    from "available to be processed/archived" files. This is a standard
    difficulty with "hot folder" upload areas.

    A common approach might be to write a file with a "temp" style name (eg ".tmp*") until completed, then rename it to its official name (eg "datafile*"). And then your processing/archiving side can simply ignore
    the "in progress" files because they do not match the names it cares
    about.

    Anyway, those are specifics, which will be driven by what you're
    actually doing. The point is that you still need to coordinate use of
    the files suitably for your needs. Doing this in one long running
    programme with Threads/mutexes or separate programmes sharing a data
    directory just changes the mechanisms.

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dan Stromberg@21:1/5 to andrea.croci@gmx.de on Mon Aug 8 20:43:37 2022
    Queues are better than lists for concurrency. If you get the right kind,
    they have implicit locking, making your code simpler and more robust at the same time.

    CPython threading is mediocre for software systems that have one or more CPU-bound threads, and your FFT might be CPU-bound.

    Rather than using threading directly, you probably should use https://docs.python.org/3/library/concurrent.futures.html , which gives you easy switching between threads and processes.
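    For illustration, a small sketch along those lines; `compute_fft` is a
    hypothetical stand-in (a real program would call something like
    `scipy.fft.fft` on a numpy array):

    ```python
    import concurrent.futures

    def compute_fft(samples):
        # Stand-in for the real, CPU-bound FFT.
        return sum(x * x for x in samples)

    def process_batch(samples):
        # ThreadPoolExecutor here; for truly CPU-bound work, swapping in
        # ProcessPoolExecutor is a one-word change -- that is the "easy
        # switching between threads and processes" mentioned above.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(compute_fft, samples)
            return future.result()
    ```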

    Or if you, like me, get inordinately joyous over programs that run on more
    than one kind of Python, you could give up concurrent.futures and use
    _thread. Sadly, that gives up easy flipping between threads and processes,
    but gives you easy flipping between CPython and micropython. Better still, micropython appears to have more scalable threading than CPython, so if you decide you need 20 CPU-hungry threads someday, you are less likely to be in
    a bind.

    For reading from a socket, if you're not going the REST route, may I
    suggest https://stromberg.dnsalias.org/~strombrg/bufsock.html ? It deals
    with framing and lengths relatively smoothly. Otherwise, robust socket
    code tends to need while loops and tedious arithmetic.

    HTH

    On Mon, Aug 8, 2022 at 10:59 AM Andreas Croci <andrea.croci@gmx.de> wrote:

    I would like to write a program, that reads from the network a fixed
    amount of bytes and appends them to a list. This should happen once a
    second.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Schachner, Joseph (US)@21:1/5 to All on Tue Aug 9 17:04:51 2022
    Why would this application *require* parallel programming? This could be
    done in one, single thread program. Call time to get time and save it as
    start_time. Keep a count of the number of 6 hour intervals, initialize it
    to 0.

    Once a second read data and append to list. At 6 hours after start time,
    call a function that does an FFT (see comment about scipy below) and
    increment the count of 6 hour intervals. Call time and save new start
    time. Continue execution.

    After 28 six hour intervals, save the list and then slice the list to
    shorten it as you want. Reset the count of 6 hour intervals to zero.

    The FFT might take a second, even if you use scipy, depending on how long
    the list is (if you don't know about numpy and scipy, look them up! You
    need them. Your list can be an array in numpy). Saving and slicing the
    list should take less than a second.

    This single thread approach avoids thinking about multiprocessing, locking
    and unlocking data structures, all that stuff that does not contribute to
    the goal of the program.

    --- Joseph S.

    Teledyne Confidential; Commercially Sensitive Business Data

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dieter Maurer@21:1/5 to All on Wed Aug 10 19:33:04 2022
    Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
    Why would this application *require* parallel programming? This could be done in one, single thread program. Call time to get time and save it as start_time. Keep a count of the number of 6 hour intervals, initialize it to 0.

    You could also use the `sched` module from Python's library.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From 2QdxY4RzWzUUiLuE@potatochowder.com@21:1/5 to Joseph.Schachner@Teledyne.com on Wed Aug 10 13:43:47 2022
    On 2022-08-09 at 17:04:51 +0000,
    "Schachner, Joseph (US)" <Joseph.Schachner@Teledyne.com> wrote:

    Why would this application *require* parallel programming? This could
    be done in one, single thread program. Call time to get time and save
    it as start_time. Keep a count of the number of 6 hour intervals,
    initialize it to 0.

    In theory, you are correct.

    In practice, [stuff] happens. What if your program crashes? Or the
    computer crashes? Or there's a Python update? Or an OS update? Where
    does all that pending data go, and how will you recover it after you've addressed whatever happened? ¹

    OTOH, once you start writing the pending data to a file, then it's an
    extremely simple leap to multiple programs (rather than multiple
    threads) for all kinds of good reasons.

    ¹ FWIW, I used to develop highly available systems, such as telephone
    switches, which allow [stuff] to happen, and yet continue to function.
    It's pretty cool to yank a board (yes, physically remove it, without
    warning) from the system without [apparently] disrupting anything. Such systems also allow for hardware, OS, and application upgrades, too
    (IIRC, we were allowed a handful of seconds of downtime per year to meet
    our availability requirements). That said, designing and building such
    a system for the sakes of simplicity and convenience of the application
    we're talking about here would make a pretty good definition of
    "overkill."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dennis Lee Bieber@21:1/5 to All on Wed Aug 10 14:19:37 2022
    On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de> declaimed the following:

    Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
    Why would this application *require* parallel programming? This could be done in one, single thread program. Call time to get time and save it as start_time. Keep a count of the number of 6 hour intervals, initialize it to 0.

    You could also use the `sched` module from Python's library.

    <sigh> Time to really read the library reference manual again...

    Though if I read this correctly, a long running action /will/ delay others -- which could mean the (FFT) process could block collecting new 1-second readings while it is active. It also is "one-shot" on the
    scheduled actions, meaning those actions still have to reschedule
    themselves for the next time period.
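    That self-rescheduling pattern can be sketched with `sched` like this
    (intervals shortened and the network read replaced by a stand-in for
    demonstration):

    ```python
    import sched
    import time

    scheduler = sched.scheduler(time.time, time.sleep)
    readings = []

    def poll(interval, remaining):
        # Actions are one-shot: to run periodically, poll() must
        # re-enter itself for the next time period. Note that a long
        # computation done inside an action would delay later actions.
        readings.append(len(readings))   # stand-in for the network read
        if remaining > 1:
            scheduler.enter(interval, 1, poll, (interval, remaining - 1))

    # Kick off a short demonstration: three "reads" 0.01 s apart.
    scheduler.enter(0.01, 1, poll, (0.01, 3))
    scheduler.run()    # blocks until the event queue empties
    ```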


    --
    Wulfraed Dennis Lee Bieber AF6VN
    wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From avi.e.gross@gmail.com@21:1/5 to All on Wed Aug 10 14:54:36 2022
    There are many possible discussions we can have here and some are not really about whether and how to use Python.

    The user asked how to do what is a fairly standard task for some people and arguably is not necessarily best done using a single application running
    things in parallel.

    So, yes, if you have full access to your machine and can schedule tasks,
    then some obvious answers come to mind where one process listens and
    receives data and stores it, and another process periodically wakes up and grabs recent data and processes it and perhaps still another process comes
    up even less often and does some re-arrangement of old data.

    And, yes, for such large volumes of data it may be a poor design to hold all the data in memory for many hours or even days and various ways of using a database or files/folders with a naming structure are a good idea.

    But the original question remains, in my opinion, a not horrible one. All
    kinds of applications can be written with sets of tasks run largely in
    parallel with some form of communication between tasks using shared data structures like queues and perhaps locks and with a requirement that any
    tasks that take nontrivial time need a way to buffer any communications to
    not block others.

    Also, for people who want to start ONE process and let it run, and perhaps
    may not be able to easily schedule other processes on a system level, it can
    be advantageous to know how to set up something along those lines within a single python session.

    Of course, for efficiency reasons, any I/O to files slows things down, but
    the situation described here seems somewhat easier and safer to handle in
    many other ways. I think a main point is that there are good ways to
    prevent the data from being acted on by two parties that share memory. One
    is NOT to share memory for this purpose. Another might be to
    have the 6-hour process use a lock to move the data aside or send a message
    to the receiving process to pause a moment and set the data aside and begin collecting anew while the old is processed and so on.

    There are many such choices and the parts need not be in the same process or all written in python. But some solutions can be generalized easier than others. For example, can there become a need to collect data from multiple sources, perhaps using multiple listeners?

    -----Original Message-----
    From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Dieter Maurer
    Sent: Wednesday, August 10, 2022 1:33 PM
    To: Schachner, Joseph (US) <Joseph.Schachner@Teledyne.com>
    Cc: Andreas Croci <andrea.croci@gmx.de>; python-list@python.org
    Subject: RE: Parallel(?) programming with python

    Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
    Why would this application *require* parallel programming? This could be done in one, single thread program. Call time to get time and save it as start_time. Keep a count of the number of 6 hour intervals, initialize it
    to 0.

    You could also use the `sched` module from Python's library.
    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Peter J. Holzer@21:1/5 to Dennis Lee Bieber on Wed Aug 10 22:29:38 2022
    On 2022-08-10 14:19:37 -0400, Dennis Lee Bieber wrote:
    On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de> declaimed the following:
    Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
    Why would this application *require* parallel programming? This
    could be done in one, single thread program. Call time to get time
    and save it as start_time. Keep a count of the number of 6 hour
    intervals, initialize it to 0.
    [...]
    Though if I read this correctly, a long running action /will/
    delay others -- which could mean the (FFT) process could block
    collecting new 1-second readings while it is active.

    Certainly, but does it matter? Data is received from some network
    connection and network connections often involve quite a bit of
    buffering. If the consumer is blocked for 3 or 4 or maybe even 20
    seconds, the producer might not even notice. (This of course depends
    very much on the details which we know nothing about.)

    hp

    --
       _  | Peter J. Holzer    | Story must make more sense than reality.
    |_|_) |                    |
    | |   | hjp@hjp.at         | -- Charles Stross, "Creative writing
    __/   | http://www.hjp.at/ |    challenge!"

  • From subin@21:1/5 to 2QdxY4RzWzUUiLuE@potatochowder.com on Thu Aug 11 13:54:56 2022
    Please let me know if that is okay.

    On Wed, Aug 10, 2022 at 7:46 PM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:

    On 2022-08-09 at 17:04:51 +0000,
    "Schachner, Joseph (US)" <Joseph.Schachner@Teledyne.com> wrote:

    Why would this application *require* parallel programming? This could
    be done in one, single thread program. Call time to get time and save
    it as start_time. Keep a count of the number of 6 hour intervals, initialize it to 0.

    In theory, you are correct.

    In practice, [stuff] happens. What if your program crashes? Or the
    computer crashes? Or there's a Python update? Or an OS update? Where
    does all that pending data go, and how will you recover it after you've addressed whatever happened? ¹

    OTOH, once you start writing the pending data to a file, then it's an extremely simple leap to multiple programs (rather than multiple
    threads) for all kinds of good reasons.

    ¹ FWIW, I used to develop highly available systems, such as telephone switches, which allow [stuff] to happen, and yet continue to function.
    It's pretty cool to yank a board (yes, physically remove it, without
    warning) from the system without [apparently] disrupting anything. Such systems also allow for hardware, OS, and application upgrades, too
    (IIRC, we were allowed a handful of seconds of downtime per year to meet
    our availability requirements). That said, designing and building such
    a system for the sakes of simplicity and convenience of the application
    we're talking about here would make a pretty good definition of
    "overkill."
    --
    https://mail.python.org/mailman/listinfo/python-list


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From subin@21:1/5 to hjp-python@hjp.at on Thu Aug 11 13:52:05 2022
    Thanks again for the info.

    On Wed, Aug 10, 2022 at 9:31 PM Peter J. Holzer <hjp-python@hjp.at> wrote:

    On 2022-08-10 14:19:37 -0400, Dennis Lee Bieber wrote:
    On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de>
    declaimed the following:
    Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
    Why would this application *require* parallel programming? This
    could be done in one, single thread program. Call time to get time
    and save it as start_time. Keep a count of the number of 6 hour
    intervals, initialize it to 0.
    [...]
    Though if I read this correctly, a long running action /will/
    delay others -- which could mean the (FFT) process could block
    collecting new 1-second readings while it is active.

    Certainly, but does it matter? Data is received from some network
    connection and network connections often involve quite a bit of
    buffering. If the consumer is blocked for 3 or 4 or maybe even 20
    seconds, the producer might not even notice. (This of course depends
    very much on the details which we know nothing about.)

    hp

    --
       _  | Peter J. Holzer    | Story must make more sense than reality.
    |_|_) |                    |
    | |   | hjp@hjp.at         | -- Charles Stross, "Creative writing
    __/   | http://www.hjp.at/ |    challenge!"
    --
    https://mail.python.org/mailman/listinfo/python-list


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dieter Maurer@21:1/5 to Dennis Lee Bieber on Thu Aug 11 18:13:07 2022
    Dennis Lee Bieber wrote at 2022-8-10 14:19 -0400:
    On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de>
    ...
    You could also use the `sched` module from Python's library.

    <sigh> Time to really read the library reference manual again...

    Though if I read this correctly, a long running action /will/ delay
    others -- which could mean the (FFT) process could block collecting new
    1-second readings while it is active. It also is "one-shot" on the
    scheduled actions, meaning those actions still have to reschedule
    themselves for the next time period.

    Both true.

    With `multiprocessing`, you can delegate long running activity
    to a separate process.
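    A minimal sketch of that delegation (the squared-sum `long_fft` is a
    stand-in for the real FFT, and the Unix-only `fork` start method is
    chosen here for brevity):

    ```python
    import multiprocessing

    def long_fft(samples, out):
        # Stand-in for the long-running FFT; it runs in its own process,
        # so the once-a-second polling loop in the parent never blocks on it.
        out.put(sum(x * x for x in samples))

    def run_in_subprocess(samples):
        ctx = multiprocessing.get_context("fork")   # Unix-only start method
        out = ctx.Queue()
        worker = ctx.Process(target=long_fft, args=(samples, out))
        worker.start()
        result = out.get()
        worker.join()
        return result
    ```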

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Croci@21:1/5 to Andreas Croci on Mon Aug 15 08:59:06 2022
    I would like to thank everybody who answered my question. The insight
    was very informative. This seems to be one of the few newsgroups still
    alive and kicking, with a lot of knowledgeable people taking the time to
    help others. I like how quick and easy it is to post questions and
    receive answers here as compared to web-based forums (although there are
    some disadvantages too).

    I'm implementing some of the ideas received here and I will surely have
    other questions as I go. But the project will take a long time, because
    I'm doing this as a hobby during my vacation, which is unfortunately
    about to end.

    Thanks again, Community.

    On 08.08.22 12:47, Andreas Croci wrote:
    tI would like to write a program, that reads from the network a fixed
    amount of bytes and appends them to a list. This should happen once a
    second.

    Another part of the program should take the list, as it has been filled
    so far, every 6 hours or so, and do some computations on the data (a FFT).

    Every so often (say once a week) the list should be saved to a file,
    shortened in the front by so many items, and filled further with the
    data coming from the network. After the first saving of the whole list,
    only the new part (the data that have come since the last saving) should
    be appended to the file. A timestamp is in the data, so it's easy to say
    what is new and what was already there.

    I'm not sure how to do this properly: can I write a part of a program
    that keeps doing its job (appending data to the list once every second)
    while another part computes something on the data of the same list,
    ignoring the new data being written?

    Basically the question boils down to whether it is possible to have parts
    of a program (could be functions) that keep doing their job while other
    parts do something else on the same data, and what is the best way to do this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)