https://gitlab.synchro.net/main/sbbs/-/commit/ffe092b65b38633a5ca185e3
Modified Files:
src/sbbs3/ringbuf.c
Log Message:
Set initial state of 'data_event' and 'highwater_event' to *not* signaledSince commit fafed094c52b0ca1 (2 months ago), the Synchronet Web Server onmy Windows systems has occasionally gone into a state where every HTTP sessionthread was causing 100% CPU utilization and each new HTTP session threadwould just exacerbate the problem eventually leading to complete systeminstability/unresponsiveness until the sbbs instance was terminated. This wasmore readily reproducible on a Win7-32 VM, but would occasionally, thoughmuch less frequently, happen in a native instance on Win10-64 as well.Using the VisualStudio debugger, I was able to narrow it down to this:Each new HTTP thread (eventually, hundreds of such threads) would get stuckin this loop in http_output_thread(): while(session->socket!=INVALID_SOCKET) { /* Wait for something to output in the RingBuffer */ if((avail=RingBufFull(obuf))==0) { /* empty */ if(WaitForEvent(obuf->data_event, 1000) != WAIT_OBJECT_0) continue; /* Check for spurious sem post... */ if((avail=RingBufFull(obuf))==0) continue; // <--- data_event signaled, but never cleared }There appears to be a race condition where this logic could be executedimmediately after the output ringbuf was created, but before writebuf() wasever called (which would have actually placed data in the output buffer),causing a potential high-utilization infinite loop: the data_event is signaledbut there is no data and the event is never reset and nothing can ever adddata to the ringbuf due to starvation of CPU cycles.Uses of ringbuf's data_event elsewhere in Synchronet don't seem to be subjectto this issue since they always call RingBufRead after, which will clear thedata_event when the ringbuf is actually empty (no similar loops to this one).The root cause just appears to be a simple copy/paste issue in the code addedto RingBufInit(): the preexisting 'empty_event' was initialized with acorrect initiate state of 'signaled' (because by default a ringbuf is empty)but the newly added events (data_event and hi
ghwater_event) should *not* beinitially-signaled because... the ringbuf is empty. So I added some parametercomments to these calls to CreateEvent() to hopefully make that more clearand prevent similar mistakes in the future.Relieved to have this one resolved finally.
--- SBBSecho 3.20-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)