Curious to know if anyone else has experienced this issue before.
I have a PostgreSQL 10.3 database on AIX 7.1, and it usually does not
give us any problems. But after a recent install on a server that's
handling about 275 users, we suddenly started seeing "exiting" processes
from every call to open the database.
When the problem first starts, the exiting processes disappear after
a few seconds, which is normal. But the busier the system gets, the
longer they take to disappear, until they can be stuck there for hours.
The real problem is that each exiting process appears to still hold
one or more of the max_connections slots, so we eventually run out of
connections.
Here's an example of what I'm referring to:
# ps -ef |grep exiting |wc -l
6250
# ps -ef |grep exiting |tail -5
- 33818386 - - - <exiting>
- 33949478 - - - <exiting>
- 34015016 - - - <exiting>
- 34080578 - - - <exiting>
- 34211634 - - - <exiting>
# proctree 33818386
4653070 /apps/pg_10.3/bin/postgres
   33818386 <defunct>
According to IBM AIX documentation, the exiting/defunct process will
wait until the parent PID replies that it no longer needs the exit
status of the subprocess. And so it looks like PostgreSQL may not be
sending that reply, or is somehow delayed.
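For anyone not familiar with the mechanism, here is a minimal sketch of how a parent clears a defunct child on a POSIX system. It is written in Python for brevity and is not PostgreSQL's code. The function names spawn_child and reap are made up for illustration. The child exits right away, and its process-table entry stays around until the parent calls waitpid() to collect the exit status.

```python
import os
import time


def spawn_child(exit_code):
    """Fork a child that exits immediately; return its PID without reaping it."""
    pid = os.fork()
    if pid == 0:
        # Child process: exit at once, skipping Python cleanup handlers.
        os._exit(exit_code)
    # Parent process: the child is now (or soon will be) defunct until reaped.
    return pid


def reap(pid):
    """Collect the child's exit status so the kernel can free its entry."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # child was terminated by a signal instead of exiting


if __name__ == "__main__":
    child = spawn_child(7)
    # Until reap() is called, ps lists the child as defunct/exiting.
    time.sleep(1)
    print("child", child, "exit code", reap(child))
```

If the parent is slow to call waitpid(), or never calls it, the defunct entries pile up, which matches the pattern in the ps output above.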
The question now is how to keep that from being included when we
compile postgres.