• PostgreSQL 10.3 leaving "exiting" (defunct) processes on AIX platform

    From inverasln@gmail.com@21:1/5 to All on Mon Mar 18 13:28:00 2019
    Curious to know if anyone else has experienced this issue before.

    I have a PostgreSQL 10.3 database on AIX 7.1, and it usually does not give us any problems. But after a recent install on a server that handles about 275 users, we suddenly started seeing "exiting" processes left behind by every call to open the database.

    Now, when the problem first starts, the exiting processes disappear after a few seconds, which is normal. But the busier the system gets, the longer they take to disappear, until they can be stuck there for hours. The real problem is that these exiting processes still appear to count against max_connections, so we eventually run out of connections.


    Here's an example of what I'm referring to:

    # ps -ef |grep exiting |wc -l
    6250

    # ps -ef |grep exiting |tail -5
    - 33818386 - - - <exiting>
    - 33949478 - - - <exiting>
    - 34015016 - - - <exiting>
    - 34080578 - - - <exiting>
    - 34211634 - - - <exiting>

    # proctree 33818386
    4653070 /apps/pg_10.3/bin/postgres
    33818386 <defunct>


    According to the IBM AIX documentation, an exiting/defunct process waits until its parent process indicates that it no longer needs the exit status of the subprocess. So it looks like PostgreSQL may not be sending that reply, or is sending it late.

    Has anyone else experienced a similar issue? I'm wondering whether I need to update my O/S, which is already at a fairly current level, or arrange to upgrade the Postgres database to 10.6 or higher.

    I guess my question is: could this be a bug in how Postgres releases processes?

    Thx

    Steve N.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Laurenz Albe@21:1/5 to inverasln on Mon Mar 18 23:05:03 2019
    On Mon, 18 Mar 2019 13:28:00 -0700, inverasln wrote:

    > According to the IBM AIX documentation, an exiting/defunct process
    > waits until its parent process indicates that it no longer needs the
    > exit status of the subprocess. So it looks like PostgreSQL may not be
    > sending that reply, or is sending it late.

    The processes are not zombies yet; they are still dying.

    It seems to be this problem: https://www.postgresql.org/message-id/flat/554a2676-9b2f-7ecc-d675-d52f75b5ef4f%40postgrespro.ru#3cd8f5307c2c1004614bc9fb7a526abd

    Apparently rebuilding PostgreSQL without mmap support can solve the
    problem.

    Yours,
    Laurenz Albe

  • From inverasln@gmail.com@21:1/5 to All on Wed Mar 20 10:53:07 2019
    I can confirm that the version of postgres we have installed does indeed reference mmap. I can see it by using the "dump -T" command on AIX.

    dump -T postgres |egrep "\[Index|mmap"
    [Index] Value Scn IMEX Sclass Type IMPid Name
    [14] 0x00000000 0x0000 0x08 0x0a 0x0 0x0001 mmap

    The question now is how to keep it from being included when we compile postgres. I'll give the details to our development guys and see whether this is something they can try. Thanks for the suggestion and the direction; it gives us a starting point.

    SteveN

  • From Laurenz Albe@21:1/5 to inverasln on Wed Mar 20 18:24:46 2019
    On Wed, 20 Mar 2019 10:53:07 -0700, inverasln wrote:

    > The question now is how to keep it from being included when we compile postgres.

    I have looked into that in some more detail, and here is what you can do:

    - Edit "src/backend/port/sysv_shmem.c" and remove the three lines

    #ifndef EXEC_BACKEND
    #define USE_ANONYMOUS_SHMEM
    #endif

    Then PostgreSQL will be built using System V shared memory.

    - Wait for PostgreSQL v12.

    Commit f1bebef60ec8f557324cd3bfc1671da1318de968 has introduced a
    configuration parameter "shared_memory_type" that you can set to
    "sysv" to use System V shared memory.

    PostgreSQL v12 is due this fall.
