• DB2 Express-C 10.5 on Ubuntu 16.04: Frequent crashes

    From Joachim Tuchel@21:1/5 to All on Thu Nov 23 04:45:34 2017
    We've been seeing frequent crashes of DB2 on Ubuntu Linux. By frequent I mean at least daily. Client applications get communication errors:

    [SQLSTATE=40003 - [IBM][CLI Driver] SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]

    After a few more attempts, clients get:

    [SQLSTATE=08003 - [IBM][CLI Driver] CLI0106E Connection is closed. SQLSTATE=08003 [Native Error=-99999]

    A few attempts later they get "No Start Database Manager command was issued".

    I can simply run db2 start dbm and things start working again, but I never know for how long.


    I've tried to find some useful information in the db2diag.log. The first error is usually this:

    EDUID : 111 EDUNAME: db2agent (instance)
    FUNCTION: DB2 UDB, common communication, sqlccipcAgentExitList, probe:2
    DATA #1 : Hexdump, 24 bytes
    0x000000020457E784 : 6680 C901 0300 0000 0200 0000 FFFF FFFF f...............
    0x000000020457E794 : 0200 0000 FBFF FFFF ........

    2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
    PID : 1633 TID : 140263188064000 PROC : db2sysc
    INSTANCE: NODE : 000 DB :
    APPHDL : 0-25 APPID: *LOCAL.DB2.171123033522
    AUTHID : HOSTNAME:
    EDUID : 81 EDUNAME: db2fw0 (KONTO1)
    FUNCTION: DB2 UDB, oper system services, sqloWaitEDUWaitPost, probe:100
    MESSAGE : ZRC=0x8300002B=-2097151957
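
    A side note on reading the ZRC line: the two numbers are the same 32-bit value, once as hex and once as a signed decimal. The hex form is the one to search for, and it can also be passed to db2diag -rc for a description. A minimal sketch of the conversion (the helper name zrc_to_signed is made up for illustration):

```python
def zrc_to_signed(zrc_hex: str) -> int:
    """Interpret a 32-bit hex return code as a signed (two's complement) integer."""
    value = int(zrc_hex, 16)
    if value >= 0x80000000:      # high bit set means the value is negative
        value -= 0x100000000     # subtract 2**32 to get the signed form
    return value


print(zrc_to_signed("0x8300002B"))  # -2097151957, as shown in the log line
```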


    Followed by hundreds of other errors, like for example this one:

    MESSAGE : Unexpected OS error. This most likely means that resources have been
    torn down from underneath the prefetcher. Terminating the prefetcher
    to prevent infinite looping.
    CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
    [0] 0x00007F91A8A67B48 _Z17sqlbpfRemoveFromQP12SQLB_pfQUEUEmPP14SQLB_pfRequestP16sqeLocalDatabasePP12SQLD_OBJ_HDLP12SQLB_GLOBALS + 0x3C8
    [1] 0x00007F91A8940957 _Z26sqlbPFPrefetcherEntryPointP16sqbPrefetcherEdu + 0x197
    [2] 0x00007F91A8940725 _ZN16sqbPrefetcherEdu6RunEDUEv + 0x25
    [3] 0x00007F91AC0A7047 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
    [4] 0x00007F91AB880181 sqloEDUEntry + 0x301
    [5] 0x00007F91B2AF86BA /lib/x86_64-linux-gnu/libpthread.so.0 + 0x76BA
    [6] 0x00007F91A650F3DD clone + 0x6D
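
    With hundreds of follow-on errors, the useful part is usually the first Severe record before each crash. db2diag itself can filter by level (for example db2diag -level Severe); the sketch below does roughly the same over a saved log. It assumes records start with a header line like the one quoted above (timestamp, record ID, LEVEL:) and contain a FUNCTION: line. The function name severe_entries is hypothetical.

```python
import re

# Header line of a db2diag record, e.g.
# 2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
HEADER = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})\s+"
    r"(?P<record_id>\S+)\s+LEVEL:\s*(?P<level>\w+)"
)
FUNCTION = re.compile(r"FUNCTION:\s*(?P<function>.+)")


def severe_entries(log_text):
    """Return (timestamp, function) for each record whose level is Severe."""
    results = []
    current = None  # timestamp of the Severe record being read, if any
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            is_severe = header.group("level") == "Severe"
            current = header.group("timestamp") if is_severe else None
            continue
        func = FUNCTION.match(line)
        if current and func:
            results.append((current, func.group("function").strip()))
            current = None  # report each record once
    return results
```

    To use it, read the log into a string (open(path).read(), with path pointing at your db2diag.log) and look at the first tuple returned after each restart.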




    So what should I look for to find the cause of this problem? I've read about the Linux kernel parameters, but I think they are okay.

    Any hints?


    Joachim

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeremy Rickard@21:1/5 to Joachim Tuchel on Mon Nov 27 11:02:47 2017
    Google "sqloWaitEDUWaitPost" for a few possibilities?

    Might not hurt to check the ulimit values against recommendations at https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.qb.server.doc/doc/r0052441.html

    Is it a root or non-root installation?

    Before you do the new 11.1 install, I suggest you also run db2prereqcheck, e.g. ./db2prereqcheck -v 11.1.2.2
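
    For the ulimit check, a small sketch that prints the soft and hard limits of the current process so they can be compared against the IBM page by hand. The choice of data, nofiles and fsize is an assumption about which limits that page covers, so verify it there. Run it as the instance owner; the limits of an already running db2sysc can differ and are visible in /proc/<pid>/limits.

```python
import resource

# Assumed set of limits to compare; names follow the ulimit-style labels.
LIMITS = {
    "data": resource.RLIMIT_DATA,
    "nofiles": resource.RLIMIT_NOFILE,
    "fsize": resource.RLIMIT_FSIZE,
}


def fmt(value):
    """Render a limit the way ulimit does, with 'unlimited' for no limit."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for this process, as display strings."""
    return {
        name: tuple(fmt(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:8} soft={soft:>10} hard={hard:>10}")
```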

  • From Joachim Tuchel@21:1/5 to All on Tue Nov 28 10:58:23 2017
    Oh, and it is a root installation...

  • From Joachim Tuchel@21:1/5 to All on Tue Nov 28 10:57:32 2017
    Jeremy,


    I tried googling a few expressions, but not sqloWaitEDUWaitPost... Will do so tonight. Thanks for the suggestion.

    The ulimit values seem fine.

    I ran db2prereqcheck for 11.1 and it has nothing to complain about. In the meantime, I have upgraded to 11.1 in the hope that the situation changes. The upgrade process wasn't as smooth as expected, but all is up and running now. It's too early to tell if 11.1 magically removed the crashes...


    Joachim


