We've been seeing frequent crashes of DB2 on Ubuntu Linux. By frequent I mean at least daily. Client applications get communication errors:
[SQLSTATE=40003 - [IBM][CLI Driver] SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]
And after a few attempts clients get
[SQLSTATE=08003 - [IBM][CLI Driver] CLI0106E Connection is closed. SQLSTATE=08003 [Native Error=-99999]
And a few attempts later they get "No Start Database Manager command was issued"
I can simply do a db2 start dbm and things start to work again, but I never know for how long.
I've tried to find some useful information in the db2diag.log. The first error is usually this:
EDUID : 111 EDUNAME: db2agent (instance)
FUNCTION: DB2 UDB, common communication, sqlccipcAgentExitList, probe:2
DATA #1 : Hexdump, 24 bytes
0x000000020457E784 : 6680 C901 0300 0000 0200 0000 FFFF FFFF f...............
0x000000020457E794 : 0200 0000 FBFF FFFF ........
2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
PID : 1633 TID : 140263188064000 PROC : db2sysc
INSTANCE: NODE : 000 DB :
APPHDL : 0-25 APPID: *LOCAL.DB2.171123033522
AUTHID : HOSTNAME:
EDUID : 81 EDUNAME: db2fw0 (KONTO1)
FUNCTION: DB2 UDB, oper system services, sqloWaitEDUWaitPost, probe:100
MESSAGE : ZRC=0x8300002B=-2097151957
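A side note on the MESSAGE value: the two numbers are the same return code, with the decimal form being the hex form read as a signed 32-bit integer. A minimal Python sketch of that conversion follows; the function name is made up for illustration and is not a DB2 API. As far as I know, the db2diag tool also has an -rc option that prints DB2's own description of a ZRC, which is probably the more useful lookup.

```python
def zrc_to_signed(hex_code: str) -> int:
    """Interpret a 32-bit ZRC hex string as a signed two's-complement integer."""
    value = int(hex_code, 16)  # base 16 accepts an optional "0x" prefix
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("ZRC must fit in 32 bits")
    # Values with the top bit set are negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value


print(zrc_to_signed("0x8300002B"))  # -2097151957, matching the log record
```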
Followed by hundreds of other errors, like for example this one:
MESSAGE : Unexpected OS error. This most likely means that resources have been
torn down from underneath the prefetcher. Terminating the prefetcher
to prevent infinite looping.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x00007F91A8A67B48 _Z17sqlbpfRemoveFromQP12SQLB_pfQUEUEmPP14SQLB_pfRequestP16sqeLocalDatabasePP12SQLD_OBJ_HDLP12SQLB_GLOBALS + 0x3C8
[1] 0x00007F91A8940957 _Z26sqlbPFPrefetcherEntryPointP16sqbPrefetcherEdu + 0x197
[2] 0x00007F91A8940725 _ZN16sqbPrefetcherEdu6RunEDUEv + 0x25
[3] 0x00007F91AC0A7047 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
[4] 0x00007F91AB880181 sqloEDUEntry + 0x301
[5] 0x00007F91B2AF86BA /lib/x86_64-linux-gnu/libpthread.so.0 + 0x76BA
[6] 0x00007F91A650F3DD clone + 0x6D
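With hundreds of follow-on records, it may help to find the first Severe record in the file and tally the rest by function and probe, since the later ones are often just fallout from the first. The sketch below assumes the record layout shown in the excerpts above (a timestamp header line with LEVEL, then a FUNCTION line); the regexes and the summarize name are my own and have only been checked against those excerpts, not against every db2diag.log variant.

```python
import re
from collections import Counter

# Header line of a record, e.g.
# 2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})"
    r"\s+\S+\s+LEVEL:\s*(?P<level>\w+)"
)
# FUNCTION: DB2 UDB, <component>, <function>, probe:<n>
FUNCTION = re.compile(
    r"^FUNCTION:\s*(?:[^,]+,){2}\s*(?P<func>[^,\s]+),\s*probe:(?P<probe>\d+)"
)


def summarize(lines, levels=("Severe",)):
    """Return (first matching record, Counter keyed by (function, probe))."""
    first = None
    counts = Counter()
    current = None  # (timestamp, level) of the record being read
    for line in lines:
        header = HEADER.match(line)
        if header:
            current = (header["ts"], header["level"])
            continue
        func = FUNCTION.match(line)
        if func and current and current[1] in levels:
            key = (func["func"], int(func["probe"]))
            counts[key] += 1
            if first is None:
                first = (current[0], *key)
            current = None  # count each record only once
    return first, counts


# Lines taken from the record quoted earlier in this post.
sample = """\
2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
PID : 1633 TID : 140263188064000 PROC : db2sysc
EDUID : 81 EDUNAME: db2fw0 (KONTO1)
FUNCTION: DB2 UDB, oper system services, sqloWaitEDUWaitPost, probe:100
""".splitlines()

first, counts = summarize(sample)
print(first)
```

For a real log, pass an open file instead of the sample, for example summarize(open(path, errors="replace")), and print counts.most_common(10) to see which functions dominate.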
So what should I look for in order to find the cause of this problem? I've read about the Linux kernel parameters, but I think they are okay.
Any hints?
Joachim
Google "sqloWaitEDUWaitPost" for a few possibilities?
It might not hurt to check the ulimit values against the recommendations at https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.qb.server.doc/doc/r0052441.html
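A rough way to do that comparison is the Python sketch below, run as the instance owner. The target values in RECOMMENDED are my assumption of what that page lists and should be replaced with the numbers you actually find there; the helper names are made up for illustration.

```python
import resource

# ASSUMPTION: placeholder targets; take the real values from the IBM page.
# None means "unlimited".
RECOMMENDED = {"RLIMIT_DATA": None, "RLIMIT_NOFILE": 65536, "RLIMIT_FSIZE": None}


def shortfalls(current, recommended):
    """Return the limit names whose current soft value is below the target."""
    low = []
    for name, want in recommended.items():
        have = current[name]
        if have == resource.RLIM_INFINITY:
            continue  # an unlimited soft limit satisfies any target
        if want is None or have < want:
            low.append(name)
    return low


def soft_limits(names):
    """Read the soft limits of the current process for the given RLIMIT_* names."""
    return {n: resource.getrlimit(getattr(resource, n))[0] for n in names}


print(shortfalls(soft_limits(RECOMMENDED), RECOMMENDED))
```

This only reports the limits of the shell it runs in. The limits that matter are the ones the running db2sysc process inherited, which on Linux can be read from /proc/<pid>/limits.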
Is it a root or non-root installation?
Before you do the new 11.1 install, I suggest you also run db2prereqcheck, e.g. ./db2prereqcheck -v 11.1.2.2
On Friday, 24 November 2017 01:45:36 UTC+13, Joachim Tuchel wrote:
> We've been seeing frequent crashes of DB2 on Ubuntu Linux. By frequent I mean at least daily. [...]