• How to find out what causes regular SQLSTATE=40003 / SQL1224N

    From Joachim Tuchel@21:1/5 to All on Thu Mar 1 04:41:08 2018
    Hi there,

    we've got this Ubuntu Linux machine (16.04) runnuing DB2 Express C 11.1. We see the following error almost daily, sometimes two to three times a day:


    SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]


    Not every connected user has this problem, others can continue their work and only realize the DB is a bit slower.


    I am not asking you to tell me what the problem is, because I am quite sure it's just not that easy.

    But how would I approach this problem from here? What are steps 1,2,3 etc. to get a better understanding of what is going on? Where to look, what to try, who to ask once I know more about the trouble? Where can I find documentation on how to narrow down
    on problems?

    Things we've checked: the kernel parameters SHMM... look just like what's suggested in the manuals. The machine has lots of unused memory, the DB is far below 1 Gig in size, so it is not one of the easy things like not enaugh ram or such.


    Thanks for tips!

    Joachim

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Troels Arvin@21:1/5 to Joachim Tuchel on Thu Mar 1 20:47:09 2018
    Hello,

    Joachim Tuchel wrote:
    We see the following error almost daily, sometimes two to three times a
    day:
    [...]

    Has it ever worked well? If so: Is there a difference between then and
    when the trouble started?

    It sounds like your DB2 instance restarted. You need to dig into the db2diag.log file to learn more on why DB2 got into trouble. In cases like
    this, you need to set aside quite a bit of time to run through the
    db2diag.log: It can be rather verbose when such things happen.

    How about the operating system's system log (most likely in /var/log/ messages): Does it mention the OOM (out-of-memory) killer stepping in and terminating a DB2-related process?

    Have you configured DB2 to use huge pages? If so, memory handling can be
    harder to get right (but supposedly, performance is also significantly
    better, if your server has lots of memory). If you don't know whether you
    you configured DB2 to use huge pages, then it does not (it requires
    active work from the DBA).


    --
    Regards,
    Troels Arvin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Joachim Tuchel@21:1/5 to All on Thu Mar 1 23:55:04 2018
    Troels (assuming that is your first name),


    thanks for these pointers. I cannot actually tell if it worked better in general, because usage of the system has grown significantly over the last months since it was moved onto an Ubuntu 16.04 box.
    The problem had been existing even before that move when it was still DB2 10.5 EXPC on a 14.04 Ubuntu box, but it clearly accours more often these days than a year ago. But don't think we are talking about hundreds of concurrent DB2 connections. There
    are hardly ever more than a maybe 20 open DB2 connections.

    I will check db2diag.log. Most of the times I checked there wasn't much to find on the web. Most of the tips were about the Kernel parameters, and many discussion threads ended with something like "well, didn't actually help".


    I will definitely take a look into the system logs.

    The DB2 installation is pretty much untouched when it comes to configuration, so I'm quite sure there is no huge page support enabled.


    Thanks for your comments and ideas. I have the feeling this is just the beginning of a journey and I'm afraid I will add a few posts to this thread in the coming weeks.


    Joachim













    Am Donnerstag, 1. März 2018 21:48:40 UTC+1 schrieb Troels Arvin:
    Hello,

    Joachim Tuchel wrote:
    We see the following error almost daily, sometimes two to three times a day:
    [...]

    Has it ever worked well? If so: Is there a difference between then and
    when the trouble started?

    It sounds like your DB2 instance restarted. You need to dig into the db2diag.log file to learn more on why DB2 got into trouble. In cases like this, you need to set aside quite a bit of time to run through the db2diag.log: It can be rather verbose when such things happen.

    How about the operating system's system log (most likely in /var/log/ messages): Does it mention the OOM (out-of-memory) killer stepping in and terminating a DB2-related process?

    Have you configured DB2 to use huge pages? If so, memory handling can be harder to get right (but supposedly, performance is also significantly better, if your server has lots of memory). If you don't know whether you you configured DB2 to use huge pages, then it does not (it requires
    active work from the DBA).


    --
    Regards,
    Troels Arvin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mark Barinstein@21:1/5 to Joachim Tuchel on Fri Mar 2 02:41:29 2018
    On Thursday, March 1, 2018 at 3:41:13 PM UTC+3, Joachim Tuchel wrote:
    Hi there,

    we've got this Ubuntu Linux machine (16.04) runnuing DB2 Express C 11.1. We see the following error almost daily, sometimes two to three times a day:


    SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]


    Not every connected user has this problem, others can continue their work and only realize the DB is a bit slower.


    I am not asking you to tell me what the problem is, because I am quite sure it's just not that easy.

    But how would I approach this problem from here? What are steps 1,2,3 etc. to get a better understanding of what is going on? Where to look, what to try, who to ask once I know more about the trouble? Where can I find documentation on how to narrow
    down on problems?

    Things we've checked: the kernel parameters SHMM... look just like what's suggested in the manuals. The machine has lots of unused memory, the DB is far below 1 Gig in size, so it is not one of the easy things like not enaugh ram or such.


    Thanks for tips!

    Joachim

    Hi Joachim,

    sql1224 often means, that the client connection was lost due to various reasons. This could be some network problems, inappropriate db2 client/server communication registry variables settings, manual/automatic 'force application' commands for given
    application handle.

    Check the db2 registry variables available: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.regvars.doc/doc/r0005660.html

    And db2diag.log contents on the server for the corresponding errors like described here:
    http://www-01.ibm.com/support/docview.wss?uid=swg21613530

    Regards,
    Mark.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeremy Rickard@21:1/5 to Mark Barinstein on Fri Mar 2 12:06:14 2018
    On Friday, 2 March 2018 23:41:35 UTC+13, Mark Barinstein wrote:

    sql1224 often means, that the client connection was lost due to various reasons. This could be some network problems, inappropriate db2 client/server communication registry variables settings, manual/automatic 'force application' commands for given
    application handle.

    Also NUM_LOG_SPAN if set would kill an application connection if a transaction spans the defined number of log extents. However default is 0 (not set).

    Joachim, to cut down on searching you can use commands like:

    db2diag -t 2018-03-01-21.30.59 | more

    ...to view the db2diag.log for the appropriate time, starting just before the connection.

    Probably the most important fact to establish is whether or not the instance crashed. If it's just certain connections being terminated then that suggests a network issue or connection being forced.

    Jeremy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)