• Long-running rollback

    From Troels Arvin@21:1/5 to All on Thu Feb 8 07:31:37 2018
    I have a database session which has been in "Rollback to Savepoint" state
    for a surprisingly long time. As far as I can see, the statement which is
    being rolled back ran for 2½ hours before something happened and it
    started to roll back. Since then, it has been rolling back for more than
    39 hours. Isn't it odd that rolling back takes more than 15 times longer
    than the statement which is being rolled back?

    The database has LOGPRIMARY=10 and LOGSECOND=234.
    I now have 207 .LOG files, amounting to 790 GB. What can I expect happens
    if the rollback continues and I reach 244 log files?

    DB2 version: 11.1.2.2 on x86_64 Linux.

    Excerpts from "get snapshot ..." for the session:

    Application status = Rollback to Savepoint
    Start Time = 02/06/2018 17:33:55.363685
    Completed Work = 232025565071 bytes
    Total Work = 232560263515 bytes
    Status change time = 02/06/2018 17:33:55.360478
    Connection request start timestamp = 02/06/2018 15:08:04.867602
    Connect request completion timestamp = 02/06/2018 15:08:04.868181
    UOW log space used (Bytes) = 740849762132
    Previous UOW completion timestamp = 02/06/2018 15:08:04.868181
    Elapsed t of last completed uow sec.ms = 0.000000
    UOW start timestamp = 02/06/2018 15:08:04.869882
    Statement type = Dynamic SQL Statement
    Statement = Execute Immediate
    Statement start timestamp = 02/06/2018 15:08:04.869882
    Completed Work = 232025565071 bytes
    Total Work = 232560263515 bytes
    Rows inserted = 1348196117
    Rows updated = 0
    Rows selected = 0
    Rows read = 1348196698
    Rows written = 1348196117
    UOW log space used (Bytes) = 740849762132
    Rows read = 1348196698
    Rows written = 1348196117
    Internal rows deleted = 0
    Internal rows updated = 0
    Internal rows inserted = 0
    Rows fetched = 0
    Dynamic SQL statement text:
    INSERT INTO s.something SELECT * FROM s.sourcetab

    --
    Regards,
    Troels Arvin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From vvm13ru@gmail.com@21:1/5 to All on Thu Feb 8 06:12:06 2018
    I read that this is normal behavior, not only for DB2, because DBMSes are optimized for commits.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeremy Rickard@21:1/5 to All on Thu Feb 8 21:47:19 2018
    In answer to the final question, the total log space used figure includes space reserved for log compensation records generated by undo of uncommitted transactions. So even if you hit log full there should always be enough reserved space to complete
    rollbacks, if that's what you were asking.

    Everything else will go to pot of course ;-)

    Jeremy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Troels Arvin@21:1/5 to All on Sun Feb 11 11:11:46 2018
    Hello,

    Thanks, Jeremy and vvm13ru@gmail.com.

    The long-running rollback finished after four days, and then the database session closed. That's quite remarkable: 2 hours of work, followed by
    four days of regret.

    After that, the transaction log file system kept stayed highly filled for
    a while, but then after seven hours, the transaction log file system went
    back to its normal state within a few minutes. It's unclear to me why the secondary log files sticked around for seven hours and then suddenly all disappeared.

    --
    Regards,
    Troels Arvin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Troels Arvin@21:1/5 to Jeremy Rickard on Wed Feb 14 22:09:21 2018
    Hello,

    Jeremy Rickard wrote:
    You might review settings like NUM_LOG_SPAN to reduce the chance of such problems repeating.

    Yes, thanks. We have lowered NUM_LOG_SPAN significantly.


    As for the old log files sticking around for a further 7 hours, was
    there any other long running transaction holding a log?

    Sorry, I didn't have a chance to investigate that during the 7 hours.


    Does this database have an HADR standby?

    Nope.

    After the rollback finished, there was something very strange about the
    table being INSERTed into. The table was expected to be empty (because
    the inserting transaction rolled back), confirmed by a statement like
    this which returned no rows:
    SELECT * FROM sname.tabname FETCH FIRST 1 ROWS ONLY WITH UR

    But such a select would run for 11 minutes, before returning the empty
    result set!

    Meanwhile, the following resulted in 1272188937:
    SELECT card
    FROM syscat.tables
    WHERE tabschema='SNAME'
    and tabname='TABNAME'

    IBM Support instructed us in running
    TRUNCATE TABLE sname.tabname IMMEDIATE
    followed by a reorg + runstats of the table. After that, SELECTs in the
    empty table started to return in a split second, as expected.

    Our DB2 version is the very latest, so I suspect we ran into a bug.


    --
    Regards,
    Troels Arvin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeremy Rickard@21:1/5 to Troels Arvin on Tue Feb 13 16:23:14 2018
    On Monday, 12 February 2018 00:11:49 UTC+13, Troels Arvin wrote:
    Hello,

    Thanks, Jeremy and vvm13ru@gmail.com.

    The long-running rollback finished after four days, and then the database session closed. That's quite remarkable: 2 hours of work, followed by
    four days of regret.

    After that, the transaction log file system kept stayed highly filled for
    a while, but then after seven hours, the transaction log file system went back to its normal state within a few minutes. It's unclear to me why the secondary log files sticked around for seven hours and then suddenly all disappeared.

    --
    Regards,
    Troels Arvin

    Hi Troels,

    Glad it finally resolved!

    I'm wondering whether this is that surprising. INSERT SELECT is extremely intensive, even more so if parallelised (but I don't know whether that applied here). In this case the application inserted over 1.3 billion rows and consumed about 750GB of log
    space in the 2 hours before the rollback was triggered.

    You might review settings like NUM_LOG_SPAN to reduce the chance of such problems repeating.

    As for the old log files sticking around for a further 7 hours, was there any other long running transaction holding a log?

    Does this database have an HADR standby?

    Jeremy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeremy Rickard@21:1/5 to All on Thu Feb 15 18:57:39 2018
    The big insert, even though it rolled back will have left many (probably empty) pages for that table. Your SELECT will have performed a table scan of all those empty pages, so I believe that's the reason for the 11 minutes.

    The fact that automatic runstats hadn't updated your table statistics is interesting. Maybe you were ahead of it, or maybe it is never triggered by rollback deletions. If the latter, that could be seen as a bug, or simply a design limitation?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)