Forum: >>> Magnum BBS <<<

Long-running rollback

From Troels Arvin@21:1/5 to All on Thu Feb 8 07:31:37 2018

I have a database session which has been in "Rollback to Savepoint" state
for a surprisingly long time. As far as I can see, the statement which is
being rolled back ran for 2� hours before something happened and it
started to roll back. Since then, it has been rolling back for more than
39 hours. Isn't it odd that rolling back takes more than 15 times longer
than the statement which is being rolled back?

The database has LOGPRIMARY=10 and LOGSECOND=234.
I now have 207 .LOG files, amounting to 790 GB. What can I expect happens
if the rollback continues and I reach 244 log files?

DB2 version: 11.1.2.2 on x86_64 Linux.

Excerpts from "get snapshot ..." for the session:

Application status = Rollback to Savepoint
Start Time = 02/06/2018 17:33:55.363685
Completed Work = 232025565071 bytes
Total Work = 232560263515 bytes
Status change time = 02/06/2018 17:33:55.360478
Connection request start timestamp = 02/06/2018 15:08:04.867602
Connect request completion timestamp = 02/06/2018 15:08:04.868181
UOW log space used (Bytes) = 740849762132
Previous UOW completion timestamp = 02/06/2018 15:08:04.868181
Elapsed t of last completed uow sec.ms = 0.000000
UOW start timestamp = 02/06/2018 15:08:04.869882
Statement type = Dynamic SQL Statement
Statement = Execute Immediate
Statement start timestamp = 02/06/2018 15:08:04.869882
Completed Work = 232025565071 bytes
Total Work = 232560263515 bytes
Rows inserted = 1348196117
Rows updated = 0
Rows selected = 0
Rows read = 1348196698
Rows written = 1348196117
UOW log space used (Bytes) = 740849762132
Rows read = 1348196698
Rows written = 1348196117
Internal rows deleted = 0
Internal rows updated = 0
Internal rows inserted = 0
Rows fetched = 0
Dynamic SQL statement text:
INSERT INTO s.something SELECT * FROM s.sourcetab

--
Regards,
Troels Arvin

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From vvm13ru@gmail.com@21:1/5 to All on Thu Feb 8 06:12:06 2018

I read that this is normal behavior, not only for DB2, because DBMSes are optimized for commits.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Jeremy Rickard@21:1/5 to All on Thu Feb 8 21:47:19 2018

In answer to the final question, the total log space used figure includes space reserved for log compensation records generated by undo of uncommitted transactions. So even if you hit log full there should always be enough reserved space to complete
rollbacks, if that's what you were asking.

Everything else will go to pot of course ;-)

Jeremy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Troels Arvin@21:1/5 to All on Sun Feb 11 11:11:46 2018

Hello,

Thanks, Jeremy and vvm13ru@gmail.com.

The long-running rollback finished after four days, and then the database session closed. That's quite remarkable: 2 hours of work, followed by
four days of regret.

After that, the transaction log file system kept stayed highly filled for
a while, but then after seven hours, the transaction log file system went
back to its normal state within a few minutes. It's unclear to me why the secondary log files sticked around for seven hours and then suddenly all disappeared.

--
Regards,
Troels Arvin

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Troels Arvin@21:1/5 to Jeremy Rickard on Wed Feb 14 22:09:21 2018

Hello,

Jeremy Rickard wrote:

You might review settings like NUM_LOG_SPAN to reduce the chance of such problems repeating.

Yes, thanks. We have lowered NUM_LOG_SPAN significantly.

As for the old log files sticking around for a further 7 hours, was
there any other long running transaction holding a log?

Sorry, I didn't have a chance to investigate that during the 7 hours.

Does this database have an HADR standby?

Nope.

After the rollback finished, there was something very strange about the
table being INSERTed into. The table was expected to be empty (because
the inserting transaction rolled back), confirmed by a statement like
this which returned no rows:
SELECT * FROM sname.tabname FETCH FIRST 1 ROWS ONLY WITH UR

But such a select would run for 11 minutes, before returning the empty
result set!

Meanwhile, the following resulted in 1272188937:
SELECT card
FROM syscat.tables
WHERE tabschema='SNAME'
and tabname='TABNAME'

IBM Support instructed us in running
TRUNCATE TABLE sname.tabname IMMEDIATE
followed by a reorg + runstats of the table. After that, SELECTs in the
empty table started to return in a split second, as expected.

Our DB2 version is the very latest, so I suspect we ran into a bug.

--
Regards,
Troels Arvin

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Jeremy Rickard@21:1/5 to Troels Arvin on Tue Feb 13 16:23:14 2018

On Monday, 12 February 2018 00:11:49 UTC+13, Troels Arvin wrote:

Hello,

Thanks, Jeremy and vvm13ru@gmail.com.

The long-running rollback finished after four days, and then the database session closed. That's quite remarkable: 2 hours of work, followed by
four days of regret.

After that, the transaction log file system kept stayed highly filled for
a while, but then after seven hours, the transaction log file system went back to its normal state within a few minutes. It's unclear to me why the secondary log files sticked around for seven hours and then suddenly all disappeared.

--
Regards,
Troels Arvin

Hi Troels,

Glad it finally resolved!

I'm wondering whether this is that surprising. INSERT SELECT is extremely intensive, even more so if parallelised (but I don't know whether that applied here). In this case the application inserted over 1.3 billion rows and consumed about 750GB of log
space in the 2 hours before the rollback was triggered.

You might review settings like NUM_LOG_SPAN to reduce the chance of such problems repeating.

As for the old log files sticking around for a further 7 hours, was there any other long running transaction holding a log?

Does this database have an HADR standby?

Jeremy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Jeremy Rickard@21:1/5 to All on Thu Feb 15 18:57:39 2018

The big insert, even though it rolled back will have left many (probably empty) pages for that table. Your SELECT will have performed a table scan of all those empty pages, so I believe that's the reason for the 11 minutes.

The fact that automatic runstats hadn't updated your table statistics is interesting. Maybe you were ahead of it, or maybe it is never triggered by rollback deletions. If the latter, that could be seen as a bug, or simply a design limitation?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	293
Nodes:	16 (2 / 14)
Uptime:	236:11:43
Calls:	6,624
Files:	12,172
Messages:	5,319,828

Long-running rollback

Who's Online

System Info