On Wednesday, 23 June 2021 at 09:52:25 UTC+10, Derek Ignatius Asirvadem wrote:
On Tuesday, 22 June 2021 at 20:35:05 UTC+10, Nicola wrote:
On 2021-06-19, Derek Ignatius Asirvadem wrote:
______
Client
There is ordinary interplay between the Client and the Server. The client may now be a GUI or a webpage or middleware in an app server.
The VALIDATE block [1] is repeated in the client.
I assume that this means validating data without interacting with the database, e.g., to check that the input data is within valid ranges,
etc.
No.
Any GUI or input system has to do those checks on input fields; that is fundamental, and not usually mentioned.
Let’s say there are 20 WIDGETS currently in the inventory:
__ Why would you allow a user to purchase 6 WIJJETS [that do not exist] ???
__ Why would you allow a user to purchase 100 WIDGETS [that are not in stock] ???
On Tuesday, 22 June 2021 at 20:35:05 UTC+10, Nicola wrote:
On 2021-06-19, Derek Ignatius Asirvadem wrote:
______
Client
There is ordinary interplay between the Client and the Server. The
client may now be a GUI or a webpage or middleware in an app server.
The VALIDATE block [1] is repeated in the client.
I assume that this means validating data without interacting with the database, e.g., to check that the input data is within valid ranges, etc.
I mean /the VALIDATE block [1] is repeated in the client/. All of this (modified slightly for the GUI context):
The VALIDATE block [1] is repeated in the client.
__________________
We know we are locking rows, implicitly, by virtue of the SQL commands we issue, but it is the server that does the locking and the resolution.
how do you ensure that
nothing happens between the instant VALIDATE completes and the instant
BEGIN TRAN is executed?
That duration between instants is:
a. not relevant [assuming you understand the above], and
b. microseconds, because it is between two consecutive steps inside
a contiguous stored proc, which is running as object + plan code
(compiled and a full resource plan established), and
That guarantees that interference is unlikely, but not impossible.
Otherwise, you would not need this:
* BEGIN TRAN
• TransactionLog activity commenced for this xact
• Read all rows that must exist, in order for the intended rows (INSERT/UPDATE/DELETE) to succeed.
__• ROLLBACK & RETURN on any failure.
c. harmless because no resources are held.
That is an interesting approach. It is not obvious, IMO, that validating the data the way you suggest is advantageous (I am not saying that it is not, mind you!).
Whether it is or not, it seems to depend on a number of
factors, such as number of concurrent transactions, duration of transactions, number of touched resources, etc.
Validation introduces an overhead, but it prevents transactions to start
and acquire locks unless there is a very good chance to acquire them
all, so overall it may be a win.
To summarize, you could implement transactions in at least three ways:
(1) just BEGIN TRAN; INSERT/UPDATE/DELETE; END TRAN;
(2) BEGIN TRAN; SELECT FOR UPDATE; INSERT/UPDATE/DELETE; END TRAN;
(3) VALIDATE, then (2).
And in your experience (3) is the best approach.
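For concreteness, a minimal sketch of (3), written as a stored proc in T-SQL-style syntax. The proc, table and column names (OrderItem_Add, Part, QtyOnHand) and the HOLDLOCK hint are illustrative assumptions, not taken from this thread, and error handling is reduced to RETURN codes.

    -- Sketch of pattern (3): VALIDATE outside the Transaction, then re-verify
    -- under locks inside the Transaction. All names are illustrative only.
    CREATE PROC OrderItem_Add
        @PartCode CHAR(12),
        @Qty      INT
    AS
        -- VALIDATE block: plain reads; no locks are held once each statement completes
        IF NOT EXISTS ( SELECT 1 FROM Part WHERE PartCode = @PartCode )
            RETURN 1    -- part does not exist
        IF ( SELECT QtyOnHand FROM Part WHERE PartCode = @PartCode ) < @Qty
            RETURN 2    -- insufficient stock

        BEGIN TRAN
            -- Re-read the rows that must exist, now under the Transaction,
            -- so the locks are held until COMMIT/ROLLBACK
            IF NOT EXISTS (
                SELECT 1 FROM Part HOLDLOCK
                WHERE  PartCode  = @PartCode
                AND    QtyOnHand >= @Qty
                )
            BEGIN
                ROLLBACK TRAN
                RETURN 3    -- state changed between VALIDATE and BEGIN TRAN
            END

            UPDATE Part
            SET    QtyOnHand = QtyOnHand - @Qty
            WHERE  PartCode  = @PartCode

            INSERT OrderItem ( PartCode, Qty )
                VALUES ( @PartCode, @Qty )
        COMMIT TRAN
        RETURN 0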
Btw, neither ACID Transaction nor Transaction definition was invented by Jim Gray. No thanks, we had all that long before he was heard of. What he did do was standardise the Transaction Benchmark process, such that the benchmarks performed by the
OLAP Reports
Why do you worry about locks in OLAP reports ?
Who cares.
Read locks are extremely short, and transient (held for the duration
of the read operation [Statement], not the Transaction). We can’t control Read
locks, even implicitly
Are you saying that OLAP queries run in a lower isolation level (say,
READ COMMITTED)?
But still, that is not Optimistic Locking. And in this post, there is nothing about Optimistic Locking.
Agreed.
OTOH, if you are concerned about data in the database changing between
the point in the VALIDATE block that checks a particular row, and the
point the EXECUTE block that updates the relevant row, then yes, that
is not catered for in the above description. However, that [and one
other condition] is covered by Optimistic Locking.
OLAP Reports
Why do you worry about locks in OLAP reports ?
Who cares.
Read locks are extremely short, and transient (held for the duration
of the read operation [Statement], not the Transaction). We can’t control Read
locks, even implicitly
Are you saying that OLAP queries run in a lower isolation level (say,
READ COMMITTED)?
Of course. That is what that ISOLATION LEVEL [READ COMMITTED] is there for.
Not just OLAP queries, but all SELECTs outside a Transaction. That [ISOLATION LEVEL] is an ANSI SQL requirement, within the ACID
requirement. We exclude [wait for completion of] uncommitted changes,
but we do not need SERIALISED, in fact we want massively parallel, and
we want the whole result set to be integral within itself (all
internal dependencies intact and true).
For understanding. Many systems in the MS world do their OLAP queries
with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
result set has no integrity, and (b) contains uncommitted changes
which may disappear.
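To make the distinction concrete, a hedged example (SQL Server / Sybase-style syntax; the table and column names are illustrative assumptions):

    -- Report connection at READ COMMITTED: Read Locks are taken and released
    -- per statement, uncommitted changes are excluded, and the result set is
    -- consistent within the statement.
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    SELECT Region, SUM(Amount)
        FROM  CustomerTransaction      -- illustrative table
        GROUP BY Region

    -- The dirty-read variant used to avoid Read Locks: no Read Locks are taken,
    -- but the result may include uncommitted rows that later disappear.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
    SELECT Region, SUM(Amount)
        FROM  CustomerTransaction
        GROUP BY Region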
For the MVCC world, they have no concept of READ
COMMITTED/UNCOMMITTED, because they have only the false privatised
version of the entire database, firmly secured between their big toes,
which has only nominally “committed” changes, that has no relation to
the version of other users. That nominally “committed” data blows up
at COMMIT TRAN, just check the PooGres manuals re all the problems
that are not supposed to happen in the Stonebraker fantasy, that
happen, even to people who dutifully sing the mantra.
But still, that is not Optimistic Locking. And in this post, there is
nothing about Optimistic Locking.
Agreed.
Add Optimistic Locking, and the methods given in this thread would be complete.
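As a hedged illustration of the mechanism (this is not Derek's actual template; the UpdatedDtm column and the proc name are assumptions): the client keeps the row's timestamp from its earlier read, and the Transaction re-checks it before updating.

    -- Optimistic Locking sketch: the client read the row earlier (outside any
    -- Transaction) and kept its UpdatedDtm. The Transaction verifies the row is
    -- unchanged before updating it. Names are illustrative only.
    CREATE PROC Part_Update
        @PartCode    CHAR(12),
        @QtyOnHand   INT,
        @UpdatedDtm  DATETIME      -- value as read by the client during VALIDATE
    AS
        BEGIN TRAN
            UPDATE Part
            SET    QtyOnHand  = @QtyOnHand,
                   UpdatedDtm = GETDATE()
            WHERE  PartCode   = @PartCode
            AND    UpdatedDtm = @UpdatedDtm   -- fails if another user changed the row

            IF @@rowcount != 1
            BEGIN
                ROLLBACK TRAN
                RETURN 1    -- row changed (or deleted) since the client read it; re-read and retry
            END
        COMMIT TRAN
        RETURN 0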
On Thursday, 24 June 2021 at 07:59:50 UTC+10, Nicola wrote:
On 2021-06-22, Derek Ignatius Asirvadem wrote:
Sorry. Before I respond to the content, I need one clarification.
For understanding. Many systems in the MS world do their OLAP queries
with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
result set has no integrity, and (b) contains uncommitted changes
which may disappear.
Correct. But queries at READ COMMITTED may view (committed) changes that
were not there when the transaction started.
What “transaction started” ??? No transaction is involved.
The report connection would normally do SELECT [at] READ_COMMITTED
which holds ReadLocks for the duration of the Statement,
Separately. What “rows that were not there [when the transaction started]” ??? How can a transaction change a row that is not there ???
Or even have knowledge about a row that is not there ???
On 2021-06-24, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
Nicola
On Thursday, 24 June 2021 at 07:59:50 UTC+10, Nicola wrote:
On 2021-06-22, Derek Ignatius Asirvadem wrote:
Sorry. Before I respond to the content, I need one clarification.
For understanding. Many systems in the MS world do their OLAP queries with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
result set has no integrity, and (b) contains uncommitted changes
which may disappear.
Correct. But queries at READ COMMITTED may view (committed) changes that were not there when the transaction started.
What “transaction started” ??? No transaction is involved.
Ok, now that is clear.
The report connection would normally do SELECT [at] READ_COMMITTED
which holds ReadLocks for the duration of the Statement,
Ok. So, I am back to a question I have already posed to you:
- OLAP queries run at READ COMMITTED;
- Each OLAP query is a single SELECT statement;
- Each SELECT holds read locks for the duration of the statement.
But the duration of the statement may be significant (minutes, hours,
...) Don't such queries create contention (other transactions cannot
update the data read by the OLAP query until the query is over)?
Separately. What “rows that were not there [when the transaction started]” ??? How can a transaction change a row that is not there ???
Of course it can't.
Or even have knowledge about a row that is not there ???
Of course it can't. But it can see something that was not there before,
and it appears at a later time. I explain what I mean with an example:
Time Transaction Operation
0 T1,T2 start transaction;
1 T1 select sum(X) from R;
2 T2 insert into R(X) values (100);
3 T2 commit;
4 T1 select sum(X) from R;
At READ COMMITTED, T1 at time 4 will see the record inserted by T2 at
time 2, even if the record did not exist when T1 began. The SELECT at
time 1 and time 4 would then return different results.
Of course, this is not an issue if the only queries you run at READ COMMITTED are single statements—provided that the system guarantees statement-level consistency (it seems that some don't: https://stackoverflow.com/questions/4772475/read-committed-isolation-level-in-sql-server-for-a-single-statement).
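A minimal two-session script to observe this, assuming only a trivial table R with a single integer column X (T-SQL-style syntax; run the sessions in separate connections):

    -- Setup (once):
    CREATE TABLE R ( X INT )

    -- Session T1:
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRAN
    SELECT SUM(X) FROM R          -- time 1

    -- Session T2 (while T1 is still open):
    BEGIN TRAN
    INSERT INTO R ( X ) VALUES ( 100 )
    COMMIT                        -- time 3

    -- Session T1 again:
    SELECT SUM(X) FROM R          -- time 4: includes T2's committed row,
                                  -- so it differs from the result at time 1
    COMMIT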
Nicola
On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
On 2021-06-24, Derek Ignatius Asirvadem wrote:
The report connection would normally do SELECT [at] READ_COMMITTED
which holds ReadLocks for the duration of the Statement,
Ok. So, I am back to a question I have already posed to you:
- OLAP queries run at READ COMMITTED;
- Each OLAP query is a single SELECT statement;
- Each SELECT holds read locks for the duration of the statement.
But the duration of the statement may be significant
On one occasion I got thrown out of the building by the big boss because his report that ran in 15 mins on Friday, after I re-built the DataStructures on the server, ran in under 1 second on Monday. He was sure that my project failed, that the report
(minutes, hours, ...)
Don't such queries create contention (other transactions cannot
update the data read by the OLAP query until the query is over)?
At read committed level shared locks are released as soon as the data is read (not at the end of the transaction or even the end of the statement)
No, at read committed level shared locks are released at the end of the statement.
On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
On 2021-06-24, Derek Ignatius Asirvadem wrote:
The report connection would normally do SELECT [at] READ_COMMITTED
which holds ReadLocks for the duration of the Statement,
Ok. So, I am back to a question I have already posed to you:
- OLAP queries run at READ COMMITTED;
- Each OLAP query is a single SELECT statement;
1. Well, I guarantee that in my databases, because it is 100% Predicates. Even high-end massive report tools do that, noting the intermediate “universe” that maps an RFS into something readable. And then it executes a single SELECT.
2. Not only OLAP, any SELECT runs at READ COMMITTED.
- Each SELECT holds read locks for the duration of the statement.
Yes.
SQL/ACID requirement,
that MV-non-CC suites cannot support.
But the duration of the statement may be significant
Why ???
I don’t have a single report in my entire SQL life that exceeds 12
seconds, and some of them are “complex” (20+ tables; subqueries;
derived tables) and massive (tables have billions of rows). I kinda
pride myself on sub-second response times, even for reports.
(minutes, hours, ...)
Not in the commercial SQL world.
Sure, in idiotic suites of programs, written by 10,000 undergrads
spread across the planet, all singing the Stonebraker mantra to
maintain their collective fantasy, hours and days are “normal”. That
is the price of NOT having an architecture, of writing filth and
pretending that it is a server. 1,000 or 2,000 TIMES slower than
a commercial SQL Platform is “normal”.
So no, the issue simply does not apply to us. But I accept that is “normal” for the academics and freeware users. Just wait until you
get to the benchmark that you were planning.
Take a look at this Benchmark I did against Oracle (MV-non-CC). Where
Sybase returned in 2.16 secs, Oracle had to abandon the benchmark
after 120 MINUTES.
__ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf
Don't such queries create contention (other transactions cannot
update the data read by the OLAP query until the query is over)?
Correct.
So what (see above).
No one cares about waiting a few seconds.
1. Did you not read the /Sybase Lock Manager/ link, page 2 table at bottom left ?
2. I have stated that if you want to get into this, there is a Server Monitoring report that just happens to expose this problem, and the considerations, and with more granularity than “contention” ... but
you have not taken it up.
__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf
The Read Locks are precisely this, SELECT queries (not OLAP, but that
makes no difference) contending with OLTP transactions. Do you not
care, that after fixing a SAN Config problem, the Read Locks dropped
by 91%, or what that means at the coal face. Or stated in converse,
why Read Locks were 1,100% higher when the SAN was mis-configured.
Hi Nicola,
Importantly, and often overlooked as far as MVCC disadvantages go, the
data you read in an hours-long query running under MVCC is arbitrarily
old data. If your query takes 2 hours to complete then the very last
row processed by your query is potentially 2 hours out of date.
On Saturday, 26 June 2021 at 06:25:28 UTC+10, Nicola wrote:
On 2021-06-24, Daniel Loth wrote:
Hi Daniel,
great to see someone else diving in! I hope you will last :)
But if Derek can keep all his queries under 12 seconds with an old (as
in "mature") 2PC system
2PC
No.
On Saturday, 26 June 2021 at 06:17:37 UTC+10, Nicola wrote:
On 2021-06-24, Derek Ignatius Asirvadem wrote:
On Thursday, 24 June 2021 at 19:33:00 UTC+10, Nicola wrote:
On 2021-06-24, Derek Ignatius Asirvadem wrote:
2. Not only OLAP, any SELECT runs at READ COMMITTED.
Yes. I focused on OLAP to emphasize that I have "complex" queries in
mind.
- Each SELECT holds read locks for the duration of the statement.
Yes.
SQL/ACID requirement,
Fine.
that MV-non-CC suites cannot support.
Trivially, as MVCC does not use locks.
Take a look at this Benchmark I did against Oracle (MV-non-CC). Where
Sybase returned in 2.16 secs, Oracle had to abandon the benchmark
after 120 MINUTES.
__ https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Benchmark%20101204.pdf
I remember finding that benchmark some time ago and trying to
reproduce it in PostgreSQL, based on the DDL I found here:
https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/DDL%20101201.txt
and the query found here:
https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Subquery%20SUM%20ShowPlan.txt
My only noteworthy remarks at the time were:
1. As bad as Oracle might be, it can't be so slow. It sounds like the benchmark was performed by Sybase people and/or no effort was made to optimize on the Oracle side.
Anyway, I don't care about Oracle.
2. 488MB for the 2M records of CustomerTransaction seems quite a lot.
I've got ~147MB, including indexes (well, one index), in PostgreSQL.
I have just repeated my experiment. The above query takes ~2.4s with
cold data and ~800ms with hot data on my laptop. Ok, that's flash
storage and a laptop one decade newer than the one in your document. But it's stock PostgreSQL with default parameters (which are extremely conservative), with no physical tuning applied.
So, all in all it's not
so bad for the price.
For the sake of completeness, and possibly for comparison and further discussion, at the end of this message I report my benchmark.
Don't such queries create contention (other transactions cannot
update the data read by the OLAP query until the query is over)?
Correct.
So what (see above).
No one cares about waiting a few seconds.
There are many applications where waiting more than a few hundred milliseconds might be problematic.
For example, waiting ten seconds
makes a web app unacceptably slow.
2. I have stated that if you want to get into this, there is a Server Monitoring report that just happens to expose this problem, and the considerations, and with more granularity than “contention” ... but you have not taken it up.
__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf
The Read Locks are precisely this, SELECT queries (not OLAP, but that makes no difference) contending with OLTP transactions. Do you not
care, that after fixing a SAN Config problem, the Read Locks dropped
by 91%, or what that means at the coal face. Or stated in converse,
why Read Locks were 1,100% higher when the SAN was mis-configured.
I have no doubt that, given your experience, you are able to achieve
great improvements on every system you touch.
My questions are for
better understanding on my part, not to tell you that you should run
things differently.
Meaning: Integrity within a ResultSet
But if Derek can keep all his queries under 12 seconds with an old (as
in "mature") 2PC system
If an update causes a page split in the clustered index, resulting in rows being moved to new pages, then this will necessitate updating all non-clustered indexes (because non-clustered indexes contain the clustered index key).
Trivially, as MVCC does not use locks.
Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!
I have just repeated my experiment. The above query takes ~2.4s with
cold data and ~800ms with hot data on my laptop. Ok, that's flash
storage and a laptop one decade newer than the one in your document. But
it's stock PostgreSQL with default parameters (which are extremely
conservative), with no physical tuning applied.
Excellent. The ‘hot’ figure is the one we want (test is logical not physical I/O).
Now if we bring it back to the context of locking vs MV-non-CC, run:
__ 10 x OLTP Transactions doing UPDATE CustomerTransaction randomly, and
__ 5 of those SELECTs
__ concurrently.
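A hedged sketch of one of those OLTP sessions (the CustomerTransaction columns used here are assumptions, not the benchmark DDL; substitute the real columns):

    -- One of the 10 concurrent OLTP sessions: update a random row in a short
    -- Transaction, in a loop, while 5 other sessions repeatedly run the report
    -- SELECT. Column names (CustomerId, Amount) are illustrative only.
    DECLARE @Id INT
    SELECT  @Id = CAST( RAND() * 2000000 AS INT ) + 1   -- ~2M rows in CustomerTransaction

    BEGIN TRAN
        UPDATE CustomerTransaction
        SET    Amount     = Amount + 1
        WHERE  CustomerId = @Id
    COMMIT TRAN

The interesting numbers are throughput and lock waits under that concurrent mix, rather than single-user timings.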
My questions are for
better understanding on my part, not to tell you that you should run
things differently.
But that is not what I meant, what >>I<< did is not relevant. Given
your [then] concern re Read Locks, and such being held for some
duration, and such contending with OLTP Transactions, you may be
interested in what that actually looks like, in terms of metrics in
a production server.
To proceed with this thread then:
__ generally an understanding of OLTP, that is the LARGE SHARED DATA
BANK context, rather than mere speed in a single-user context,
__ hopefully resolve my declaration that PissGriss does not support ACID
__ (ie. any MV-non-CC system cannot support ACID)
____ and therefore cannot provide OLTP
__ the two items that remain from my side:
___1 there are two major occurrences in OLTP that have not been
mentioned yet, that must be covered before we can consider this
subject complete (MV-non-CC types are totally ignorant about this)
___2 Optimistic Locking. We are dancing around it, without confirming
its absolute need in OLTP.
Although you have no further questions re the Batch Transaction, that understanding is not complete; it can be completed only when the above outstanding elements are completed.
Hi Derek, Nicola,
But if Derek can keep all his queries under 12 seconds with an old (as
in "mature") 2PC system
So having gone through much of the content that Derek has proffered
over time (much of it on StackOverflow), I believe this is mostly
owing to the meticulous normalisation that his databases implement.
Your ordinary database out in industry is often very sloppy.
For example, one of our databases at work has at least one table with
at least 70 columns, and those columns are sparsely populated (that
is, they are nullable and quite often null).
Suppose each of those 70 columns is, on average, 8 bytes. Each
row is therefore 560 bytes wide.
If we keep it simple and assume that the entire 8KiB of a data page is available for row data then we can fit 14 rows of 560 bytes on each
data page. So that's a low data density to begin with.
Next, consider how many ways in which you might wish to query those 70 columns. There are many permutations, and supporting even a modest
number of those permutations would involve a menagerie of indexes.
On each insert, all of these indexes must be updated.
In the specific case of Postgres, I understand they had a write
amplification issue a number of years ago. This meant that all updates involved updating all indexes, even where those indexes didn't
strictly require an update.
Uber famously blogged about it as their reason for abandoning
Postgres. You can read more about it here: https://eng.uber.com/postgres-to-mysql-migration/
For what it's worth, I accept that this is an implementation detail of Postgres and not something inherent in MVCC. Had they designed their
MVCC in another way, they might have avoided the problem. I couldn't
say with certainty though just how much work they'd have to do to fix
it. Who knows, maybe they have fixed it - Uber's blog post was
published in 2016.
Now let's consider what that index count might look like. In practice,
it's not that unusual to see an un-normalised SQL Server database
table with 10 or so indexes.
If you defer to consultants such as Brent Ozar (of 'Microsoft
Certified Master of SQL Server' fame), they'll tell you that you
should aim for 5 indexes.
Of course, in our table with 70 columns, it's extraordinarily unlikely
that we'll be able to satisfy all of the desirable data access paths
with indexes. So a trade-off is made.
We can do one of two things:
So given the above, I'll contrast it with Derek's typical
implementation and the characteristics as I see them.
Firstly, Derek's implementations are highly normalised. The tables in
his models often comprise a primary key (natural key) and several
attributes. Tables with fewer columns require far fewer indexes to
satisfy the queries that you would perform on them.
In fact, he often mentions that his databases contain no null values.
This is due to him modelling optional attributes in their own tables.
That is, a table that he migrates the key of the parent table into
plus that one additional optional attribute. This table, comprising
a key and one attribute, is in 6NF.
Suppose you have a 'Person' table with a nullable 'MiddleName' column.
And suppose in practice that this table contains a million rows, but
only 10% of Person records contain a non-null value for MiddleName.
When modelled as a single table with nullable column, there are
900,000 records with a null value.
Derek would likely model the relationship like this:
Person -> { PersonKey, Attribute1, ..., AttributeN }
PersonMiddleName -> { PersonKey, MiddleName }
Person would contain 1 million rows, none of which carries a 30 to 50
character MiddleName column. In SQL Server, each nullable column has
2 bytes of bookkeeping information when the column is null. These must
be processed for every select query executed on the table.
PersonMiddleName would contain 100,000 records. You'd only ever query
these records when you specifically require a Person's MiddleName, so
they don't impose a cost when querying the Person table.
The PersonMiddleName table would also only require a single index to
support those queries, that index being the clustered index.
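A sketch of that split in DDL (data types are assumptions, and PersonKey stands in for whatever the natural key is):

    -- Person: no nullable columns; every row is fully populated.
    CREATE TABLE Person (
        PersonKey  INT         NOT NULL,
        FirstName  VARCHAR(50) NOT NULL,
        LastName   VARCHAR(50) NOT NULL,
        CONSTRAINT PK_Person PRIMARY KEY CLUSTERED ( PersonKey )
        )

    -- PersonMiddleName: a row exists only for the ~10% of people who have
    -- a middle name. The clustered primary-key index is the only index needed.
    CREATE TABLE PersonMiddleName (
        PersonKey   INT         NOT NULL,
        MiddleName  VARCHAR(50) NOT NULL,
        CONSTRAINT PK_PersonMiddleName PRIMARY KEY CLUSTERED ( PersonKey ),
        CONSTRAINT FK_PersonMiddleName_Person
            FOREIGN KEY ( PersonKey ) REFERENCES Person ( PersonKey )
        )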
If we scale this idea up to hundreds or even thousands of nullable
columns across a database then they impose a significant cost.
Secondly, and perhaps most importantly, MVCC produces a great deal of
litter. Old versions of a row are retained in pages. In my example
above there are 14 rows per page. This is already very low density
because each row requires so much space.
However those 14 rows might additionally include former versions of
a logically equivalent row (that is, former versions of some row for
a given key).
So if there are 10 versions of a particular row on that data page, and
that data page can hold 14 rows, then the page actually contains at
most 5 distinct rows (1 row versioned 10 times, and at most 4 other
distinct rows).
This means that each page of data processed might provide only minimal utility for that processing cost.
So an MVCC-based system is very likely making less progress with each
page that it reads from disk.
An MVCC-based system is also very likely deriving less utility per GiB
of RAM.
Thirdly, there is the meticulous transaction control. All transactions live
within the database, implemented as procedures.
I can think of at least two benefits:
1. A database administrator has the complete picture on-hand. They can readily optimise the database as required because they have
a definitive view of all queries and can determine the indexes
required to best satisfy them.
2. Misbehaving client code can't unwittingly start a transaction,
acquire a number of exclusive locks, and then retain those locks for
a prolonged period of time (thus blocking other users of the system).
And well-behaved client code written with the best intentions can't diligently start a transaction, acquire a number of exclusive locks,
and then unwittingly fall prey to CPU scheduler preemption for an
arbitrarily long period of time.
On this matter, I posted a new discussion last night (my time) titled
'Stored procedure structure in RDBMS using Lock Manager for
transaction isolation'.
Fourthly, the databases are modelled and their performance measured
with a whole-of-system mindset.
Many software developers jump immediately to 'de-normalisation' (I've
used the phrase 'un-normalised' above because most of them don't
actually start from a normalised model) because on a micro-level basis
they can write a query that is demonstrably faster for one query.
Of course, that's just one query. What developers often overlook is
the hundreds or thousands of other queries that must co-exist with
this one query.
In my view, this is why MVCC is seen to be working reasonably well by
many in the industry.
Can you elaborate on that? Do you mean that MVCC works well only when
your goal is to micro-optimize, but it is not suitable to obtain
system-wide good performance?