• UPSERT with non-unique index

    From Stanimir Stamenkov@21:1/5 to All on Sun Aug 2 19:55:42 2020
    [I've posted this on Stack Overflow: https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so I'm performing:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
     WHERE NOT EXISTS (SELECT 1
                         FROM table
                        WHERE col3 = 'value3'
                          AND col4 = 'value4'
                          AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

    UPDATE table
       SET col1 = col1 + 1,
           col2 = GREATEST(col2, 2)
     WHERE col3 = 'value3'
       AND col4 = 'value4'
       AND col5 = 'value5'

    There's an index:

    CREATE INDEX ON table (col3, col4, col5)

    It is non-unique because there is legacy data that doesn't allow me to
    declare it unique. Newer records, however, should not have duplicate
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions. I can see
    the second session blocking until the first one commits its transaction,
    but sometimes the second transaction is then still able to insert a new
    row (and sometimes it achieves the expected behavior of not inserting a
    duplicate (col3, col4, col5) row).

    I'm currently performing manual unique-check after the insert:

    SELECT COUNT(1)
      FROM table
     WHERE col3 = 'value3'
       AND col4 = 'value4'
       AND col5 = 'value5'

    but I've also tried:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
     WHERE NOT EXISTS (SELECT 1
                         FROM table
                        WHERE col3 = 'value3'
                          AND col4 = 'value4'
                          AND col5 = 'value5'
                          FOR UPDATE)

    which appears to work: with the examples that otherwise always give me a
    duplicate (col3, col4, col5) row, the duplicate no longer appears. Is
    the given FOR UPDATE usage reliable for the purpose of ensuring no
    duplicate (col3, col4, col5) row will be inserted?

    I'm using READ-COMMITTED transaction isolation.
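
    Spelled out as a full transaction, the pattern I'm experimenting with
    looks like this (a sketch, using the example values from above):

        START TRANSACTION;

        -- Lock the matching (col3, col4, col5) entries, if any, so a
        -- concurrent session doing the same check blocks here:
        SELECT 1
          FROM table
         WHERE col3 = 'value3'
           AND col4 = 'value4'
           AND col5 = 'value5'
           FOR UPDATE;

        -- application logic: if no row came back, do the INSERT;
        -- otherwise do the UPDATE

        COMMIT;

    One thing I'm not sure about: as I read the InnoDB locking docs, under
    READ-COMMITTED gap locking is disabled for searches, so when no matching
    row exists yet there may be nothing for FOR UPDATE to lock, and two
    sessions could still both proceed to insert. Under REPEATABLE-READ the
    same locking read should take a next-key/gap lock on the scanned range,
    which may be why the FOR UPDATE variant appears to work.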

    [1] https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

    --
    Stanimir

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From J.O. Aho@21:1/5 to Stanimir Stamenkov on Sun Aug 2 20:15:03 2020
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow: https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so I'm performing:

        INSERT INTO table (col1, col2, col3, col4, col5)
        SELECT 1, 2, 'value3', 'value4', 'value5'
         WHERE NOT EXISTS (SELECT 1
                             FROM table
                            WHERE col3 = 'value3'
                              AND col4 = 'value4'
                              AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

        UPDATE table
           SET col1 = col1 + 1,
               col2 = MAX(col2, 2)
         WHERE col3 = 'value3'
           AND col4 = 'value4'
           AND col5 = 'value5'

    There's an index:

        CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique.  Newer records, however, should not have duplicated (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions.  I can see
    the second session blocking until the first one commits its transaction,
    but then the second transaction is also able to insert a new row
    sometimes (or sometimes it achieves the expected of avoiding to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking has more to do with the way you are testing things
    than with what happens in reality.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html
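
    Put together, the whole sequence would be something like this (a sketch,
    not tested; note that under LOCK TABLES each extra reference to the
    locked table needs its own lock entry via an alias, which affects your
    NOT EXISTS subquery):

        LOCK TABLES table WRITE, table AS t READ;

        INSERT INTO table (col1, col2, col3, col4, col5)
        SELECT 1, 2, 'value3', 'value4', 'value5'
         WHERE NOT EXISTS (SELECT 1
                             FROM table AS t
                            WHERE t.col3 = 'value3'
                              AND t.col4 = 'value4'
                              AND t.col5 = 'value5');

        -- if ROW_COUNT() says 0 rows were inserted, run your UPDATE here

        UNLOCK TABLES;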

    --

    //Aho

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Axel Schwenke@21:1/5 to Stanimir Stamenkov on Sun Aug 2 22:39:04 2020
    On 02.08.2020 18:55, Stanimir Stamenkov wrote:

    I need to implement concurrent-safe UPSERT
    ...
    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me
    ...
    There's an index:

        CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to declare it unique.

    That is the point where I would solve it.

    1. add a new column `is_legacy_data`, e.g. TINYINT DEFAULT 0, that will be 0
    for all new rows

    2. for legacy rows where (col3, col4, col5) is not unique, set the new
    column to a value that makes (col3, col4, col5, `is_legacy_data`) unique

    3. add the unique index on (col3, col4, col5, `is_legacy_data`)

    4. be happy with INSERT ... ON DUPLICATE KEY ...
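
    Steps 1-3 sketched in SQL (schematic; I assume the table is `t` with an
    auto-increment `id` column, and I use INT instead of TINYINT because
    this particular backfill reuses `id` values):

        -- 1. discriminator column; DEFAULT 0 covers all new rows
        ALTER TABLE t
          ADD COLUMN is_legacy_data INT NOT NULL DEFAULT 0;

        -- 2. make legacy duplicates distinct; one crude choice is the
        --    existing id, unique by construction (the derived table works
        --    around MySQL's "can't update the table you select from" rule)
        UPDATE t
           SET is_legacy_data = id
         WHERE id NOT IN (SELECT min_id
                            FROM (SELECT MIN(id) AS min_id
                                    FROM t
                                   GROUP BY col3, col4, col5) AS keep);

        -- 3. the extended key can now be declared unique
        ALTER TABLE t
          ADD UNIQUE INDEX ux_col345_legacy (col3, col4, col5, is_legacy_data);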

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lew Pitcher@21:1/5 to The Natural Philosopher on Sun Aug 2 17:46:39 2020
    On August 2, 2020 17:28, The Natural Philosopher wrote:

    On 02/08/2020 19:15, J.O. Aho wrote:
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow:
    https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so
    I'm performing:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
    WHERE NOT EXISTS (SELECT 1
    FROM table
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

    UPDATE table
    SET col1 = col1 + 1,
    col2 = MAX(col2, 2)
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5'

    There's an index:

    CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique. Newer records, however, should not have duplicated
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions. I can
    see the second session blocking until the first one commits its
    transaction, but then the second transaction is also able to insert a
    new row sometimes (or sometimes it achieves the expected of avoiding
    to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking is more then way you are testing things than what in
    reality happens.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    I don't know how SQL manages it, but if you have two asynchronous
    instances of read-modify-write, where what is written depends on what is
    read, then you MUST make the read-modify-write cycle ATOMIC, that is,
    uninterruptible by another process doing the same.

    You need locking, with the attendant danger that if a process crashes
    with locks set...

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;
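
    Concretely, wrapping the statements already posted, that's something
    like (a sketch; the branch between INSERT and UPDATE still lives in the
    application):

        START TRANSACTION;

        INSERT INTO table (col1, col2, col3, col4, col5)
        SELECT 1, 2, 'value3', 'value4', 'value5'
         WHERE NOT EXISTS (SELECT 1
                             FROM table
                            WHERE col3 = 'value3'
                              AND col4 = 'value4'
                              AND col5 = 'value5');

        -- if ROW_COUNT() is 0, run the UPDATE here

        COMMIT;
        -- and ROLLBACK on any error, so no locks are left behind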


    --
    Lew Pitcher
    "In Skills, We Trust"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to J.O. Aho on Sun Aug 2 22:28:37 2020
    On 02/08/2020 19:15, J.O. Aho wrote:
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow:
    https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so
    I'm performing:

         INSERT INTO table (col1, col2, col3, col4, col5)
         SELECT 1, 2, 'value3', 'value4', 'value5'
          WHERE NOT EXISTS (SELECT 1
                              FROM table
                             WHERE col3 = 'value3'
                               AND col4 = 'value4'
                               AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

         UPDATE table
            SET col1 = col1 + 1,
                col2 = MAX(col2, 2)
          WHERE col3 = 'value3'
            AND col4 = 'value4'
            AND col5 = 'value5'

    There's an index:

         CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique.  Newer records, however, should not have duplicated
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions.  I can
    see the second session blocking until the first one commits its
    transaction, but then the second transaction is also able to insert a
    new row sometimes (or sometimes it achieves the expected of avoiding
    to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking is more then way you are testing things than what in reality happens.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/ https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    I don't know how SQL manages it, but if you have two asynchronous
    instances of read-modify-write, where what is written depends on what is
    read, then you MUST make the read-modify-write cycle ATOMIC, that is,
    uninterruptible by another process doing the same.

    You need locking, with the attendant danger that if a process crashes
    with locks set...

    --
    The biggest threat to humanity comes from socialism, which has utterly
    diverted our attention away from what really matters to our existential survival, to indulging in navel gazing and faux moral investigations
    into what the world ought to be, whilst we fail utterly to deal with
    what it actually is.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to Lew Pitcher on Mon Aug 3 10:05:05 2020
    On 02/08/2020 22:46, Lew Pitcher wrote:
    On August 2, 2020 17:28, The Natural Philosopher wrote:

    On 02/08/2020 19:15, J.O. Aho wrote:
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow:
    https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so
    I'm performing:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
    WHERE NOT EXISTS (SELECT 1
    FROM table
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

    UPDATE table
    SET col1 = col1 + 1,
    col2 = MAX(col2, 2)
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5'

    There's an index:

    CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique. Newer records, however, should not have duplicated
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions. I can
    see the second session blocking until the first one commits its
    transaction, but then the second transaction is also able to insert a
    new row sometimes (or sometimes it achieves the expected of avoiding
    to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking is more then way you are testing things than what in
    reality happens.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    I don't know how SQL manages it, but if you have two asynchronous
    instances of read-modify-write, where what is written depends on what is
    read, then you MUST make the read-modify-write cycle ATOMIC - that is
    uninterruptible, by another process doing the same.

    You need locking, and the attendant danger that if aprocess crashes with
    locks set...

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;


    I don't use SQL often enough to remember that it has transactions these
    days :-(

    Note that you appear to need to be using InnoDB or another transactional
    engine, not MyISAM, if you want to use transactions.


    --
    "In our post-modern world, climate science is not powerful because it is
    true: it is true because it is powerful."

    Lucas Bergkamp

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From J.O. Aho@21:1/5 to The Natural Philosopher on Mon Aug 3 11:38:14 2020
    On 03/08/2020 11.05, The Natural Philosopher wrote:
    On 02/08/2020 22:46, Lew Pitcher wrote:

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;


    I don't use SQL often enough to remember that it has transactions these
    days :-(

    Yeah, it was quite recently that they introduced XA transactions in MySQL: 2005.


    Note that you appear to need to be using innoDB or another advanced
    engine and not MyISAM if you want to use transactions

    Yes, you need a db engine like InnoDB or XtraDB, which both support XA transactions.
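
    To check which engine a table uses, and to convert it (the ALTER
    rebuilds the table, so it can take a while on big data):

        SELECT engine
          FROM information_schema.tables
         WHERE table_schema = DATABASE()
           AND table_name = 'table';

        ALTER TABLE table ENGINE = InnoDB;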

    --

    //Aho

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to J.O. Aho on Mon Aug 3 16:30:27 2020
    On 03/08/2020 10:38, J.O. Aho wrote:
    On 03/08/2020 11.05, The Natural Philosopher wrote:
    On 02/08/2020 22:46, Lew Pitcher wrote:

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;


    I don't use SQL often enough to remember that it has transactions
    these days :-(

    Yeah, it's quite recently they introduced xa transactions in MySQL, 2005.

    At my time of life that seems like yesterday... That was about when I
    was needing to use them too. I remember looking at them and deciding it
    wasn't necessary for that application. And it was all legacy ISAM files
    as well.

    Note that you appear to need to be using innoDB or another advanced
    engine and not MyISAM if you want to use transactions

    Yes, you need a db-engine like InnoDB or XtraDB which both support xa transactions.



    --
    Canada is all right really, though not for the whole weekend.

    "Saki"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stanimir Stamenkov@21:1/5 to All on Mon Aug 3 20:29:27 2020
    Sun, 2 Aug 2020 22:39:04 +0200, /Axel Schwenke/:
    On 02.08.2020 18:55, Stanimir Stamenkov wrote:

    I need to implement concurrent-safe UPSERT
    ...
    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me
    ...
    There's an index:

        CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to declare it unique.

    That is the point where I would solve it.

    1. add a new column `is_legacy_data`, i.e. TINYINT DEFAULT 0 that will be 0 for all new columns

    2. for legacy columns with (col3, col4, col5) being not unique, set the new column to a value that makes (col3, col4, col5, `is_legacy_data`) unique

    3. add the unique index on (col3, col4, col5, `is_legacy_data`)

    The current setup is similar: col5 is a new column that has a default
    value for applications that don't know about it. I can't think of an
    easy way to make it unique and recognize the data is from an older
    application this way. There will be a transitional period where old- and
    new-version applications will be pumping data into the table.

    In the end, I'll be normalizing the data after the application upgrade
    to ensure all (col3, col4, col5) are unique, and finally make the index
    unique, but I wanted to have some insurance that no unnecessary
    duplicates appear in the meantime.
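
    That final cleanup would be something along these lines (a sketch; it
    assumes an auto-increment `id`, keeps the lowest `id` of each duplicate
    group, and the index names are made up, in reality I'd take them from
    SHOW INDEX):

        -- collapse duplicates, keeping one row per (col3, col4, col5);
        -- in practice col1/col2 of the dropped rows may need merging first
        DELETE d
          FROM table AS d
          JOIN table AS k
            ON k.col3 = d.col3
           AND k.col4 = d.col4
           AND k.col5 = d.col5
           AND k.id < d.id;

        -- then replace the non-unique index with a unique one
        ALTER TABLE table
          DROP INDEX ix_col345,
          ADD UNIQUE INDEX ux_col345 (col3, col4, col5);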

    4. be happy with INSERT ... ON DUPLICATE KEY ...

    Too many insert attempts end up as updates for me to waste sequence
    keys on them.

    --
    Stanimir

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stanimir Stamenkov@21:1/5 to All on Mon Aug 3 20:45:03 2020
    Sun, 2 Aug 2020 20:15:03 +0200, /J.O. Aho/:

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/ https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    Wouldn't this basically block concurrent inserts/updates? I don't want
    everyone working on the same table to block while inserting/updating
    different sets of data.

    --
    Stanimir

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to Stanimir Stamenkov on Mon Aug 3 19:09:45 2020
    On 03/08/2020 18:45, Stanimir Stamenkov wrote:
    Sun, 2 Aug 2020 20:15:03 +0200, /J.O. Aho/:

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    Wouldn't this basically block concurrent inserts/updates?  I don't want
    to block all on the same table trying to insert/update different sets of data.

    it won't be for long...


    --
    “People believe certain stories because everyone important tells them,
    and people tell those stories because everyone important believes them.
    Indeed, when a conventional wisdom is at its fullest strength, one’s agreement with that conventional wisdom becomes almost a litmus test of
    one’s suitability to be taken seriously.”

    Paul Krugman

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From J.O. Aho@21:1/5 to Stanimir Stamenkov on Mon Aug 3 21:55:10 2020
    On 03/08/2020 19.45, Stanimir Stamenkov wrote:
    Sun, 2 Aug 2020 20:15:03 +0200, /J.O. Aho/:

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    Wouldn't this basically block concurrent inserts/updates?  I don't want
    to block all on the same table trying to insert/update different sets of data.


    Yes, it will; the good thing (or sometimes bad) is that the others will
    be waiting for the lock to be released and will then do their task.

    Unless you do a lot in the transaction, the time the table is locked
    tends to be short. If you do need to do a lot, then I would recommend
    breaking things out, so that the only thing you do under the lock is
    check for existing values and, based on that, do the update or insert.
    It could be good to make an insert stored procedure that takes care of
    everything (begin transaction (if InnoDB), lock table, check if the key
    already exists and do an update, else insert, unlock table, commit
    transaction (if InnoDB)).
    See https://dev.mysql.com/doc/refman/5.7/en/create-procedure.html
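
    One caveat, though: as far as I know LOCK TABLES / UNLOCK TABLES are not
    permitted inside stored programs, so the procedure would have to rely on
    transactional locking instead. A rough sketch (table and parameter names
    assumed):

        DELIMITER //
        CREATE PROCEDURE upsert_row(IN p3 VARCHAR(64),
                                    IN p4 VARCHAR(64),
                                    IN p5 VARCHAR(64))
        BEGIN
          DECLARE n INT;

          START TRANSACTION;

          -- lock any matching rows before deciding which branch to take
          SELECT COUNT(*) INTO n
            FROM t
           WHERE col3 = p3 AND col4 = p4 AND col5 = p5
             FOR UPDATE;

          IF n = 0 THEN
            INSERT INTO t (col1, col2, col3, col4, col5)
            VALUES (1, 2, p3, p4, p5);
          ELSE
            UPDATE t
               SET col1 = col1 + 1,
                   col2 = GREATEST(col2, 2)
             WHERE col3 = p3 AND col4 = p4 AND col5 = p5;
          END IF;

          COMMIT;
        END //
        DELIMITER ;

    The usual caveat about gap locking under READ-COMMITTED applies here as
    to any SELECT ... FOR UPDATE on a row that doesn't exist yet.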

    Without the begin transaction/commit transaction, there is a risk that
    the table never gets unlocked (which could happen if the locking thread
    unexpectedly dies), and everyone else waiting to access the table will
    keep on waiting forever.

    Another alternative is to move the old data, which can have more than
    one row with the same key, to a history table, and in the current table
    make the key a primary (or unique) key. This will complicate things when
    you need to fetch data; you could of course create a view that selects
    with a UNION from both tables, which makes it easier when you want to
    join things, but keep in mind it may slow things down.
    Alternatively, have a switch telling you whether you need current or
    historical data: if you need the historical data, use the history table,
    otherwise the current table.
    As I don't know the needs, I can't give you a best option, just some
    options you can look at, and then decide yourself which works for you.
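
    The history-table variant, sketched (`is_legacy` stands in for whatever
    condition actually identifies the old rows):

        CREATE TABLE t_history LIKE t;

        -- move the ambiguous legacy rows out of the way
        INSERT INTO t_history
        SELECT * FROM t WHERE is_legacy = 1;

        DELETE FROM t WHERE is_legacy = 1;

        -- the current table can now enforce uniqueness
        ALTER TABLE t
          ADD UNIQUE INDEX ux_col345 (col3, col4, col5);

        -- a combined view for readers that need both
        CREATE VIEW t_all AS
        SELECT * FROM t
        UNION ALL
        SELECT * FROM t_history;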


    --

    //Aho

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)