• UPSERT with non-unique index

    From Stanimir Stamenkov@21:1/5 to All on Sun Aug 2 19:55:42 2020
    [I've posted this on Stack Overflow: https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so I'm performing:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
     WHERE NOT EXISTS (SELECT 1
                         FROM table
                        WHERE col3 = 'value3'
                          AND col4 = 'value4'
                          AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

    UPDATE table
       SET col1 = col1 + 1,
           col2 = GREATEST(col2, 2)
     WHERE col3 = 'value3'
       AND col4 = 'value4'
       AND col5 = 'value5'

    There's an index:

    CREATE INDEX ON table (col3, col4, col5)

    It is non-unique because there is legacy data that doesn't allow me to
    declare it unique. Newer records, however, should not have duplicate
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions. I can see
    the second session blocking until the first one commits its transaction,
    but sometimes the second transaction is then still able to insert a new
    row (and sometimes it achieves the expected behavior of not inserting a
    duplicate (col3, col4, col5) row).

    I'm currently performing manual unique-check after the insert:

    SELECT COUNT(1)
      FROM table
     WHERE col3 = 'value3'
       AND col4 = 'value4'
       AND col5 = 'value5'

    but I've also tried:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
     WHERE NOT EXISTS (SELECT 1
                         FROM table
                        WHERE col3 = 'value3'
                          AND col4 = 'value4'
                          AND col5 = 'value5'
                          FOR UPDATE)

    which appears to work: with the examples that otherwise always give me a
    duplicate (col3, col4, col5) row, the duplicate no longer appears. Is
    the given FOR UPDATE usage reliable for the purpose of ensuring no
    duplicate (col3, col4, col5) row will be inserted?

    I'm using READ-COMMITTED transaction isolation.
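
    Spelled out as a full transaction, the pattern I'm experimenting with
    looks like this (a sketch, using the example values from above):

        START TRANSACTION;

        -- Lock the matching (col3, col4, col5) entries, if any, so a
        -- concurrent session doing the same check blocks here:
        SELECT 1
          FROM table
         WHERE col3 = 'value3'
           AND col4 = 'value4'
           AND col5 = 'value5'
           FOR UPDATE;

        -- application logic: if no row came back, do the INSERT;
        -- otherwise do the UPDATE

        COMMIT;

    One thing I'm not sure about: as I read the InnoDB locking docs, under
    READ-COMMITTED gap locking is disabled for searches, so when no matching
    row exists yet there may be nothing for FOR UPDATE to lock, and two
    sessions could still both proceed to insert. Under REPEATABLE-READ the
    same locking read should take a next-key/gap lock on the scanned range,
    which may be why the FOR UPDATE variant appears to work.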

    [1] https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

    --
    Stanimir

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From J.O. Aho@21:1/5 to Stanimir Stamenkov on Sun Aug 2 20:15:03 2020
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow: https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so I'm performing:

        INSERT INTO table (col1, col2, col3, col4, col5)
        SELECT 1, 2, 'value3', 'value4', 'value5'
         WHERE NOT EXISTS (SELECT 1
                             FROM table
                            WHERE col3 = 'value3'
                              AND col4 = 'value4'
                              AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

        UPDATE table
           SET col1 = col1 + 1,
               col2 = MAX(col2, 2)
         WHERE col3 = 'value3'
           AND col4 = 'value4'
           AND col5 = 'value5'

    There's an index:

        CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique.  Newer records, however, should not have duplicated (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions.  I can see
    the second session blocking until the first one commits its transaction,
    but then the second transaction is also able to insert a new row
    sometimes (or sometimes it achieves the expected of avoiding to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking has more to do with the way you are testing things
    than with what happens in reality.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html
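
    Put together, the whole sequence would be something like this (a sketch,
    not tested; note that under LOCK TABLES each extra reference to the
    locked table needs its own lock entry via an alias, which affects your
    NOT EXISTS subquery):

        LOCK TABLES table WRITE, table AS t READ;

        INSERT INTO table (col1, col2, col3, col4, col5)
        SELECT 1, 2, 'value3', 'value4', 'value5'
         WHERE NOT EXISTS (SELECT 1
                             FROM table AS t
                            WHERE t.col3 = 'value3'
                              AND t.col4 = 'value4'
                              AND t.col5 = 'value5');

        -- if ROW_COUNT() says 0 rows were inserted, run your UPDATE here

        UNLOCK TABLES;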

    --

    //Aho

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Axel Schwenke@21:1/5 to Stanimir Stamenkov on Sun Aug 2 22:39:04 2020
    On 02.08.2020 18:55, Stanimir Stamenkov wrote:

    I need to implement concurrent-safe UPSERT
    ...
    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me
    ...
    There's an index:

        CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to declare it unique.

    That is the point where I would solve it.

    1. add a new column `is_legacy_data`, e.g. TINYINT DEFAULT 0, that will be 0
    for all new rows

    2. for legacy rows where (col3, col4, col5) is not unique, set the new
    column to a value that makes (col3, col4, col5, `is_legacy_data`) unique

    3. add the unique index on (col3, col4, col5, `is_legacy_data`)

    4. be happy with INSERT ... ON DUPLICATE KEY ...
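
    Steps 1-3 sketched in SQL (schematic; I assume the table is `t` with an
    auto-increment `id` column, and I use INT instead of TINYINT because
    this particular backfill reuses `id` values):

        -- 1. discriminator column; DEFAULT 0 covers all new rows
        ALTER TABLE t
          ADD COLUMN is_legacy_data INT NOT NULL DEFAULT 0;

        -- 2. make legacy duplicates distinct; one crude choice is the
        --    existing id, unique by construction (the derived table works
        --    around MySQL's "can't update the table you select from" rule)
        UPDATE t
           SET is_legacy_data = id
         WHERE id NOT IN (SELECT min_id
                            FROM (SELECT MIN(id) AS min_id
                                    FROM t
                                   GROUP BY col3, col4, col5) AS keep);

        -- 3. the extended key can now be declared unique
        ALTER TABLE t
          ADD UNIQUE INDEX ux_col345_legacy (col3, col4, col5, is_legacy_data);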

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lew Pitcher@21:1/5 to The Natural Philosopher on Sun Aug 2 17:46:39 2020
    On August 2, 2020 17:28, The Natural Philosopher wrote:

    On 02/08/2020 19:15, J.O. Aho wrote:
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow:
    https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so
    I'm performing:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
    WHERE NOT EXISTS (SELECT 1
    FROM table
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

    UPDATE table
    SET col1 = col1 + 1,
    col2 = MAX(col2, 2)
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5'

    There's an index:

    CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique. Newer records, however, should not have duplicated
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions. I can
    see the second session blocking until the first one commits its
    transaction, but then the second transaction is also able to insert a
    new row sometimes (or sometimes it achieves the expected of avoiding
    to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking is more then way you are testing things than what in
    reality happens.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    I don't know how SQL manages it, but if you have two asynchronous
    instances of read-modify-write, where what is written depends on what is
    read, then you MUST make the read-modify-write cycle ATOMIC, that is,
    uninterruptible by another process doing the same.

    You need locking, with the attendant danger that if a process crashes
    with locks set...

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;
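
    Concretely, wrapping the statements already posted, that's something
    like (a sketch; the branch between INSERT and UPDATE still lives in the
    application):

        START TRANSACTION;

        INSERT INTO table (col1, col2, col3, col4, col5)
        SELECT 1, 2, 'value3', 'value4', 'value5'
         WHERE NOT EXISTS (SELECT 1
                             FROM table
                            WHERE col3 = 'value3'
                              AND col4 = 'value4'
                              AND col5 = 'value5');

        -- if ROW_COUNT() is 0, run the UPDATE here

        COMMIT;
        -- and ROLLBACK on any error, so no locks are left behind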


    --
    Lew Pitcher
    "In Skills, We Trust"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to J.O. Aho on Sun Aug 2 22:28:37 2020
    On 02/08/2020 19:15, J.O. Aho wrote:
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow:
    https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so
    I'm performing:

         INSERT INTO table (col1, col2, col3, col4, col5)
         SELECT 1, 2, 'value3', 'value4', 'value5'
          WHERE NOT EXISTS (SELECT 1
                              FROM table
                             WHERE col3 = 'value3'
                               AND col4 = 'value4'
                               AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

         UPDATE table
            SET col1 = col1 + 1,
                col2 = MAX(col2, 2)
          WHERE col3 = 'value3'
            AND col4 = 'value4'
            AND col5 = 'value5'

    There's an index:

         CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique.  Newer records, however, should not have duplicated
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions.  I can
    see the second session blocking until the first one commits its
    transaction, but then the second transaction is also able to insert a
    new row sometimes (or sometimes it achieves the expected of avoiding
    to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking is more then way you are testing things than what in reality happens.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/ https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    I don't know how SQL manages it, but if you have two asynchronous
    instances of read-modify-write, where what is written depends on what is
    read, then you MUST make the read-modify-write cycle ATOMIC, that is,
    uninterruptible by another process doing the same.

    You need locking, with the attendant danger that if a process crashes
    with locks set...

    --
    The biggest threat to humanity comes from socialism, which has utterly
    diverted our attention away from what really matters to our existential survival, to indulging in navel gazing and faux moral investigations
    into what the world ought to be, whilst we fail utterly to deal with
    what it actually is.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to Lew Pitcher on Mon Aug 3 10:05:05 2020
    On 02/08/2020 22:46, Lew Pitcher wrote:
    On August 2, 2020 17:28, The Natural Philosopher wrote:

    On 02/08/2020 19:15, J.O. Aho wrote:
    On 02/08/2020 18.55, Stanimir Stamenkov wrote:
    [I've posted this on Stack Overflow:
    https://stackoverflow.com/questions/63194116/upsert-with-non-unique-index]

    I need to implement concurrent-safe UPSERT using a non-unique key and
    avoid unnecessary auto-increment of ID.

    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me, so
    I'm performing:

    INSERT INTO table (col1, col2, col3, col4, col5)
    SELECT 1, 2, 'value3', 'value4', 'value5'
    WHERE NOT EXISTS (SELECT 1
    FROM table
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5')

    then if it results in no row inserted, I'm performing:

    UPDATE table
    SET col1 = col1 + 1,
    col2 = MAX(col2, 2)
    WHERE col3 = 'value3'
    AND col4 = 'value4'
    AND col5 = 'value5'

    There's an index:

    CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to
    declare it unique. Newer records, however, should not have duplicated
    (col3, col4, col5) rows.

    Unsurprisingly, using the given INSERT statement I'm getting mixed
    results trying to execute it concurrently from two sessions. I can
    see the second session blocking until the first one commits its
    transaction, but then the second transaction is also able to insert a
    new row sometimes (or sometimes it achieves the expected of avoiding
    to insert a duplicate (col3, col4, col5) row).

    Maybe the blocking is more then way you are testing things than what in
    reality happens.

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    I don't know how SQL manages it, but if you have two asynchronous
    instances of read-modify-write, where what is written depends on what is
    read, then you MUST make the read-modify-write cycle ATOMIC - that is
    uninterruptible, by another process doing the same.

    You need locking, and the attendant danger that if aprocess crashes with
    locks set...

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;


    I don't use SQL often enough to remember that it has transactions these
    days :-(

    Note that you appear to need to be using InnoDB or another transactional
    engine, not MyISAM, if you want to use transactions.


    --
    "In our post-modern world, climate science is not powerful because it is
    true: it is true because it is powerful."

    Lucas Bergkamp

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From J.O. Aho@21:1/5 to The Natural Philosopher on Mon Aug 3 11:38:14 2020
    On 03/08/2020 11.05, The Natural Philosopher wrote:
    On 02/08/2020 22:46, Lew Pitcher wrote:

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;


    I don't use SQL often enough to remember that it has transactions these
    days :-(

    Yeah, it was quite recently that they introduced XA transactions in MySQL: 2005.


    Note that you appear to need to be using innoDB or another advanced
    engine and not MyISAM if you want to use transactions

    Yes, you need a db engine like InnoDB or XtraDB, which both support XA transactions.
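
    To check which engine a table uses, and to convert it (the ALTER
    rebuilds the table, so it can take a while on big data):

        SELECT engine
          FROM information_schema.tables
         WHERE table_schema = DATABASE()
           AND table_name = 'table';

        ALTER TABLE table ENGINE = InnoDB;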

    --

    //Aho

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to J.O. Aho on Mon Aug 3 16:30:27 2020
    On 03/08/2020 10:38, J.O. Aho wrote:
    On 03/08/2020 11.05, The Natural Philosopher wrote:
    On 02/08/2020 22:46, Lew Pitcher wrote:

    I'd roll the whole thing in a transaction

    START TRANSACTION;

    -- your sql goes here

    COMMIT;


    I don't use SQL often enough to remember that it has transactions
    these days :-(

    Yeah, it's quite recently they introduced xa transactions in MySQL, 2005.

    At my time of life that seems like yesterday... That was about when I
    was needing to use them too. I remember looking at them and deciding it
    wasn't necessary for that application. And it was all legacy ISAM files
    as well.

    Note that you appear to need to be using innoDB or another advanced
    engine and not MyISAM if you want to use transactions

    Yes, you need a db-engine like InnoDB or XtraDB which both support xa transactions.



    --
    Canada is all right really, though not for the whole weekend.

    "Saki"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stanimir Stamenkov@21:1/5 to All on Mon Aug 3 20:29:27 2020
    Sun, 2 Aug 2020 22:39:04 +0200, /Axel Schwenke/:
    On 02.08.2020 18:55, Stanimir Stamenkov wrote:

    I need to implement concurrent-safe UPSERT
    ...
    Traditional INSERT ... ON DUPLICATE KEY [1] doesn't work for me
    ...
    There's an index:

        CREATE INDEX ON table (col3, col4, col5)

    It is non-unique as there are legacy data that does not allow me to declare it unique.

    That is the point where I would solve it.

    1. add a new column `is_legacy_data`, i.e. TINYINT DEFAULT 0 that will be 0 for all new columns

    2. for legacy columns with (col3, col4, col5) being not unique, set the new column to a value that makes (col3, col4, col5, `is_legacy_data`) unique

    3. add the unique index on (col3, col4, col5, `is_legacy_data`)

    The current setup is similar: col5 is a new column that has a default
    value for applications that don't know about it. I can't think of an
    easy way to make it unique and recognize the data is from an older
    application this way. There will be a transitional period where old- and
    new-version applications will be pumping data into the table.

    In the end, I'll be normalizing the data after the application upgrade
    to ensure all (col3, col4, col5) are unique, and finally make the index
    unique, but I wanted to have some insurance that no unnecessary
    duplicates appear in the meantime.
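
    That final cleanup would be something along these lines (a sketch; it
    assumes an auto-increment `id`, keeps the lowest `id` of each duplicate
    group, and the index names are made up, in reality I'd take them from
    SHOW INDEX):

        -- collapse duplicates, keeping one row per (col3, col4, col5);
        -- in practice col1/col2 of the dropped rows may need merging first
        DELETE d
          FROM table AS d
          JOIN table AS k
            ON k.col3 = d.col3
           AND k.col4 = d.col4
           AND k.col5 = d.col5
           AND k.id < d.id;

        -- then replace the non-unique index with a unique one
        ALTER TABLE table
          DROP INDEX ix_col345,
          ADD UNIQUE INDEX ux_col345 (col3, col4, col5);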

    4. be happy with INSERT ... ON DUPLICATE KEY ...

    Too many insert attempts end up as updates for me to waste sequence
    keys on them.

    --
    Stanimir

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stanimir Stamenkov@21:1/5 to All on Mon Aug 3 20:45:03 2020
    Sun, 2 Aug 2020 20:15:03 +0200, /J.O. Aho/:

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/ https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    Wouldn't this basically block concurrent inserts/updates? I don't want
    everyone working on the same table to block while inserting/updating
    different sets of data.

    --
    Stanimir

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Natural Philosopher@21:1/5 to Stanimir Stamenkov on Mon Aug 3 19:09:45 2020
    On 03/08/2020 18:45, Stanimir Stamenkov wrote:
    Sun, 2 Aug 2020 20:15:03 +0200, /J.O. Aho/:

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    Wouldn't this basically block concurrent inserts/updates?  I don't want
    to block all on the same table trying to insert/update different sets of data.

    it won't be for long...


    --
    “People believe certain stories because everyone important tells them,
    and people tell those stories because everyone important believes them.
    Indeed, when a conventional wisdom is at its fullest strength, one’s agreement with that conventional wisdom becomes almost a litmus test of
    one’s suitability to be taken seriously.”

    Paul Krugman

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From J.O. Aho@21:1/5 to Stanimir Stamenkov on Mon Aug 3 21:55:10 2020
    On 03/08/2020 19.45, Stanimir Stamenkov wrote:
    Sun, 2 Aug 2020 20:15:03 +0200, /J.O. Aho/:

    I would suggest you use

    LOCK TABLES table WRITE;

    <do what you did before>

    UNLOCK TABLES;


    LOCK TABLES table WRITE;

    UNLOCK TABLES;

    https://www.mysqltutorial.org/mysql-table-locking/
    https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html

    Wouldn't this basically block concurrent inserts/updates?  I don't want
    to block all on the same table trying to insert/update different sets of data.


    Yes, it will; the good thing (or sometimes bad) is that the others will
    be waiting for the lock to be released and will then do their task.

    Unless you do a lot in the transaction, the time the table is locked
    tends to be short. If you do need to do a lot, then I would recommend
    breaking things out, so that the only thing you do under the lock is
    check for existing values and, based on that, do the update or insert.
    It could be good to make an insert stored procedure that takes care of
    everything (begin transaction (if InnoDB), lock table, check if the key
    already exists and do an update, else insert, unlock table, commit
    transaction (if InnoDB)).
    See https://dev.mysql.com/doc/refman/5.7/en/create-procedure.html
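
    One caveat, though: as far as I know LOCK TABLES / UNLOCK TABLES are not
    permitted inside stored programs, so the procedure would have to rely on
    transactional locking instead. A rough sketch (table and parameter names
    assumed):

        DELIMITER //
        CREATE PROCEDURE upsert_row(IN p3 VARCHAR(64),
                                    IN p4 VARCHAR(64),
                                    IN p5 VARCHAR(64))
        BEGIN
          DECLARE n INT;

          START TRANSACTION;

          -- lock any matching rows before deciding which branch to take
          SELECT COUNT(*) INTO n
            FROM t
           WHERE col3 = p3 AND col4 = p4 AND col5 = p5
             FOR UPDATE;

          IF n = 0 THEN
            INSERT INTO t (col1, col2, col3, col4, col5)
            VALUES (1, 2, p3, p4, p5);
          ELSE
            UPDATE t
               SET col1 = col1 + 1,
                   col2 = GREATEST(col2, 2)
             WHERE col3 = p3 AND col4 = p4 AND col5 = p5;
          END IF;

          COMMIT;
        END //
        DELIMITER ;

    The usual caveat about gap locking under READ-COMMITTED applies here as
    to any SELECT ... FOR UPDATE on a row that doesn't exist yet.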

    Without the begin transaction/commit transaction, there is a risk that
    the table never gets unlocked (which could happen if the locking thread
    unexpectedly dies), and everyone else waiting to access the table will
    keep on waiting forever.

    Another alternative is to move the old data, which can have more than
    one row with the same key, to a history table, and in the current table
    make the key a primary (or unique) key. This will complicate things when
    you need to fetch data; you could of course create a view that selects
    with a UNION from both tables, which makes it easier when you want to
    join things, but keep in mind it may slow things down.
    Alternatively, have a switch telling you whether you need current or
    historical data: if you need the historical data, use the history table,
    otherwise the current table.
    As I don't know the needs, I can't give you a best option, just some
    options you can look at, and then decide yourself which works for you.
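
    The history-table variant, sketched (`is_legacy` stands in for whatever
    condition actually identifies the old rows):

        CREATE TABLE t_history LIKE t;

        -- move the ambiguous legacy rows out of the way
        INSERT INTO t_history
        SELECT * FROM t WHERE is_legacy = 1;

        DELETE FROM t WHERE is_legacy = 1;

        -- the current table can now enforce uniqueness
        ALTER TABLE t
          ADD UNIQUE INDEX ux_col345 (col3, col4, col5);

        -- a combined view for readers that need both
        CREATE VIEW t_all AS
        SELECT * FROM t
        UNION ALL
        SELECT * FROM t_history;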


    --

    //Aho

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)