• cluster or hash table advise needed

    From Erland Sommarskog@21:1/5 to M.G. on Tue Sep 29 22:15:16 2015
    M.G. (michael@gurfinkel.us) writes:
    We are designing a table with high insert / delete activity. The table maintains sequence of actions per specific experiments. These are the attributes:

    Important question: is the activity concurrent or not? That is, can data for multiple experiments be written simultaneously?

    Expected number of records - around 10 million, expected number of
    inserts (pure additions) around 5000 a day, number of change
    (delete/insert) around 1000

    OK, so that is not really high activity. :-)

    And then I would assume that there is no concurrent activity.

    I would say that you scrap the ACTIVITY_SEQUENCE column, as it seems
    to serve no purpose. Make (EXP_ID, ACT_SEQ, ACT_ID) a primary key, because
    a having a primary key based on data is a good thing. You may never
    look at indivitual data, but you don't want duplicates. And make
    this PK clustered, that seems to fit your use of the table very well.




    --
    Erland Sommarskog, Stockholm, esquel@sommarskog.se

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From M.G.@21:1/5 to Erland Sommarskog on Tue Sep 29 13:55:59 2015
    On Tuesday, September 29, 2015 at 1:15:20 PM UTC-7, Erland Sommarskog wrote:
    M.G. (michael@gurfinkel.us) writes:
    We are designing a table with high insert / delete activity. The table maintains sequence of actions per specific experiments. These are the attributes:

    Important question: is the activity concurrent or not? That is, can data for multiple experiments be written simultaneously?

    Expected number of records - around 10 million, expected number of
    inserts (pure additions) around 5000 a day, number of change (delete/insert) around 1000

    your concurrency assumption is correct - not an issue here
    thank you for the suggestion.
    OK, so that is not really high activity. :-)

    And then I would assume that there is no concurrent activity.

    I would say that you scrap the ACTIVITY_SEQUENCE column, as it seems
    to serve no purpose. Make (EXP_ID, ACT_SEQ, ACT_ID) a primary key, because
    a having a primary key based on data is a good thing. You may never
    look at indivitual data, but you don't want duplicates. And make
    this PK clustered, that seems to fit your use of the table very well.




    --
    Erland Sommarskog, Stockholm, esquel@sommarskog.se

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From M.G.@21:1/5 to All on Tue Sep 29 12:06:05 2015
    We are designing a table with high insert / delete activity. The table maintains sequence of actions per specific experiments. These are the attributes:

    CREATE TABLE ACTION_SEQUENCE (
    /* ACTION_SEQUENCE_ID int NOT NULL, <<< questionable */
    EXP_ID int NOT NULL,
    ACT_SEQ int NOT NULL,
    ACT_ID int NOT NULL,
    MODIFIED_TIME datetime NULL,
    ACT_TYPE int NOT NULL,
    );

    EXP_ID and ACT_ID are foreign keys into experiments and actions tables correspondingly.

    sample data for two experiments with 2 action for #100 and three actions for #200:
    EXP_ID ACT_SEQ ACT_ID ACT_TYPE
    100 1 233 0
    100 2 560 0
    100 3 233 1
    200 1 220 0
    200 2 220 1
    200 3 778 0
    200 4 778 1

    The nature of EXP_ID - monotonous increment, same for ACT_ID.

    How we read data - access one experiment at a time, its actions sorted by ACT_SEQ. Like this:
    select * from ACTION_SEQUENCE where EXP_ID=@ID order by ACT_SEQ;
    The sequence is of essence here.

    How we add / update data - use delete/insert approach (never update), again records per one experiment are always deleted (if exists) and inserted as a group, like this:
    delete from ACTION_SEQUENCE where EXP_ID=@ID;
    insert into ACTION_SEQUENCE(EXP_ID,ACT_SEQ,ACT_ID) values (@ID,...)

    Expected number of records - around 10 million, expected number of inserts (pure additions) around 5000 a day, number of change (delete/insert) around 1000

    Expected read / write ratio - 10 reads per 1 update

    The question - would you make it a clustered table. or no cluster at all? Current design - table has ACTION_SEQUENCE_ID primary key (cluster), the sequence itself is maintained through external sequence table (reason - the app development framework).
    I don't see any need for cluster here (and it's maintenance).
    We can use EXP_ID, ACT_SEQ, ACT_ID as unique key logically, but again, we never deal with individual records, only with groups of records for a given experiment.

    Your input is highly appreciated.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)