Forum: >>> Magnum BBS <<<

cluster or hash table advise needed

From Erland Sommarskog@21:1/5 to M.G. on Tue Sep 29 22:15:16 2015

M.G. (michael@gurfinkel.us) writes:

We are designing a table with high insert / delete activity. The table maintains sequence of actions per specific experiments. These are the attributes:

Important question: is the activity concurrent or not? That is, can data for multiple experiments be written simultaneously?

Expected number of records - around 10 million, expected number of
inserts (pure additions) around 5000 a day, number of change
(delete/insert) around 1000

OK, so that is not really high activity. :-)

And then I would assume that there is no concurrent activity.

I would say that you scrap the ACTIVITY_SEQUENCE column, as it seems
to serve no purpose. Make (EXP_ID, ACT_SEQ, ACT_ID) a primary key, because
a having a primary key based on data is a good thing. You may never
look at indivitual data, but you don't want duplicates. And make
this PK clustered, that seems to fit your use of the table very well.

--
Erland Sommarskog, Stockholm, esquel@sommarskog.se

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From M.G.@21:1/5 to Erland Sommarskog on Tue Sep 29 13:55:59 2015

On Tuesday, September 29, 2015 at 1:15:20 PM UTC-7, Erland Sommarskog wrote:

M.G. (michael@gurfinkel.us) writes:

We are designing a table with high insert / delete activity. The table maintains sequence of actions per specific experiments. These are the attributes:

Important question: is the activity concurrent or not? That is, can data for multiple experiments be written simultaneously?

Expected number of records - around 10 million, expected number of
inserts (pure additions) around 5000 a day, number of change (delete/insert) around 1000

your concurrency assumption is correct - not an issue here
thank you for the suggestion.

OK, so that is not really high activity. :-)

And then I would assume that there is no concurrent activity.

I would say that you scrap the ACTIVITY_SEQUENCE column, as it seems
to serve no purpose. Make (EXP_ID, ACT_SEQ, ACT_ID) a primary key, because
a having a primary key based on data is a good thing. You may never
look at indivitual data, but you don't want duplicates. And make
this PK clustered, that seems to fit your use of the table very well.

--
Erland Sommarskog, Stockholm, esquel@sommarskog.se

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From M.G.@21:1/5 to All on Tue Sep 29 12:06:05 2015

We are designing a table with high insert / delete activity. The table maintains sequence of actions per specific experiments. These are the attributes:

CREATE TABLE ACTION_SEQUENCE (
/* ACTION_SEQUENCE_ID int NOT NULL, <<< questionable */
EXP_ID int NOT NULL,
ACT_SEQ int NOT NULL,
ACT_ID int NOT NULL,
MODIFIED_TIME datetime NULL,
ACT_TYPE int NOT NULL,
);

EXP_ID and ACT_ID are foreign keys into experiments and actions tables correspondingly.

sample data for two experiments with 2 action for #100 and three actions for #200:
EXP_ID ACT_SEQ ACT_ID ACT_TYPE
100 1 233 0
100 2 560 0
100 3 233 1
200 1 220 0
200 2 220 1
200 3 778 0
200 4 778 1

The nature of EXP_ID - monotonous increment, same for ACT_ID.

How we read data - access one experiment at a time, its actions sorted by ACT_SEQ. Like this:
select * from ACTION_SEQUENCE where EXP_ID=@ID order by ACT_SEQ;
The sequence is of essence here.

How we add / update data - use delete/insert approach (never update), again records per one experiment are always deleted (if exists) and inserted as a group, like this:
delete from ACTION_SEQUENCE where EXP_ID=@ID;
insert into ACTION_SEQUENCE(EXP_ID,ACT_SEQ,ACT_ID) values (@ID,...)

Expected number of records - around 10 million, expected number of inserts (pure additions) around 5000 a day, number of change (delete/insert) around 1000

Expected read / write ratio - 10 reads per 1 update

The question - would you make it a clustered table. or no cluster at all? Current design - table has ACTION_SEQUENCE_ID primary key (cluster), the sequence itself is maintained through external sequence table (reason - the app development framework).
I don't see any need for cluster here (and it's maintenance).
We can use EXP_ID, ACT_SEQ, ACT_ID as unique key logically, but again, we never deal with individual records, only with groups of records for a given experiment.

Your input is highly appreciated.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	296
Nodes:	16 (2 / 14)
Uptime:	49:13:15
Calls:	6,648
Files:	12,200
Messages:	5,330,097

cluster or hash table advise needed

Who's Online

System Info