Forum: >>> Magnum BBS <<<

Indexing with MT

From CV@21:1/5 to All on Thu Jul 15 16:56:53 2021

Hi everyone

I would like to know how a simple program that builds indexes for database files can be compiled and run using MT so that, on a computer with multiple cores, at least two or three indexing operations run in parallel over different databases (I'm speaking about dbf+fpt and cdx, as in DBFCDX).

That is, something like this:
---
wDataBases := {'DBF1', 'DBF2', 'DBF3'}
wIndexes := {'CDX1', 'CDX2', 'CDX3' }
wIndexKey := {'FLD1', 'FLD2', 'FLD3'}

for i := 1 to len(wDataBases)
select 0
use (wDataBases[i]) new exclusive
index on &(wIndexKey[i]) to (wIndexes[i])   // macro-expand so the key is the field named in wIndexKey[i], not the literal string
next
---
This builds each index one after the other.

My wish list, then: building those same indexes in parallel, starting an individual thread for each database.

I don't know if this is possible, but if it is, it is worth a try.
Has anyone done this?
A bit of code showing the procedure (and the libraries needed) would be welcome.

Best regards,
Claudio Voskian

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From dlzc@21:1/5 to All on Fri Jul 16 06:43:20 2021

Dear CV:

On Thursday, July 15, 2021 at 4:56:55 PM UTC-7, CV wrote:
...

My wish list, then: building those same indexes in parallel, starting
an individual thread for each database.

I don't know if this is possible, but if it is, it is worth a try.
Has anyone done this?

Do you want one thread per index per database (which will be slower, since the disk will be thrashing more), or just one thread per database (easier)?

Either sequence is a linear read of the database (a potential time saving), but if they are not started at the "same time", the requested record might no longer be in memory. And then the sorting and production of the index will really start tearing things up if you have a platter drive.

David A. Smith

From Ella Stern@21:1/5 to dlzc on Fri Jul 16 08:12:43 2021

Table indexing is about making intensive use of as much RAM as possible in order to build a tree structure, and then saving the tree to the hard disk.

When the index is too big to fit completely into the local RAM, the algorithm performs more cycles, and the users have the sensation that "the database has slowed down" (this applies both to indexing and to read/write operations).

IMHO indexing more than one table at once would force the two programs to compete for the available RAM, and each would slow down the other.

From CV@21:1/5 to All on Fri Jul 16 14:37:02 2021

Ella, David

Thank you for your answers.

The computer acting as server runs Windows Server and is a really fast one (12 or 16 cores); the disk is a solid-state drive and it has 32 GB of memory, so there are almost no hardware limits.
Bear in mind that the xHarbour application is a 32-bit one, which so far has used no more than 50 MB of RAM in the worst case.

The indexing process will run at night. This application is used 24 hours a day (with remote users at home, async data downloads, etc.), and I don't want to run out of time when running this automated process; that is the reason for my message.

It should be one thread per database: as each database is opened "exclusive" for indexing (I'm sure no other process will interfere), its indexes are built one after the other, but with many of these running in parallel, each recreating a set of indexes (many orders, all of them structural, i.e. cdx and dbf with the same name).

Regards,
©

From Ella Stern@21:1/5 to All on Sat Jul 17 01:40:40 2021

Suggestion: explain to the admin that
- the indexing process needs a dedicated physical machine (NOT a VM) with a minimal Windows OS image (32-bit version), a processor with plenty of L1/L2 cache and a high clock speed, 3 GB of RAM, no Internet access, no end-user access, and your executable, which does ONLY the indexing and nothing else
- before starting your executable, the necessary .DBF tables are copied onto that machine
- after the indexing completes successfully, the tables and indexes are picked up from that machine

Database server engines like Oracle and MS SQL run on dedicated servers with no connection to end users, and each version is tied to specific OS versions and hardware, because they have multi-threading features that require advanced RAM and I/O management.

Python and NodeJS receive user requests on different threads, but all those threads use the RAM via time-sharing (one by one).

As I've mentioned, in the case of indexing the critical resource is the RAM.

HTH

From CV@21:1/5 to All on Sat Jul 17 06:50:56 2021

Ella

Thank you for your explanation.

No VMs on that server, plenty of RAM, enough disk access speed... almost no hardware limits.
The only limit is the time frame available to rebuild the indexes when it is needed.

How about a piece of code that does what I need to implement (or test)?
Is xHarbour able to do that without errors?

Regards
Claudio Voskian

From Daniele@21:1/5 to All on Sat Jul 17 22:29:49 2021

Try it yourself, adapting this pseudocode:

#ifdef __XHARBOUR__
#xtranslate hb_threadStart( <x,...> ) => StartThread( <x> )
#endif
#include "hbthread.ch"

procedure main
local start_time, elap_time
...
// single-thread test
start_time := seconds()
? "Start single thread: " + time()
index1()
index2()
elap_time := seconds()
? "End: " + time() + " seconds: " + ltrim( str( elap_time - start_time ) )

// multithread test
? "Start multithread: " + time()

? "Start thread 1: " + time()
hb_threadStart( HB_THREAD_INHERIT_PUBLIC, @index1() )
? "Start thread 2: " + time()
hb_threadStart( HB_THREAD_INHERIT_PUBLIC, @index2() )

// keep main alive until the threads have printed their end times
wait ""
return

func index1()
local start_time := seconds(), elap_time
ferase( ... )   // delete the old index file
use ... exclusive
index on ...
use
elap_time := seconds()
? "End thread 1: " + time() + " seconds: " + ltrim( str( elap_time - start_time ) )
return nil

func index2()
... the same
return nil
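
A minimal sketch of the same idea with one shared worker instead of index1()/index2(), assuming a Harbour MT build (for example hbmk2 with -mt). The table, key and tag names are placeholders, not taken from this thread, and BuildIndex() is a hypothetical helper. hb_threadStart() and hb_threadWaitForAll() are Harbour names; on xHarbour the counterparts are assumed to be StartThread() and WaitForThreads().

```harbour
// Sketch only: one thread per database.
// "DBF1".."DBF3", "FLD1".."FLD3" and "TAG1".."TAG3" are hypothetical names.
REQUEST DBFCDX

PROCEDURE Main()

   LOCAL aJobs := { ;
      { "DBF1", "FLD1", "TAG1" }, ;
      { "DBF2", "FLD2", "TAG2" }, ;
      { "DBF3", "FLD3", "TAG3" } }
   LOCAL aJob

   FOR EACH aJob IN aJobs
      // Start one worker per table; the extra arguments are passed
      // on to BuildIndex() as its parameters.
      hb_threadStart( @BuildIndex(), aJob[ 1 ], aJob[ 2 ], aJob[ 3 ] )
   NEXT

   // Block until every worker has finished, instead of relying on WAIT.
   hb_threadWaitForAll()

   RETURN

FUNCTION BuildIndex( cTable, cKey, cTag )

   // Remove the old index bag (same base name as the table).
   FErase( cTable + ".cdx" )

   USE ( cTable ) NEW EXCLUSIVE VIA "DBFCDX"
   // &( cKey ) macro-compiles the key expression held in the string.
   INDEX ON &( cKey ) TAG ( cTag ) TO ( cTable )
   USE

   RETURN NIL
```

In Harbour each thread has its own work areas, so every worker opens its own table exclusively and no USE is shared between threads.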

Let us know
Dan

From CV@21:1/5 to All on Sat Jul 17 19:39:10 2021

Hi Dan!

Thank you for the code, but how is it supposed to open the same file "exclusively" in two threads?
That is, this instruction should be common to both threads, but executed only once, before starting anything else:

use ... exclusive

Anyway I will try to adapt it to my needs.

Regards,
Claudio Voskian

From CV@21:1/5 to All on Sun Jul 18 18:39:26 2021

Hi everyone

After some testing and adapting the code to xHarbour, I just get an error in a Windows message box:
"hb_xrealloc can't reallocate memory."

With this text written to the console:
Unrecoverable error 9009: Unrecoverable error 9011: hb_xrealloc can't reallocate memoryhb_xfree called with a NULL pointer Called from
INDEX1(49)Called from INDEX2(0)
Called from ORDCREATE(0)Called from ORDCREATE(0)
Called from INDEX1(49)Called from INDEX2(76)

The code:
*
request dbfcdx
proc main()
? "Start thread 1:", seconds()
StartThread(@index1())
return
*
function index1()
local start_time:=seconds(),elap_time

ferase ('his_jude.cdx')
ferase ('his_jude1.cdx')

use his_jude exclusive new via 'dbfcdx'
index on STR(NRO_DEUDOR)+DTOS(FECHA) tag 'GJE_DEUDOR' to 'HIS_JUDE'
index on STR(NRO_DEUDOR)+EST_CARTA+DTOS(FECHA)+LEFT(TELEFONO,13) tag 'GJE_DEUETA' to 'HIS_JUDE' additive
index on FECHA tag 'GJE_FECHA' to 'HIS_JUDE1'
index on ARCHIVOCAM+str(NRO_DEUDOR)+left(TELEFONO,13) tag 'GJE_ARCHIV' to 'HIS_JUDE1' additive
index on ID_BML tag 'GJE_IDBML' to 'HIS_JUDE1' additive unique
use
elap_time := seconds()
? "End thread 1:"+time()+" seconds:"+str(elap_time-start_time)
return nil

And no index is created.

If it is this difficult to build a simple index, I can't use MT in any other process.
I don't want a headache, so my simple solution is to start a couple of external programs from inside the application to pack and build the indexes using shellexecute().

Regards
Claudio Voskian

From Daniele@21:1/5 to All on Mon Jul 19 14:46:22 2021

On 19/07/2021 03:39, CV wrote:

StartThread(@index1())
return

So you start the thread and then exit. How can it work?

StartThread(@index1())
wait "Press a key"
return
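
Instead of a key press, the main routine can wait on the thread itself. A minimal sketch, assuming Harbour's hb_threadJoin() (JoinThread() is assumed to be the xHarbour counterpart); Reindex(), "TABLE1" and "FLD1" are hypothetical names, and the NetErr() check assumes the default error handler is in place.

```harbour
// Sketch only (Harbour MT build). "TABLE1" and "FLD1" are hypothetical names.
REQUEST DBFCDX

PROCEDURE Main()

   LOCAL pThread
   LOCAL lDone := .F.

   pThread := hb_threadStart( @Reindex(), "TABLE1", "FLD1" )

   // hb_threadJoin() blocks until the thread ends and hands back the
   // thread function's return value in the second argument.
   IF hb_threadJoin( pThread, @lDone ) .AND. lDone
      ? "Index rebuilt"
   ELSE
      ? "Indexing failed"
   ENDIF

   RETURN

FUNCTION Reindex( cTable, cKey )

   USE ( cTable ) NEW EXCLUSIVE VIA "DBFCDX"
   IF NetErr()
      // The table could not be opened exclusively (for example, it is
      // open elsewhere), so report failure to the caller.
      RETURN .F.
   ENDIF

   // The key string doubles as the tag name in this sketch.
   INDEX ON &( cKey ) TAG ( cKey ) TO ( cTable )
   USE

   RETURN .T.
```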

Anyway, code like this is just for testing, eh.
Dan

From CV@21:1/5 to All on Mon Jul 19 09:37:46 2021

Dan

It was a copy and paste with missing lines; I do have the wait "" before the end of the main routine, and there are 2 indexing functions for different databases (I copied just one for the example; the other is identical).

When I start the 2 threads, *sometimes* the error message occurs.
Other times it just does nothing at all, and I have to close the application with the [X] control at the upper right.

Regards
Claudio Voskian

From CV@21:1/5 to All on Wed Jul 21 06:05:12 2021

Hi everyone, especially Dan

I don't know why, but the very same program that previously DIDN'T work now works properly.
I didn't change a line, tried to test it yesterday and ... it WORKS.
A mystery.

Thank you for the code!

Regards
Claudio Voskian

From Daniele@21:1/5 to All on Thu Jul 22 21:57:13 2021

On 21/07/2021 15:05, CV wrote:

Hi everyone, especially Dan

I don't know why, but the very same program that previously DIDN'T work now works properly.
I didn't change a line, tried to test it yesterday and ... it WORKS.
A mystery.

Well, I did not believe the two threads indexing the same file would
have succeeded. I was thinking of 2 different files!
I learned something. :-)

Thank you for the code!

Regards
Claudio Voskian

You are welcome.
Dan

From CV@21:1/5 to All on Thu Jul 22 15:40:53 2021

Dan
I used two different threads on TWO different files and indexes.
Running the same process over the same file at the same time is nonsense.

Regards
©
