• SQL cost changing from day to day

    From Memmedaga Memmedov@21:1/5 to All on Thu Sep 29 23:28:51 2022
    Hi,
    As the subject says, the cost of one SQL statement changes from day to day. The statement is in a COBOL program. Yesterday its cost was 1030; today it is 117. So my programs sometimes run slowly.
    I will share the two EXPLAIN outputs below. What is the reason? Thanks. TABLEID has a two-column primary key, ID and CUSTOMER.
    TABLEID also has an index on ID and DATE.

    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    Query plan 1
    SQL request : Select
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

    ------------------------------------------------------------------------------
    Plan step 1
    ------------------------------------------------------------------------------

    Operation 1.0 : Scan
    Table : \SYS1.$DISK1.SUBVOL.TABLEID
    with correlation name TABLEID
    Access type : No locks, browse access
    Lock mode : Chosen by the system
    Column processing : Requires retrieval of 50 out of 50 columns

    Access path 1 : Alternate \SYS1.$DISK1.SUBVOL.INDEX1, sequential cache
    SBB for reads : Virtual, double buffer
    Begin key pred. : ID = 12345678912
    End key pred. : ID = 12345678912
    Index selectivity : Expect to examine 0.0035% of rows from index
    Index pred. : None
    Base table pred. : None

    Executor pred. : None
    Table selectivity : Expect to select 0.0035% of rows from table
    Expected row count: 335 rows after the scan
    Operation cost : 1030

    Total cost : 1030


    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    Query plan 1
    SQL request : Select
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

    -----------------------------------------------------------------------------
    Plan step 1
    -----------------------------------------------------------------------------

    Operation 1.0 : Scan
    Table : \SYS1.$DISK1.SUBVOL.TABLEID
    with correlation name TABLEID
    Access type : No locks, browse access
    Lock mode : Chosen by the system
    Column processing : Requires retrieval of 50 out of 50 columns

    Access path 1 : Alternate \SYS1.$DISK1.SUBVOL.INDEX1
    SBB for reads : Real
    Begin key pred. : ID = 12345678912
    End key pred. : ID = 12345678912
    Index selectivity : Expect to examine 0.0004% of rows from index
    Index pred. : None
    Base table pred. : None

    Executor pred. : None
    Table selectivity : Expect to select 0.0004% of rows from table
    Expected row count: 37 rows after the scan
    Operation cost : 117

    Total cost : 117


  • From Randall@21:1/5 to Memmedaga Memmedov on Fri Sep 30 12:38:53 2022
    On Friday, September 30, 2022 at 2:28:52 a.m. UTC-4, Memmedaga Memmedov wrote:
    [quoted text snipped]

    How often do you update statistics? Do the results change before and after statistics are updated? What is the primary key on the table? Are there statistics on the primary key as well as on the index?
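
    For what it's worth, refreshing the statistics and re-checking the plan from SQLCI looks roughly like this (a sketch, assuming the =TABLEID DEFINE is already in place):

    UPDATE STATISTICS FOR TABLE =TABLEID;

    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    If the estimates stop swinging after the update, stale statistics were the likely cause.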

  • From Keith Dick@21:1/5 to Randall on Fri Sep 30 22:37:19 2022
    On Friday, September 30, 2022 at 12:38:54 PM UTC-7, Randall wrote:
    On Friday, September 30, 2022 at 2:28:52 a.m. UTC-4, Memmedaga Memmedov wrote:
    [quoted text snipped]

    How often do you update statistics? Do the results change before and after statistics are updated? What is the primary key on the table? Are there statistics on the primary key as well as on the index?

    I assume that when you wrote "TABLEID has 2 primary key, ID and CUSTOMER", you meant that the primary key contains two columns named ID and CUSTOMER. Is that correct? A table cannot have two primary keys, so I don't know what else you could have meant.

    Are the two runs actually accessing the same table? Or is the DEFINE =TABLEID pointing to different tables for the two runs?
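
    One quick way to check is to display the DEFINE before each run. From TACL, something along these lines (a sketch; INFO DEFINE should show what the name resolves to):

    INFO DEFINE =TABLEID, DETAIL

    For a table the DEFINE should be CLASS MAP, and its FILE attribute names the physical table being used.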

    Can you tell whether the number of rows accessed from the base table actually are approximately the numbers estimated by the EXPLAIN output (about 335 in one case and about 37 in the other)?
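
    If you can reproduce the query in SQLCI, one way to check is to run it there and then ask for execution statistics (a sketch; DISPLAY STATISTICS reports on the most recently executed statement):

    select * from =TABLEID
    where ID=12345678912
    browse access;

    DISPLAY STATISTICS;

    The records-accessed and records-used counts in that report can be compared with the 335-row and 37-row estimates from EXPLAIN.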

    Even if the number of rows accessed is about 335 vs. about 37, I would not expect either run to take long enough that you would feel the need to call it slow, unless the program performs an awful lot of work on each row that it accesses. What is the actual run time of the program in the two cases?

    I imagine that the value of ID that the two queries select is different in the two jobs, and so it is correct that the number of rows matching the queries differs. Is that true, or are you getting a different number of rows actually selected when you run the same query twice?

    I never needed to use EXPLAIN plans much, so I don't remember a great deal about interpreting them. But I have a feeling these plans say it is using the alternate index on ID rather than the initial part of the primary key (ID, CUSTOMER) in the scan. If the primary key actually is on (CUSTOMER, ID), that would make sense, since the rows of the base table would not be grouped by ID. It would then scan the index on ID and, for each index row that matched the value of ID, do a random read of the base table, jumping all around the base table rather than sequentially reading a subset of rows that sit together. That random access will make the program run slower than if the base table had ID as the most significant part of the primary key, but even so, it would still be only around 300 random reads from the base table, and that would not take very much longer than 30 or 40 random reads.
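
    If I remember right, SQL/MP also has a CONTROL TABLE directive that can force the access path, which would let you test this directly (a sketch; check the SQL/MP reference manual for the exact options on your release):

    CONTROL TABLE =TABLEID ACCESS PATH PRIMARY;

    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    Comparing that plan's cost with the alternate-index plan would show whether INDEX1 really is the cheaper choice.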

    If I had to make a guess based on the incomplete information given here, I would guess that the job that runs slowly enough to warrant raising this question is actually accessing far more than 300 rows from the base table, and has to access them randomly because the primary key does not have ID as its most significant column. The job that runs faster is still doing random reads, but on a much smaller number of rows. Keep in mind that this is only a guess.

    That does not answer the question of why SQL computes a significantly different cost for the query in the two jobs. If my guess that the two jobs are not accessing the same table is not the answer for that part of the puzzle, then I'd have to agree with Randall that the difference could have something to do with the statistics on the table, but I don't know enough about the optimizer to guess what is most likely to have gone wrong with the statistics.

    If the primary key of the table actually is (ID, CUSTOMER), then I think the plan should not be using the alternate index at all, and the reason it is using the alternate index would need to be addressed. Maybe something that went wrong with the statistics could have caused that, but I don't know enough to say what it could be.

  • From Randall@21:1/5 to rkd...@gmail.com on Sat Oct 1 12:20:05 2022
    On Saturday, October 1, 2022 at 1:37:21 a.m. UTC-4, rkd...@gmail.com wrote:
    On Friday, September 30, 2022 at 12:38:54 PM UTC-7, Randall wrote:
    On Friday, September 30, 2022 at 2:28:52 a.m. UTC-4, Memmedaga Memmedov wrote:
    [quoted text snipped]

    If the primary key of the table actually is (ID, CUSTOMER), then I think the plan should not be using the alternate index at all, and the reason it is using the alternate index would need to be addressed. Maybe something that went wrong with the statistics could have caused that, but I don't know enough to say what it could be.

    My supposition is that the optimizer is picking the alternate index because the keys fit into a convenient set of blocks and would be faster to process than reading by primary key. That's partly where missing or out-of-date statistics can both mess up plans and produce misleading, inaccurate costs. The only true measure of performance is actually running the query with MEASURE monitoring the I/Os and SQL operations.
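
    As a rough sketch of what that might look like (from memory, so check the MEASURE manuals for the exact command sequence; the program file MYPROG and data file MEASDATA are placeholder names):

    MEASCOM
    ADD SQLSTMT $DISK1.SUBVOL.MYPROG
    START MEASUREMENT $DISK1.SUBVOL.MEASDATA

    ... run the COBOL program ...

    STOP MEASUREMENT $DISK1.SUBVOL.MEASDATA
    ADD MEASUREMENT $DISK1.SUBVOL.MEASDATA
    LIST SQLSTMT *

    The SQLSTMT counters report per-statement elapsed time, records accessed, and I/Os, which say far more about real performance than the optimizer's cost estimate.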
