• SQL cost changing from day to day

    From Memmedaga Memmedov@21:1/5 to All on Thu Sep 29 23:28:51 2022
    Hi,
    As the subject says, the cost of one SQL statement changes from day to day. The statement is in a COBOL program. Yesterday its cost was 1030; today it is 117. So my programs sometimes run slowly.
    I will share the two EXPLAIN outputs below. What is the reason? Thanks. TABLEID has a two-column primary key, ID and CUSTOMER.
    TABLEID also has an index on ID and DATE.

    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    Query plan 1
    SQL request : Select
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

    ------------------------------------------------------------------------------
    Plan step 1
    ------------------------------------------------------------------------------

    Operation 1.0 : Scan
    Table : \SYS1.$DISK1.SUBVOL.TABLEID
    with correlation name TABLEID
    Access type : No locks, browse access
    Lock mode : Chosen by the system
    Column processing : Requires retrieval of 50 out of 50 columns

    Access path 1 : Alternate \SYS1.$DISK1.SUBVOL.INDEX1, sequential cache
    SBB for reads : Virtual, double buffer
    Begin key pred. : ID = 12345678912
    End key pred. : ID = 12345678912
    Index selectivity : Expect to examine 0.0035% of rows from index
    Index pred. : None
    Base table pred. : None

    Executor pred. : None
    Table selectivity : Expect to select 0.0035% of rows from table
    Expected row count: 335 rows after the scan
    Operation cost : 1030

    Total cost : 1030


    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    Query plan 1
    SQL request : Select
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

    -----------------------------------------------------------------------------
    Plan step 1
    -----------------------------------------------------------------------------

    Operation 1.0 : Scan
    Table : \SYS1.$DISK1.SUBVOL.TABLEID
    with correlation name TABLEID
    Access type : No locks, browse access
    Lock mode : Chosen by the system
    Column processing : Requires retrieval of 50 out of 50 columns

    Access path 1 : Alternate \SYS1.$DISK1.SUBVOL.INDEX1
    SBB for reads : Real
    Begin key pred. : ID = 12345678912
    End key pred. : ID = 12345678912
    Index selectivity : Expect to examine 0.0004% of rows from index
    Index pred. : None
    Base table pred. : None

    Executor pred. : None
    Table selectivity : Expect to select 0.0004% of rows from table
    Expected row count: 37 rows after the scan
    Operation cost : 117

    Total cost : 117


  • From Randall@21:1/5 to Memmedaga Memmedov on Fri Sep 30 12:38:53 2022
    On Friday, September 30, 2022 at 2:28:52 a.m. UTC-4, Memmedaga Memmedov wrote:
    [quoted text snipped]

    How often do you update statistics? Do the results change before and after statistics are updated? What is the primary key on the table? Are there statistics on the primary key as well as on the index?
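
    For what it's worth, refreshing the statistics and re-checking the plan from SQLCI looks roughly like this (a sketch, assuming the =TABLEID DEFINE is already in place):

    UPDATE STATISTICS FOR TABLE =TABLEID;

    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    If the estimates stop swinging after the update, stale statistics were the likely cause.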

  • From Keith Dick@21:1/5 to Randall on Fri Sep 30 22:37:19 2022
    On Friday, September 30, 2022 at 12:38:54 PM UTC-7, Randall wrote:
    On Friday, September 30, 2022 at 2:28:52 a.m. UTC-4, Memmedaga Memmedov wrote:
    [quoted text snipped]

    How often do you update statistics? Do the results change before and after statistics are updated? What is the primary key on the table? Are there statistics on the primary key as well as on the index?

    I assume that when you wrote "TABLEID has 2 primary key, ID and CUSTOMER", you meant that the primary key contains two columns named ID and CUSTOMER. Is that correct? A table cannot have two primary keys, so I don't know what else you could have meant.

    Are the two runs actually accessing the same table? Or is the DEFINE =TABLEID pointing to different tables for the two runs?
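
    One quick way to check is to display the DEFINE before each run. From TACL, something along these lines (a sketch; INFO DEFINE should show what the name resolves to):

    INFO DEFINE =TABLEID, DETAIL

    For a table the DEFINE should be CLASS MAP, and its FILE attribute names the physical table being used.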

    Can you tell whether the number of rows accessed from the base table actually are approximately the numbers estimated by the EXPLAIN output (about 335 in one case and about 37 in the other)?
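
    If you can reproduce the query in SQLCI, one way to check is to run it there and then ask for execution statistics (a sketch; DISPLAY STATISTICS reports on the most recently executed statement):

    select * from =TABLEID
    where ID=12345678912
    browse access;

    DISPLAY STATISTICS;

    The records-accessed and records-used counts in that report can be compared with the 335-row and 37-row estimates from EXPLAIN.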

    Even if the number of rows accessed is about 335 vs. about 37, I would not expect either run to take long enough that you would feel the need to call it slow, unless the program performs an awful lot of work on each row that it accesses. What is the actual run time of the program in the two cases?

    I imagine that the value of ID that the two queries select is different in the two jobs, and so it is correct that the number of rows matching the queries differs. Is that true, or are you getting a different number of rows actually selected when you run the same query twice?

    I never needed to use EXPLAIN plans much, so I don't remember a great deal about interpreting them. But I have a feeling these plans say it is using the alternate index on ID rather than the initial part of the primary key (ID, CUSTOMER) in the scan. If the primary key actually is on (CUSTOMER, ID), that would make sense, since the rows of the base table would not be grouped by ID. It would then scan the index on ID and, for each index row that matched the value of ID, do a random read of the base table, jumping all around the base table rather than sequentially reading a subset of rows that sit together. That random access will make the program run slower than if the base table had ID as the most significant part of the primary key, but even so, it would still be only around 300 random reads from the base table, and that would not take very much longer than 30 or 40 random reads.
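
    If I remember right, SQL/MP also has a CONTROL TABLE directive that can force the access path, which would let you test this directly (a sketch; check the SQL/MP reference manual for the exact options on your release):

    CONTROL TABLE =TABLEID ACCESS PATH PRIMARY;

    explain
    select * from =TABLEID
    where ID=12345678912
    browse access;

    Comparing that plan's cost with the alternate-index plan would show whether INDEX1 really is the cheaper choice.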

    If I had to make a guess based on the incomplete information given here, I would guess that the job that runs slowly enough to warrant raising this question is actually accessing far more than 300 rows from the base table, and has to access them randomly because the primary key does not have ID as its most significant column. The job that runs faster is still doing random reads, but on a much smaller number of rows. Keep in mind that this is only a guess.

    That does not answer the question of why SQL computes a significantly different cost for the query in the two jobs. If my guess that the two jobs are not accessing the same table is not the answer for that part of the puzzle, then I'd have to agree with Randall that the difference could have something to do with the statistics on the table, but I don't know enough about the optimizer to guess what is most likely to have gone wrong with the statistics.

    If the primary key of the table actually is (ID, CUSTOMER), then I think the plan should not be using the alternate index at all, and the reason it is using the alternate index would need to be addressed. Maybe something that went wrong with the statistics could have caused that, but I don't know enough to say what it could be.

  • From Randall@21:1/5 to rkd...@gmail.com on Sat Oct 1 12:20:05 2022
    On Saturday, October 1, 2022 at 1:37:21 a.m. UTC-4, rkd...@gmail.com wrote:
    On Friday, September 30, 2022 at 12:38:54 PM UTC-7, Randall wrote:
    On Friday, September 30, 2022 at 2:28:52 a.m. UTC-4, Memmedaga Memmedov wrote:
    [quoted text snipped]

    If the primary key of the table actually is (ID, CUSTOMER), then I think the plan should not be using the alternate index at all, and the reason it is using the alternate index would need to be addressed. Maybe something that went wrong with the statistics could have caused that, but I don't know enough to say what it could be.

    My supposition is that the optimizer is picking the alternate index because the keys fit into a convenient set of blocks and would be faster to process than reading by primary key. That's partly where missing or out-of-date statistics can both mess up plans and produce misleading, inaccurate costs. The only true measure of performance is actually running the query with MEASURE monitoring the I/Os and SQL operations.
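
    As a rough sketch of what that might look like (from memory, so check the MEASURE manuals for the exact command sequence; the program file MYPROG and data file MEASDATA are placeholder names):

    MEASCOM
    ADD SQLSTMT $DISK1.SUBVOL.MYPROG
    START MEASUREMENT $DISK1.SUBVOL.MEASDATA

    ... run the COBOL program ...

    STOP MEASUREMENT $DISK1.SUBVOL.MEASDATA
    ADD MEASUREMENT $DISK1.SUBVOL.MEASDATA
    LIST SQLSTMT *

    The SQLSTMT counters report per-statement elapsed time, records accessed, and I/Os, which say far more about real performance than the optimizer's cost estimate.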
