• Informix Query Tuning

    From manoharsrinivasan.ab@gmail.com@21:1/5 to All on Tue Dec 6 17:08:49 2016
    @Jack, @Mike, I am posting the Explain Plan below:

    QUERY: (OPTIMIZATION TIMESTAMP: 12-06-2016 18:07:07)
    ------
    SELECT {+explain} skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL) OR
    (enrollment_period IN ('FALL14','SPR15','SUM15','FALL15','SPR16','SPR14','FALL16','WIN15','WIN14','SUM14')))
    ORDER BY created_on DESC

    DIRECTIVES FOLLOWED:
    EXPLAIN
    DIRECTIVES NOT FOLLOWED:

    Estimated Cost: 154427
    Estimated # of Rows Returned: 300707

    1) ngp.lead: INDEX PATH

    (1) Index Name: jxadmin.test
    Index Keys: created_on enrollment_period (Key-First) (Reverse) (Serial, fragments: ALL)
    Index Key Filters: ((ngp.lead.enrollment_period IN ('FALL14' , 'SPR15' , 'SUM15' , 'FALL15' , 'SPR16' , 'SPR14' , 'FALL16' , 'WIN15' , 'WIN14' , 'SUM14' )OR ngp.lead.enrollment_period IS NULL ) )


    Query statistics:
    -----------------

    The final cost of the plan is reduced because of the FIRST n specification in
    the query.

    Table map :
    ----------------------------
    Internal name Table name
    ----------------------------
    t1 lead

    type table rows_prod est_rows rows_scan time est_cost
    -------------------------------------------------------------------
    scan t1 100075 300707 100075 00:00.42 154428

    I tried creating the index on enrollment_period and created_on separately, but the query uses the index on created_on only. I am giving the table structure below:

    create table 'ngp'.lead (
    id INT8 not null,
    lead_status_type VARCHAR(10) not null,
    prefix VARCHAR(10),
    first_name VARCHAR(100) not null,
    middle_name VARCHAR(100),
    last_name VARCHAR(100) not null,
    suffix VARCHAR(10),
    address_line1 VARCHAR(100),
    address_line2 VARCHAR(100),
    address_line3 VARCHAR(100),
    city VARCHAR(100),
    state VARCHAR(10),
    country VARCHAR(10),
    postal_code VARCHAR(100),
    email_address VARCHAR(100),
    home_phone VARCHAR(100),
    cell_phone VARCHAR(100),
    ssn VARCHAR(100),
    enrollment_period VARCHAR(10),
    lead_source VARCHAR(10),
    lead_source_referree_id INT8,
    lead_date DATE,
    credential_type VARCHAR(10),
    ceeb_code INT,
    school_id INT8,
    high_sch_grad_year VARCHAR(4),
    intended_program1 VARCHAR(10),
    intended_program2 VARCHAR(10),
    gender_type VARCHAR(10),
    birth_date DATE,
    race_type VARCHAR(10),
    occupation_type VARCHAR(10),
    interest1_type VARCHAR(10),
    interest2_type VARCHAR(10),
    interest3_type VARCHAR(10),
    interest4_type VARCHAR(10),
    interest5_type VARCHAR(10),
    owner_int_org_id INT8,
    lead_response_type VARCHAR(10),
    response_date DATETIME YEAR TO FRACTION(5),
    candidate_id INT8,
    created_by VARCHAR(100),
    updated_by VARCHAR(100),
    created_on DATETIME YEAR TO FRACTION(5),
    updated_on DATETIME YEAR TO FRACTION(5),
    version INT8 default 0 not null
    )

    Please let me know if you have any other suggestions for this.

    Thanks
    Manohar

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From manoharsrinivasan.ab@gmail.com@21:1/5 to All on Tue Dec 6 17:13:52 2016
    Mike,

    I am skip..first clause for pagination as the user may even scroll to skip the data upto 300000 records to get the next 75 records. There is only one table involved. Do you think creating temp table will help. If so, kindly explain how I can do this.

    Please let me know how I can use PDQ and scroll cursor to optimize this query.

    Regards
    Manohar

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mike Walker@21:1/5 to All on Tue Dec 6 09:10:16 2016
    To: informix-list@iiug.org

    Without digging into the distributions, the index that would may work best
    for this query is on (enrollment_period, created_on desc), but would need
    more info.

    I would suggest that you post the query plan.

    Make sure that you have PDQ set.

    Are there many records where enrollment_period is null? You could try
    turning this into a union, but that would cause complications with the skip/first constructs and may require the use of a temp table.

    I would guess that you are using the "skip 300000 first 75" to page through
    the results set. There may be better ways to do this that would involve a scroll cursor or putting the key fields in a temp table.

    Mike



    -----Original Message-----
    From: informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org]
    On Behalf Of manoharsrinivasan.ab@gmail.com
    Sent: Tuesday, December 06, 2016 8:48 AM
    To: informix-list@iiug.org
    Subject: Informix Query Tuning

    Hi

    I have a table called lead, which have about 500 thousand records and we
    need the following query to get executed.

    SELECT skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL)
    OR (enrollment_period IN ('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17'))) ORDER BY
    created_on DESC

    The table lead has id column as the primary key and thus have clustered
    index in that column. This query is taking about 12 - 13 mins. When I added
    a non-clustered index on created_on and enrollment_period columns, it came
    down to 4 - 5 mins. Then I changed the clustered index from id column to
    this index, execution time came down further to about 50 seconds now.

    Is there any other optimization scope available for this query? Overall, is there any other change that can be done so that the query will execute
    faster?

    Thanks in Advance,
    Manohar
    _______________________________________________
    Informix-list mailing list
    Informix-list@iiug.org
    http://members.iiug.org/mailman/listinfo/informix-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jack Parker@21:1/5 to manoharsrinivasan.ab@gmail.com on Tue Dec 6 11:03:45 2016
    I would suggest getting the explain plan

    "SET EXPLAIN ON;" before running your query to see what the engine is doing.

    We know nothing about the DDL for this table or your application. That
    info would be helpful, you could probably fragment the table by
    something or other to improve performance. But without know what is in
    the table...

    Why don't you have an index on enrollment_period by itself? That would
    seem to be the simplest thing.

    How often is this query run?

    Regards,
    Jack Parker

    On 12/6/16 10:47 AM, manoharsrinivasan.ab@gmail.com wrote:
    Hi

    I have a table called lead, which have about 500 thousand records and we need the following query to get executed.

    SELECT skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL) OR (enrollment_period IN ('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17'))) ORDER BY created_on DESC

    The table lead has id column as the primary key and thus have clustered index in that column. This query is taking about 12 - 13 mins. When I added a non-clustered index on created_on and enrollment_period columns, it came down to 4 - 5 mins. Then I
    changed the clustered index from id column to this index, execution time came down further to about 50 seconds now.

    Is there any other optimization scope available for this query? Overall, is there any other change that can be done so that the query will execute faster?

    Thanks in Advance,
    Manohar
    _______________________________________________
    Informix-list mailing list
    Informix-list@iiug.org
    http://members.iiug.org/mailman/listinfo/informix-list


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From manoharsrinivasan.ab@gmail.com@21:1/5 to All on Tue Dec 6 07:47:43 2016
    Hi

    I have a table called lead, which have about 500 thousand records and we need the following query to get executed.

    SELECT skip 300000 first 75 * FROM lead WHERE ((enrollment_period IS NULL) OR (enrollment_period IN ('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17'))) ORDER BY created_on DESC

    The table lead has id column as the primary key and thus have clustered index in that column. This query is taking about 12 - 13 mins. When I added a non-clustered index on created_on and enrollment_period columns, it came down to 4 - 5 mins. Then I
    changed the clustered index from id column to this index, execution time came down further to about 50 seconds now.

    Is there any other optimization scope available for this query? Overall, is there any other change that can be done so that the query will execute faster?

    Thanks in Advance,
    Manohar

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)