• grouping and selecting cases?

    From Bruce Weaver@21:1/5 to Rich Ulrich on Thu Jun 6 08:09:43 2019
    On Thursday, June 6, 2019 at 12:55:05 AM UTC-4, Rich Ulrich wrote:
    On Wed, 5 Jun 2019 06:58:35 -0700 (PDT), Elyse Cottrell-Martin <ecottrel@lakeheadu.ca> wrote:

    I have stacked data that I want to do a few things with.

    For most people they have 3 sets of data (before, right after, and a follow up for an event) but some do not.

    example:

    ID# Date Score Time
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup


    I only want to include people who have a before, after, AND followup data point (so in the above I would want to exclude ID#2 and ID#4 because they only have two (some people only have 1, and some people have 3 but they're not the right time code, for
    example they have an After and 2 followups). I have over 3000 lines of data though so I need syntax or how to do this using the GUI.

    I think this is a two step process? Grouping IDs and then selecting cases? I know how to group data using aggregate, and I know how to select cases, but I can't seem to figure out how to combine these things.

    Help is greatly appreciated, thank you.

    I might do this as two steps:
    Save to a file the ID#'s that are complete;
    Do a /TABLE= <orig> match to pick up those cases.

    The new file will have extra Followup, if those occurs.
    You can get rid of them with LAG if you want.

    This is all untested. It assumes that there is never more than
    one "After" for an ID#, and that the file is sorted in logical date
    order (Before, after, followup). And that the capitalization is
    always correct.


    DO IF (time eq "Followup") and
    + (lag(ID#) = ID# ) and (lag( ID#,2) = ID#).

    DO IF (lag(Time) eq "After") and (lag(Time,2) eq "Before").
    SAVE outfile= <IDfile>/vars= ID#.
    END IF.
    END IF.

    MATCH FILES file= <IDfile name>/table= * / by= ID#/ vars= all.

    --
    Rich Ulrich

    I cannot currently see the original post. But here is what comes to mind for me.


    NEW FILE.
    DATASET CLOSE ALL.
    DATA LIST LIST / ID (F2.0) Date(ADATE) Score (F5.1) Time(A8).
    BEGIN DATA
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup
    END DATA.

    * Compute 0/1 indicator variables for the 3 time points.
    COMPUTE t1 = Time EQ "Before".
    COMPUTE t2 = Time EQ "After".
    COMPUTE t3 = Time EQ "Followup".
    FORMATS t1 to t3 (F1).

    * Write max values of indicator to each row per ID.
    AGGREGATE
    /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
    /BREAK=ID
    /t1 t2 t3 = MAX(t1,t2,t3).

    * Keep records where sum of the indicators = 3.
    SELECT IF SUM(t1 to t3) EQ 3.
    LIST.
    DELETE VARIABLES t1 to t3.

    Output from LIST:

    ID Date Score Time t1 t2 t3

    1 01/02/2015 1.2 Before 1 1 1
    1 01/03/2015 1.5 After 1 1 1
    1 02/05/2015 1.3 Followup 1 1 1
    3 01/08/2015 1.5 Before 1 1 1
    3 02/05/2015 .2 After 1 1 1
    3 02/10/2015 1.1 Followup 1 1 1


    Number of cases read: 6 Number of cases listed: 6

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ki@21:1/5 to Bruce Weaver on Thu Jun 6 11:02:12 2019
    On Thursday, June 6, 2019 at 10:09:46 AM UTC-5, Bruce Weaver wrote:
    On Thursday, June 6, 2019 at 12:55:05 AM UTC-4, Rich Ulrich wrote:
    On Wed, 5 Jun 2019 06:58:35 -0700 (PDT), Elyse Cottrell-Martin <ecottrel@lakeheadu.ca> wrote:

    I have stacked data that I want to do a few things with.

    For most people they have 3 sets of data (before, right after, and a follow up for an event) but some do not.

    example:

    ID# Date Score Time
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup


    I only want to include people who have a before, after, AND followup data point (so in the above I would want to exclude ID#2 and ID#4 because they only have two (some people only have 1, and some people have 3 but they're not the right time code,
    for example they have an After and 2 followups). I have over 3000 lines of data though so I need syntax or how to do this using the GUI.

    I think this is a two step process? Grouping IDs and then selecting cases? I know how to group data using aggregate, and I know how to select cases, but I can't seem to figure out how to combine these things.

    Help is greatly appreciated, thank you.

    I might do this as two steps:
    Save to a file the ID#'s that are complete;
    Do a /TABLE= <orig> match to pick up those cases.

    The new file will have extra Followup, if those occurs.
    You can get rid of them with LAG if you want.

    This is all untested. It assumes that there is never more than
    one "After" for an ID#, and that the file is sorted in logical date
    order (Before, after, followup). And that the capitalization is
    always correct.


    DO IF (time eq "Followup") and
    + (lag(ID#) = ID# ) and (lag( ID#,2) = ID#).

    DO IF (lag(Time) eq "After") and (lag(Time,2) eq "Before").
    SAVE outfile= <IDfile>/vars= ID#.
    END IF.
    END IF.

    MATCH FILES file= <IDfile name>/table= * / by= ID#/ vars= all.

    --
    Rich Ulrich

    I cannot currently see the original post. But here is what comes to mind for me.


    NEW FILE.
    DATASET CLOSE ALL.
    DATA LIST LIST / ID (F2.0) Date(ADATE) Score (F5.1) Time(A8).
    BEGIN DATA
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup
    END DATA.

    * Compute 0/1 indicator variables for the 3 time points.
    COMPUTE t1 = Time EQ "Before".
    COMPUTE t2 = Time EQ "After".
    COMPUTE t3 = Time EQ "Followup".
    FORMATS t1 to t3 (F1).

    * Write max values of indicator to each row per ID.
    AGGREGATE
    /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
    /BREAK=ID
    /t1 t2 t3 = MAX(t1,t2,t3).

    * Keep records where sum of the indicators = 3.
    SELECT IF SUM(t1 to t3) EQ 3.
    LIST.
    DELETE VARIABLES t1 to t3.

    Output from LIST:

    ID Date Score Time t1 t2 t3

    1 01/02/2015 1.2 Before 1 1 1
    1 01/03/2015 1.5 After 1 1 1
    1 02/05/2015 1.3 Followup 1 1 1
    3 01/08/2015 1.5 Before 1 1 1
    3 02/05/2015 .2 After 1 1 1
    3 02/10/2015 1.1 Followup 1 1 1


    Number of cases read: 6 Number of cases listed: 6


    This code will also work.
    Ki

    ***************************************************************************** *This is from short form to long to filter only those with 3 points of data. *****************************************************************************.

    CASESTOVARS
    /ID=id
    /GROUPBY=VARIABLE.

    select if not missing(score.1) and not missing(score.2) and not missing(score.3).
    execute.

    VARSTOCASES
    /MAKE date FROM date.1 TO date.3
    /MAKE score FROM score.1 TO score.3
    /MAKE time FROM time.1 TO time.3
    /KEEP id .


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Thu Jun 6 22:19:26 2019
    On Thu, 6 Jun 2019 11:02:12 -0700 (PDT), Ki <ki.h.park@gmail.com>
    wrote:

    On Thursday, June 6, 2019 at 10:09:46 AM UTC-5, Bruce Weaver wrote:
    On Thursday, June 6, 2019 at 12:55:05 AM UTC-4, Rich Ulrich wrote:
    On Wed, 5 Jun 2019 06:58:35 -0700 (PDT), Elyse Cottrell-Martin
    <ecottrel@lakeheadu.ca> wrote:

    I have stacked data that I want to do a few things with.

    For most people they have 3 sets of data (before, right after, and a follow up for an event) but some do not.

    example:

    ID# Date Score Time
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup


    I only want to include people who have a before, after, AND followup data point (so in the above I would want to exclude ID#2 and ID#4 because they only have two (some people only have 1, and some people have 3 but they're not the right time code,
    for example they have an After and 2 followups). I have over 3000 lines of data though so I need syntax or how to do this using the GUI.

    I think this is a two step process? Grouping IDs and then selecting cases? I know how to group data using aggregate, and I know how to select cases, but I can't seem to figure out how to combine these things.

    Help is greatly appreciated, thank you.

    I might do this as two steps:
    Save to a file the ID#'s that are complete;
    Do a /TABLE= <orig> match to pick up those cases.

    The new file will have extra Followup, if those occurs.
    You can get rid of them with LAG if you want.

    This is all untested. It assumes that there is never more than
    one "After" for an ID#, and that the file is sorted in logical date
    order (Before, after, followup). And that the capitalization is
    always correct.


    DO IF (time eq "Followup") and
    + (lag(ID#) = ID# ) and (lag( ID#,2) = ID#).

    DO IF (lag(Time) eq "After") and (lag(Time,2) eq "Before").
    SAVE outfile= <IDfile>/vars= ID#.
    END IF.
    END IF.

    MATCH FILES file= <IDfile name>/table= * / by= ID#/ vars= all.

    --
    Rich Ulrich

    I cannot currently see the original post. But here is what comes to mind for me.


    NEW FILE.
    DATASET CLOSE ALL.
    DATA LIST LIST / ID (F2.0) Date(ADATE) Score (F5.1) Time(A8).
    BEGIN DATA
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup
    END DATA.

    * Compute 0/1 indicator variables for the 3 time points.
    COMPUTE t1 = Time EQ "Before".
    COMPUTE t2 = Time EQ "After".
    COMPUTE t3 = Time EQ "Followup".
    FORMATS t1 to t3 (F1).

    * Write max values of indicator to each row per ID.
    AGGREGATE
    /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
    /BREAK=ID
    /t1 t2 t3 = MAX(t1,t2,t3).

    * Keep records where sum of the indicators = 3.
    SELECT IF SUM(t1 to t3) EQ 3.
    LIST.
    DELETE VARIABLES t1 to t3.

    Output from LIST:

    ID Date Score Time t1 t2 t3

    1 01/02/2015 1.2 Before 1 1 1
    1 01/03/2015 1.5 After 1 1 1
    1 02/05/2015 1.3 Followup 1 1 1
    3 01/08/2015 1.5 Before 1 1 1
    3 02/05/2015 .2 After 1 1 1
    3 02/10/2015 1.1 Followup 1 1 1


    Number of cases read: 6 Number of cases listed: 6


    This code will also work.
    Ki

    ***************************************************************************** >*This is from short form to long to filter only those with 3 points of data. >*****************************************************************************.

    CASESTOVARS
    /ID=id
    /GROUPBY=VARIABLE.

    select if not missing(score.1) and not missing(score.2) and not missing(score.3).
    execute.

    VARSTOCASES
    /MAKE date FROM date.1 TO date.3
    /MAKE score FROM score.1 TO score.3
    /MAKE time FROM time.1 TO time.3
    /KEEP id .


    My solution keeps only the first Followup when the case is good
    but has extra Followups. An obvious choice, but not the only one.

    Bruce's solution keeps all Followups, which could be handy in
    order to make alternate choices later on, but it is an odd file
    organization -- it sort of requires selection, before anything
    further is done.

    I don't know what CASESTOVARS does when it has a duplicated
    Followup (different date). I suspect it might toss out an error
    or warning about file order, or else use the last Followup.

    The OP gives no instructions/preference for that occurrence.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ki@21:1/5 to Rich Ulrich on Fri Jun 7 11:04:48 2019
    On Thursday, June 6, 2019 at 9:19:31 PM UTC-5, Rich Ulrich wrote:
    On Thu, 6 Jun 2019 11:02:12 -0700 (PDT), Ki <ki.h.park@gmail.com>
    wrote:

    On Thursday, June 6, 2019 at 10:09:46 AM UTC-5, Bruce Weaver wrote:
    On Thursday, June 6, 2019 at 12:55:05 AM UTC-4, Rich Ulrich wrote:
    On Wed, 5 Jun 2019 06:58:35 -0700 (PDT), Elyse Cottrell-Martin
    <ecottrel@lakeheadu.ca> wrote:

    I have stacked data that I want to do a few things with.

    For most people they have 3 sets of data (before, right after, and a follow up for an event) but some do not.

    example:

    ID# Date Score Time
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup


    I only want to include people who have a before, after, AND followup data point (so in the above I would want to exclude ID#2 and ID#4 because they only have two (some people only have 1, and some people have 3 but they're not the right time code,
    for example they have an After and 2 followups). I have over 3000 lines of data though so I need syntax or how to do this using the GUI.

    I think this is a two step process? Grouping IDs and then selecting cases? I know how to group data using aggregate, and I know how to select cases, but I can't seem to figure out how to combine these things.

    Help is greatly appreciated, thank you.

    I might do this as two steps:
    Save to a file the ID#'s that are complete;
    Do a /TABLE= <orig> match to pick up those cases.

    The new file will have extra Followup, if those occurs.
    You can get rid of them with LAG if you want.

    This is all untested. It assumes that there is never more than
    one "After" for an ID#, and that the file is sorted in logical date
    order (Before, after, followup). And that the capitalization is
    always correct.


    DO IF (time eq "Followup") and
    + (lag(ID#) = ID# ) and (lag( ID#,2) = ID#).

    DO IF (lag(Time) eq "After") and (lag(Time,2) eq "Before").
    SAVE outfile= <IDfile>/vars= ID#.
    END IF.
    END IF.

    MATCH FILES file= <IDfile name>/table= * / by= ID#/ vars= all.

    --
    Rich Ulrich

    I cannot currently see the original post. But here is what comes to mind for me.


    NEW FILE.
    DATASET CLOSE ALL.
    DATA LIST LIST / ID (F2.0) Date(ADATE) Score (F5.1) Time(A8).
    BEGIN DATA
    1 01/02/2015 1.2 Before
    1 01/03/2015 1.5 After
    1 02/05/2015 1.3 Followup
    2 01/06/2015 2.3 Before
    2 01/08/2015 1.4 After
    3 01/08/2015 1.5 Before
    3 02/05/2015 0.2 After
    3 02/10/2015 1.1 Followup
    4 03/20/2015 0.3 Followup
    4 03/25/2015 1.2 Followup
    END DATA.

    * Compute 0/1 indicator variables for the 3 time points.
    COMPUTE t1 = Time EQ "Before".
    COMPUTE t2 = Time EQ "After".
    COMPUTE t3 = Time EQ "Followup".
    FORMATS t1 to t3 (F1).

    * Write max values of indicator to each row per ID.
    AGGREGATE
    /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
    /BREAK=ID
    /t1 t2 t3 = MAX(t1,t2,t3).

    * Keep records where sum of the indicators = 3.
    SELECT IF SUM(t1 to t3) EQ 3.
    LIST.
    DELETE VARIABLES t1 to t3.

    Output from LIST:

    ID Date Score Time t1 t2 t3

    1 01/02/2015 1.2 Before 1 1 1
    1 01/03/2015 1.5 After 1 1 1
    1 02/05/2015 1.3 Followup 1 1 1
    3 01/08/2015 1.5 Before 1 1 1
    3 02/05/2015 .2 After 1 1 1
    3 02/10/2015 1.1 Followup 1 1 1


    Number of cases read: 6 Number of cases listed: 6


    This code will also work.
    Ki

    *****************************************************************************
    *This is from short form to long to filter only those with 3 points of data. >*****************************************************************************.

    CASESTOVARS
    /ID=id
    /GROUPBY=VARIABLE.

    select if not missing(score.1) and not missing(score.2) and not missing(score.3).
    execute.

    VARSTOCASES
    /MAKE date FROM date.1 TO date.3
    /MAKE score FROM score.1 TO score.3
    /MAKE time FROM time.1 TO time.3
    /KEEP id .


    My solution keeps only the first Followup when the case is good
    but has extra Followups. An obvious choice, but not the only one.

    Bruce's solution keeps all Followups, which could be handy in
    order to make alternate choices later on, but it is an odd file
    organization -- it sort of requires selection, before anything
    further is done.

    I don't know what CASESTOVARS does when it has a duplicated
    Followup (different date). I suspect it might toss out an error
    or warning about file order, or else use the last Followup.

    The OP gives no instructions/preference for that occurrence.

    --
    Rich Ulrich

    Hi Rich,

    I think the code will still work for more than 3 assessments but really depends on the criteria to select a valid case. CASESTOVARS basically will create an "n" set of variables for "n" encounters or assessments.

    It will definitely need an update in the syntax if the data structure or selection criteria changes in the second part of the code after CASESTOVARS.

    Ki

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)