• SPSS equivalent of bysort and _n in Stata

    From aisha.siddiqui@berkeley.edu@21:1/5 to All on Tue Mar 10 11:05:09 2020
    Hi, I am having a similar issue, I think. I would like to remove cases where there is not a matched ID for both pre and post. Each respondent has a unique ID. If a respondent has completed a pre and a post, there are two lines, both having the same IDCODE, plus another variable indicating pre or post (pre=1 and post=2). How would I remove those who don't have both a pre and a post with a matched ID? Thank you! New to SPSS.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to All on Tue Mar 10 16:54:19 2020
    On Tue, 10 Mar 2020 11:05:09 -0700 (PDT), aisha.siddiqui@berkeley.edu
    wrote:

    > Hi, I am having a similar issue, I think. I would like to remove cases
    > where there is not a matched ID for both pre and post. Each respondent
    > has a unique ID. If a respondent has completed a pre and a post, there
    > are two lines, both having the same IDCODE, plus another variable
    > indicating pre or post (pre=1 and post=2). How would I remove those who
    > don't have both a pre and a post with a matched ID? Thank you! New to SPSS.

    Do you know for certain that no ID has more than one Pre
    or more than one Post record (duplicated, or otherwise)?

    For a relatively well-composed file, the simplest
    cleaning that I can think of is to use AGGREGATE on ID,
    ADDing a new variable to each line that holds the
    number of cases for the ID; then SELECT to keep
    only those with exactly 2 records. Before Selecting,
    you can run FREQUENCIES on the new variable to find out
    whether it is true that all IDs have either 1 or 2 records,
    and never more than that.
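
    The AGGREGATE route could be sketched roughly as follows. This is
    untested; ID stands in for whatever the ID variable is called
    (IDCODE in the question), and NumRecs is a name made up here for
    the per-ID record count.

    * Sketch only: ID and NumRecs are placeholder names.
    AGGREGATE
      /OUTFILE=* MODE=ADDVARIABLES
      /BREAK=ID
      /NumRecs=N.
    * Check that every ID has 1 or 2 records and never more.
    FREQUENCIES VARIABLES=NumRecs.
    * Keep only the IDs with exactly two records.
    SELECT IF (NumRecs EQ 2).
    EXECUTE.

    Note that a count of 2 does not by itself show that the two
    records are one Pre and one Post, which is why the question
    about duplicates matters.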

    For a messier file, where you cannot count on how the records
    are laid out, you can use LEAD and LAG as I will describe.
    LEAD(PrePost) returns the value of PrePost in the record that
    comes next, in contrast to LAG(PrePost), which returns the value
    from the previous record. As far as I know, LAG( ) can be used
    directly in IF and COMPUTE, whereas LEAD is only offered through
    CREATE (or SHIFT VALUES), so the next-record values are built as
    new variables first.

    COMMENT file is sorted by ID and PrePost.
    COMMENT - find a pair that make up two proper lines. Save both.
    COMMENT NextID and NextPP are made-up names for the next-record values.
    SORT CASES BY ID PrePost.
    CREATE NextID = LEAD(ID, 1) /NextPP = LEAD(PrePost, 1).
    COMPUTE ToUse = 0.
    IF (ID EQ NextID) AND (PrePost EQ 1) AND (NextPP EQ 2) ToUse = 1.
    IF (ID EQ LAG(ID)) AND (PrePost EQ 2) AND (LAG(PrePost) EQ 1) ToUse = 1.
    SELECT IF ToUse = 1.
    Save Outfile blah blah blah.

    If LEAD is not available in your version of SPSS, you could
    use the LAG( ) line as above, then re-Sort the file by ID with
    PrePost in Descending order, then use the LEAD line with each
    next-record value re-written as LAG( ). Then do the Select.
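
    The LAG-only fallback might look something like the sketch below.
    It assumes the same ID and PrePost names as above and a numeric
    PrePost coded 1 and 2; the final re-sort is only there to put the
    file back in its original order.

    * Pass 1, ascending: flag a Post record that directly follows a Pre.
    SORT CASES BY ID (A) PrePost (A).
    COMPUTE ToUse = 0.
    IF (ID EQ LAG(ID)) AND (PrePost EQ 2) AND (LAG(PrePost) EQ 1) ToUse = 1.
    * Pass 2, descending within ID: flag a Pre record that directly follows a Post.
    SORT CASES BY ID (A) PrePost (D).
    IF (ID EQ LAG(ID)) AND (PrePost EQ 1) AND (LAG(PrePost) EQ 2) ToUse = 1.
    * Restore the original order, then keep the flagged pairs.
    SORT CASES BY ID (A) PrePost (A).
    SELECT IF (ToUse EQ 1).
    EXECUTE.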

    Totally untested.

    --
    Rich Ulrich

  • From Bruce Weaver@21:1/5 to aisha....@berkeley.edu on Tue Mar 10 15:34:18 2020
    On Tuesday, March 10, 2020 at 2:05:13 PM UTC-4, aisha....@berkeley.edu wrote:
    > Hi, I am having a similar issue, I think. I would like to remove cases
    > where there is not a matched ID for both pre and post. Each respondent
    > has a unique ID. If a respondent has completed a pre and a post, there
    > are two lines, both having the same IDCODE, plus another variable
    > indicating pre or post (pre=1 and post=2). How would I remove those who
    > don't have both a pre and a post with a matched ID? Thank you! New to SPSS.

    You didn't show any sample data, so it's difficult to know what the possibilities are. E.g., might someone have more than 2 lines? Here's a fairly general approach that should work under most circumstances. You'll have to change variable names as needed.

    * Generate some data to illustrate.
    DATA LIST list / ID Time (2F1).
    BEGIN DATA
    1 1
    1 2
    2 1
    3 2
    4 1
    4 1
    5 2
    5 2
    6 1
    6 2
    7 1
    7 1
    7 2
    END DATA.

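    * Add per-ID summaries to every record: lowest and highest Time, and record count.
    AGGREGATE
    /OUTFILE=* MODE=ADDVARIABLES
    /BREAK=ID
    /Time_min=MIN(Time)
    /Time_max=MAX(Time)
    /NumRecs=NU.

    * Flag IDs with exactly two records, one at Time 1 and one at Time 2.
    COMPUTE ToUse = (NumRecs EQ 2) AND (Time_min EQ 1) AND (Time_max EQ 2).
    FORMATS ToUse(F1).
    LIST ID Time ToUse.

    * Keep only the flagged records.
    SELECT IF ToUse.
    LIST ID Time ToUse.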


    Output from first LIST command:

    ID Time ToUse

    1 1 1
    1 2 1
    2 1 0
    3 2 0
    4 1 0
    4 1 0
    5 2 0
    5 2 0
    6 1 1
    6 2 1
    7 1 0
    7 1 0
    7 2 0

    Number of cases read: 13 Number of cases listed: 13

    Output from second LIST command:

    ID Time ToUse

    1 1 1
    1 2 1
    6 1 1
    6 2 1

    Number of cases read: 4 Number of cases listed: 4
