• computing new variable with lots of 'IF's

    From Erin Holloway@21:1/5 to All on Wed Aug 19 23:08:06 2020
    So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other
    variable names change the same way _1 to _30.
    I assume its a loop? not sure how to write it though?
    The variables are all numeric

    DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
    COMPUTE Participation_1 = $SYSMIS.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
    COMPUTE Participation_1 = 0.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
    COMPUTE Participation_1 = 1.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 2.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 3.
    END IF.
    EXECUTE.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to Erin Holloway on Thu Aug 20 06:03:44 2020
    On Thursday, August 20, 2020 at 2:08:08 AM UTC-4, Erin Holloway wrote:
    So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other
    variable names change the same way _1 to _30.
    I assume its a loop? not sure how to write it though?
    The variables are all numeric

    DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
    COMPUTE Participation_1 = $SYSMIS.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
    COMPUTE Participation_1 = 0.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
    COMPUTE Participation_1 = 1.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 2.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 3.
    END IF.
    EXECUTE.

    Hello Erin. To answer one of your questions, you can use a DO-REPEAT structure to loop through the 30 variables.

    Regarding your DO-IF structure, I found myself wondering if it was going to work the way you intended. E.g., I wondered if you needed more parentheses, as follows:

    ELSE IF (a EQ value) OR (b eq value) OR (c EQ value).

    If it is that basic structure each time, you can use the ANY function instead, like this:

    ELSE IF ANY(a EQ value, b EQ value, c EQ value).

    With that in mind, I wondered if this is close to what you want:

    DO REPEAT
    cs = CurrentlyStudying_status_1 to CurrentlyStudying_status_30 /
    ce = CurrentlyEmployed_1 to CurrentlyEmployed_30 /
    lw = LookingForWork_1 to LookingForWork_30 /
    part = Participation_1 to Participation_30.
    DO IF NMISS(cs, ce, lfw) EQ 3.
    COMPUTE part = -999. /* Better not to use $SYSMIS.
    ELSE IF ce EQ 0 AND cs EQ 0 and lw EQ 6.
    COMPUTE part = 0.
    ELSE IF ce EQ 0 AND cs EQ 0 AND lw EQ 9.
    COMPUTE part = 1.
    ELSE IF ANY(ce EQ 2 AND cs EQ 0, ce EQ 1 AND cs EQ 0, ce EQ 0 AND cs EQ 1).
    COMPUTE part = 2.
    ELSE IF ANY(ce EQ 2 AND cs EQ 1, ce EQ 1 AND cs EQ 1, ce EQ 3 AND cs EQ 1).
    COMPUTE part = 3.
    END IF.
    END REPEAT.
    MISSING VALUES Participation_1 to Participation_30 (-999).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to erin.psychology@gmail.com on Thu Aug 20 15:09:06 2020
    - corrected version of my reply, with Study*4 in place of Study*3.

    On Wed, 19 Aug 2020 23:08:06 -0700 (PDT), Erin Holloway <erin.psychology@gmail.com> wrote:

    So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other
    variable names change the same way _1 to _30.
    I assume its a loop? not sure how to write it though?
    The variables are all numeric

    DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
    COMPUTE Participation_1 = $SYSMIS.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
    COMPUTE Participation_1 = 0.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
    COMPUTE Participation_1 = 1.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 2.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 3.
    END IF.
    EXECUTE.

    I agree with Bruce, that you need parentheses.
    Damned if I know how SPSS will parse a long string
    of AND and OR statements,

    But, instead of all that --
    I suggest that you create a temporary variable so that
    you have a logical table that combines Emp and Study,

    #Temp= Study*4 + Emp, yields values in this table -
    Emp 0 1 2 3
    Study 0 0 1 2 3
    Study 1 4 5 6 7

    (That does depend on not-having values out of those ranges.)

    Then recode,
    RECODE #Temp(0=0), (1,2,4= 2)(else=3) INTO= partic.

    Better, you could name the values explicitly so that 3,5,6,7 will be
    partic=3, and use ELSE to define some version of MISSING.

    This leaves the simple test for whether partic=0 should
    remain 0 or be set to 1, depending on the value of Look.
    *partic stays set of 0 if Look is 6 -- or anything but 9.

    IF (partic=0) AND (Look = 9) partic=1.

    Make sure of the MISSING (which the Recode of #Temp using
    ELSE may have changed), and you are set.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to erin.psychology@gmail.com on Thu Aug 20 15:02:08 2020
    On Wed, 19 Aug 2020 23:08:06 -0700 (PDT), Erin Holloway <erin.psychology@gmail.com> wrote:


    So I have these overly complicated IF statements and I need to run to compute a new variable. But I need to do the same computation for 30 occasions. So basically I need to end up with 30 new variables- Participation_1 to Participation_30. The other
    variable names change the same way _1 to _30.
    I assume its a loop? not sure how to write it though?
    The variables are all numeric

    DO IF (MISSING(CurrentlyStudying_status_1) AND MISSING(CurrentlyEmployed_1) AND MISSING(LookingForWork_1)).
    COMPUTE Participation_1 = $SYSMIS.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 6).
    COMPUTE Participation_1 = 0.
    ELSE IF (CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 0 AND LookingForWork_1 = 9).
    COMPUTE Participation_1 = 1.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 0
    OR CurrentlyEmployed_1 = 0 AND CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 2.
    ELSE IF (CurrentlyEmployed_1 = 2 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 1 AND CurrentlyStudying_status_1 = 1
    OR CurrentlyEmployed_1 = 3 OR CurrentlyStudying_status_1 = 1).
    COMPUTE Participation_1 = 3.
    END IF.
    EXECUTE.

    I agree with Bruce, that you need parentheses.
    Damned if I know how SPSS will parse a long string
    of AND and OR statements,

    But, instead of all that --
    I suggest that you create a temporary variable so that
    you have a logical table that combines Emp and Study,

    #Temp= Study*3 + Emp, yields values in this table -
    Emp 0 1 2 3
    Study 0 0 1 2 3
    Study 1 4 5 6 7

    (That does depend on not-having values out of those ranges.)

    Then recode,
    RECODE #Temp(0=0), (1,2,4= 2)(else=3) INTO= partic.

    Better, you could name the values explicitly so that 3,5,6,7 will be
    partic=3, and use ELSE to define some version of MISSING.

    This leaves the simple test for whether partic=0 should
    remain 0 or be set to 1, depending on the value of Look.
    *partic stays set of 0 if Look is 6 -- or anything but 9.

    IF (partic=0) AND (Look = 9) partic=1.

    Make sure of the MISSING (which the Recode of #Temp using
    ELSE may have changed), and you are set.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erin Holloway@21:1/5 to All on Thu Aug 20 21:10:14 2020
    Thanks to you both

    @Bruce I have run the syntax, it runs fine...but coded 22.4% as '2' and the rest as either Missing -999 or system missing. No 0's 1's or 3's.
    I cannot work out why. I will keep having a play with it.

    @Rich, that makes sense to me, and I have created new variables that have values only in the required range for the temp table, but I am not familar with a logical table or how to create one?

    Variables are:
    CurrentlyEmployed (0,1,2,3)
    CurrentlyStudying_status (0,1,2)
    LookingForWork_status (0,1)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erin Holloway@21:1/5 to All on Thu Aug 20 21:34:18 2020
    Found the error, I had messed up the calc for one of the variables.
    So great! thank you

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erin Holloway@21:1/5 to All on Thu Aug 20 22:17:50 2020
    Thanks to you both

    @Bruce I have run the syntax, I can get the syntax to run, but it is not coding the '3''s correctly. I assume its because of the ce=3 and cs=2 being included in the 'ANY' command and needs to be separated?

    ce EQ 2 AND cs EQ 1 or
    ce EQ 1 AND cs EQ 1 or
    ce EQ 3 or
    cs EQ 2.

    DO REPEAT
    cs = CurrentlyStudying_status_1 to CurrentlyStudying_status_10 /
    ce = CurrentlyEmployed_1 to CurrentlyEmployed_10 /
    lw = LookingForWork_status_1 to LookingForWork_status_10 /
    part = Participation_1 to Participation_10.
    DO IF NMISS(cs, ce, lw) EQ 3.
    COMPUTE part = -999.
    ELSE IF ce EQ 0 AND cs EQ 0 and lw EQ 0.
    COMPUTE part = 0.
    ELSE IF ce EQ 0 AND cs EQ 0 AND lw EQ 1.
    COMPUTE part = 1.
    ELSE IF ANY(ce EQ 2 AND cs EQ 0, ce EQ 1 AND cs EQ 0, ce EQ 0 AND cs EQ 1). COMPUTE part = 2.

    ELSE IF ANY(ce EQ 2 AND cs EQ 1, ce EQ 1 AND cs EQ 1, ce EQ 3, cs EQ 2). COMPUTE part = 3.

    END IF.
    END REPEAT.
    MISSING VALUES Participation_1 to Participation_10 (-999).


    @Rich, that makes sense to me, and I have created new variables that have values only in the required range for the temp table, but I am not familar with a logical table or how to create one? Im keen to give it a go.

    Variables are:
    CurrentlyEmployed (0,1,2,3)
    CurrentlyStudying_status (0,1,2)
    LookingForWork_status (0,1)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to erin.psychology@gmail.com on Fri Aug 21 14:11:31 2020
    On Thu, 20 Aug 2020 22:17:50 -0700 (PDT), Erin Holloway <erin.psychology@gmail.com> wrote:


    @Rich, that makes sense to me, and I have created new variables that have values only in the required range for the temp table, but I am not familar with a logical table or how to create one? Im keen to give it a go.

    What I wrote before --

    #Temp= Study*4 + Emp, yields values in this table -
    Emp 0 1 2 3
    Study 0 0 1 2 3
    Study 1 4 5 6 7

    #Temp has all the combinations that you care about
    except for the split needed for for (emp=0, Study=0) --

    RECODE #Temp into a new var with desired codes.
    Then split out the (0,0) instances according to the third variable.


    By the way -- I see a simplification available in what you
    posted for the solution with many IFs -

    ELSE IF ce EQ 0 AND cs EQ 0 and lw EQ 0.
    COMPUTE part = 0.
    ELSE IF ce EQ 0 AND cs EQ 0 AND lw EQ 1.
    COMPUTE part = 1.

    Assuming "lw" is defined as 0 or 1 (always), that says:

    ELSE IF ce EQ 0 AND cs EQ 0 .
    COMPUTE part = lw .

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to Erin Holloway on Fri Aug 21 11:50:39 2020
    On Friday, August 21, 2020 at 1:17:53 AM UTC-4, Erin Holloway wrote:
    Thanks to you both

    --- snip ---
    Variables are:
    CurrentlyEmployed (0,1,2,3)
    CurrentlyStudying_status (0,1,2)
    LookingForWork_status (0,1)

    Here is another variation on Rich's theme which *might* be a bit easier to follow. I start with a self-contained version to demonstrate the concept--hence the "toy" dataset.

    * Generate a toy dataset with all combinations of those variables.
    NEW FILE.
    DATA LIST LIST / ce cs lw (3F1.0).
    BEGIN DATA
    0 0 0
    0 0 1
    0 0 .
    0 1 0
    0 1 1
    0 1 .
    0 2 0
    0 2 1
    0 2 .
    0 . 0
    0 . 1
    0 . .
    1 0 0
    1 0 1
    1 0 .
    1 1 0
    1 1 1
    1 1 .
    1 2 0
    1 2 1
    1 2 .
    1 . 0
    1 . 1
    1 . .
    2 0 0
    2 0 1
    2 0 .
    2 1 0
    2 1 1
    2 1 .
    2 2 0
    2 2 1
    2 2 .
    2 . 0
    2 . 1
    2 . .
    3 0 0
    3 0 1
    3 0 .
    3 1 0
    3 1 1
    3 1 .
    3 2 0
    3 2 1
    3 2 .
    3 . 0
    3 . 1
    3 . .
    . 0 0
    . 0 1
    . 0 .
    . 1 0
    . 1 1
    . 1 .
    . 2 0
    . 2 1
    . 2 .
    . . 0
    . . 1
    . . .
    END DATA.

    FREQUENCIES ce cs lw.

    * [1] Recode SYSMIS to 9, but do not (yet) treat it as missing.
    * This will allow step 2 below to work properly.

    RECODE ce c
  • From Rich Ulrich@21:1/5 to bweaver@lakeheadu.ca on Mon Aug 24 15:18:04 2020
    On Fri, 21 Aug 2020 11:50:39 -0700 (PDT), Bruce Weaver
    <bweaver@lakeheadu.ca> wrote:

    On Friday, August 21, 2020 at 1:17:53 AM UTC-4, Erin Holloway wrote:
    Thanks to you both

    --- snip ---
    Variables are:
    CurrentlyEmployed (0,1,2,3)
    CurrentlyStudying_status (0,1,2)
    LookingForWork_status (0,1)

    Here is another variation on Rich's theme which *might* be a bit easier to follow. I start with a self-contained version to demonstrate the concept--hence the "toy" dataset.

    * Generate a toy dataset with all combinations of those variables.
    NEW FILE.
    DATA LIST LIST / ce cs lw (3F1.0).
    BEGIN DATA
    0 0 0
    0 0 1
    0 0 .
    0 1 0
    0 1 1
    0 1 .
    0 2 0
    0 2 1
    0 2 .
    0 . 0
    0 . 1
    0 . .
    1 0 0
    1 0 1
    1 0 .
    1 1 0
    1 1 1
    1 1 .
    1 2 0
    1 2 1
    1 2 .
    1 . 0
    1 . 1
    1 . .
    2 0 0
    2 0 1
    2 0 .
    2 1 0
    2 1 1
    2 1 .
    2 2 0
    2 2 1
    2 2 .
    2 . 0
    2 . 1
    2 . .
    3 0 0
    3 0 1
    3 0 .
    3 1 0
    3 1 1
    3 1 .
    3 2 0
    3 2 1
    3 2 .
    3 . 0
    3 . 1
    3 . .
    . 0 0
    . 0 1
    . 0 .
    . 1 0
    . 1 1
    . 1 .
    . 2 0
    . 2 1
    . 2 .
    . . 0
    . . 1
    . . .
    END DATA.

    FREQUENCIES ce cs lw.

    * [1] Recode SYSMIS to 9, but do not (yet) treat it as missing.
    * This will allow step 2 below to work properly.

    RECODE ce cs lw (SYSMIS=9).
    FREQUENCIES ce cs lw.

    * [2] Combine the 3 variables into a single variable.

    COMPUTE cecslw = ce*100 + cs*10 + lw.
    FORMATS cecslw (F3.0).
    FREQUENCIES cecslw.

    * [3] RECODE cecslw into new variable Part.
    * You'll have to fill in the value lists and values according to your rules. >RECODE cecslw
    (999 = 999)
    (value list = value)
    (value list = value)
    etc.
    (value list = value) INTO Part.
    FORMATS Part (F5.0).
    MISSING VALUES Part (999) / ce cs lw (9).
    FREQUENCIES Part.

    * --------------------------------------.

    Now here it is inserted in a DO-REPEAT structure that allows you to loop through your 30 variables. Notice that I have changed variable cecslw to #cecslw. The # makes it a scratch variable that will not be written to your dataset. This bit is
    untested, as I do not have your dataset. But I think it ought to work. But as before, you have to fill in the details of the RECODE command, because only you know the rules for that.


    DO REPEAT
    cs = CurrentlyStudying_status_1 to CurrentlyStudying_status_30 /
    ce = CurrentlyEmployed_1 to CurrentlyEmployed_30 /
    lw = LookingForWork_1 to LookingForWork_30 /
    part = Participation_1 to Participation_30.

    RECODE ce cs lw (SYSMIS=9).
    COMPUTE #cecslw = ce*100 + cs*10 + lw.
    RECODE #cecslw
    (999 = 999)
    (value list = value)
    (value list = value)
    etc.
    (value list = value) INTO Part.
    END REPEAT.
    MISSING VALUES Participation_1 to Participation_30 (999).


    HTH.

    Bruce,
    You are right, it is more transparent to create the temporary
    variable usling multiples of 10.

    For this problem, the "lw" is used only for one decision, so I
    would probably ignore it at first, and just ...
    COMPUTE #cecs= ce*10 + cs.

    That gives legal values of 0, 10, 20, 30, 1, 11, 21, 31, --
    which are easy to interpret directly on reading. Any other values
    must be bad data (presumably, no correction was not done earlier).

    All in all, the question of what is "missing" and how they should be
    dealt with is more taxing than the original classification problem.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erin Holloway@21:1/5 to All on Wed Sep 2 16:24:39 2020
    Alright, its working! I have re-calculated the variables so css and ces = 0,1,2 and lws = 0,1. So its clearer with the culculation which is which.
    Thank you so so much. This has been driving me nuts for ages.
    Erin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to Erin Holloway on Thu Sep 3 07:28:12 2020
    On Wednesday, September 2, 2020 at 7:24:42 PM UTC-4, Erin Holloway wrote:
    Alright, its working! I have re-calculated the variables so css and ces = 0,1,2 and lws = 0,1. So its clearer with the culculation which is which.
    Thank you so so much. This has been driving me nuts for ages.
    Erin

    Good to hear. Please consider posting your code so that others who find this discussion later on can see the finished product.

    Cheers,
    Bruce

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erin Holloway@21:1/5 to All on Thu Sep 3 15:16:43 2020
    No worries. I did the 'Looking or work variable separately' as I couldnt make the ce*100 + cs*10 + lw work.
    Thanks again.

    DO REPEAT
    css = CurrentlyStudying_status_1 to CurrentlyStudying_status_10 /
    ces = CurrentlyEmployed_status_1 to CurrentlyEmployed_status_10 /
    part = Participation_1 to Participation_10.
    RECODE css ces (SYSMIS=-999).
    COMPUTE #cecs = ces*10 + css.
    RECODE #cecs
    (0=0) (10=2) (1=2) (2=3) (11 thru Highest =3) INTO Part.
    END REPEAT.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)