• Computing a new Variable: 5 categorical variables into 1 categorical variable

    From Joy Nico@21:1/5 to All on Thu Apr 2 06:31:11 2020
    Hi :)

    I want to do a cross-sectional analysis
    1st) to see what life domains (categorical variable) people chose. I am thinking of a frequency analysis.

    2nd) to see what life domains people from different age groups chose (once in 2 age groups, once in 3 --> so a categorical variable with 2 or 3 values). Still unsure; a frequency analysis separately for each age group...? I was trying the Binomial and Chi2 tests but did not find out how I can compute them separately for different values of my group variable.

    3rd) I want to test whether the age groups differ significantly in the most often (or least often) chosen life domains. I think a Chi2 test would be suitable, but I am unsure yet.

    Now to my question:

    I did generate 1 variable for the age groups, this worked. For the domains it did not work so far: I somehow have to generate a new variable, containing 5 values.

    My domains are now in 5 variables as follows:
    Aspect_1_t19
    Aspect_2_t19
    Aspect_3_t19
    Aspect_4_t19
    Aspect_5_t19

    Each participant has those 5 variables for the aspects (since I wanted to get their 5 most important life aspects).
    Every Aspect-variable contains a value from 1 to 19, representing an aspect (family, friendship, work, finances etc.).

    I do not want to add them up, since the information would then be lost. I tried the code from the page below, but it led to error messages and absurdly high values instead of a list or string of variables:
    https://www.spss-tutorials.com/combine-categorical-variables/

    I would be very grateful for some tips/advice!

    Joy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to Joy Nico on Thu Apr 2 08:36:20 2020
    What do the values 1-19 represent? Are they scores of some kind? Are means & SDs sensible?


  • From Rich Ulrich@21:1/5 to All on Thu Apr 2 13:55:45 2020
    Here is my guess as to what your data looks like.
    You do have age, which you have successfully recoded into
    two new variables for groups.

    You have 5 variables which show what participants
    consider important, among 19 "domains". For each person,
    Aspect_1 ranks as their most important, Aspect_5 ranks 5th.

    You can do Freq to see counts for each.
    You can do Mult-response to see counts for each, and
    for counts per age group -- but you can't do statistical
    testing on those numbers because they are not independent.
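
    A minimal sketch of the Mult-response frequency table, assuming the
    five aspect variables are adjacent in the file and coded 1 to 19. The
    group name aspects and its label are placeholders, not names from the
    data:

    ```spss
    * Treat the five aspect variables as one multiple-category group.
    mult response groups= aspects 'Chosen life domains'
        (Aspect_1_t19 to Aspect_5_t19 (1,19))
      /frequencies= aspects.
    ```

    The table lists each domain once, with its count taken over all five
    variables together.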

    You can do a meaningful chi-squared contingency test
    on Age_gr by Aspect_1, "most important". It is not very
    meaningful to consider the same test for Aspect_2 to _5.
    This test might lack statistical power if some of the 19
    domains have very low counts. In that case, you might
    consider combining some categories, creating a new variable
    using RECODE ... INTO ... .
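
    As an illustration only, combining categories could look like the
    sketch below. The groupings and the name Aspect_1_grp are hypothetical;
    which of the 19 codes belong together depends on the actual domains:

    ```spss
    * Hypothetical grouping of the 19 codes into 3 broader categories.
    recode Aspect_1_t19 (1,2=1) (3 thru 5=2) (6 thru 19=3) into Aspect_1_grp.
    execute.
    ```

    Values not named in the RECODE stay system-missing in the new variable.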

    If you want to see testing about the 19 domains, the obvious
    route is to create 19 new variables, one for each domain.

    That would be something like this (untested) --

    comment create 19 variables with format as F2.
    vector Dom(19, F2).
    comment Dom(k) is 1 if domain k is among the 5 chosen aspects, else 0.
    loop #i = 1 to 19.
    compute Dom(#i) = any(#i, Aspect_1_t19 to Aspect_5_t19).
    end loop.
    execute.

    value labels Dom1 to Dom19 0 "no" 1 "yes".
    var labels Dom1 "family" ....etc....

    Then you can do testing on whether age groups are
    similar in their profiles for how often they used each
    domain, comparing each one of the 19.

    crosstabs /tables= age_gr by Dom1 to Dom19 /statistics= chisq.

    You pointed to the "Combining variables" help that was
    not useful to you. I would consider using that technique
    for creating a new variable using only a few of the most
    frequent among the 0/1 variables, if there are patterns
    that could be interesting.



    Of course, if I have misunderstood what your basic data
    look like, please post a revised description.

    --
    Rich Ulrich

  • From Joy Nico@21:1/5 to All on Fri Apr 3 01:51:48 2020
    Hi Rich Ulrich,
    Thank you for your reply!

    I used an altered version of the SEIQoL-DW to assess quality of life. The SEIQoL-DW consists of 3 steps; I am using the first step (and, in other analyses, the overall score).
    In the first step the respondents chose their 5 most important aspects (in any order, so aspect 1 is not necessarily the most important).
    This is unfortunate now, since it means that I can only look at which domains occur most often overall, in the entire sample and within different groups (you are right, I did compute new variables containing a coding for age).
    This is why I would love to be able to compute a new variable that contains all 5 aspects for each respondent in a list or something similar, so that I could see the frequency of a domain at 1 time point over all 5 answers/aspects a respondent gave.

    Did I make my data structure and the problems stemming from it clear? If not, I will gladly try again :)

    I will think more about the option you provided. The problem I see with it is that percentages and everything else will be for only one variable, which is only 1/5th of the answer.
    If you have any further tips, I will gladly take them! :)
    Thank you so much for your time!

    Sincerely,
    Joy Tieg



  • From Bruce Weaver@21:1/5 to Joy Nico on Fri Apr 3 07:12:04 2020
    On Friday, April 3, 2020 at 4:51:51 AM UTC-4, Joy Nico wrote:
    Hi Rich Ulrich,
    Thank you for your reply!

    I used an altered version of the SEIQoL-DW to assess Quality of Life.

    If Rich (or anyone else) has time to help you, it might help them to have some information about how to score this thing. E.g.,

    https://www.researchgate.net/publication/237753111_Schedule_for_the_Evaluation_of_Individual_Quality_of_Life_SEIQoL_a_Direct_Weighting_procedure_for_Quality_of_Life_Domains_SEIQoL-DW

  • From Joy Nico@21:1/5 to All on Fri Apr 3 08:53:32 2020
    Thank you for the advice! I will keep that in mind in case I have to come back again. It seems that I have found a solution: I needed to restructure my dataset so that 1 person had several lines. In that changed dataset I had to compute a new variable.... Thank you all for the help!
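
    A sketch of that kind of restructuring in SPSS syntax, assuming the
    five aspect variables are adjacent in the file. Aspect and Slot are
    made-up names for the new variables:

    ```spss
    * One row per person and chosen aspect (up to 5 rows per person).
    varstocases
      /make Aspect from Aspect_1_t19 to Aspect_5_t19
      /index= Slot(5)
      /null= drop.
    frequencies variables= Aspect.
    ```

    All other variables, such as the age grouping, are repeated on each of
    a person's rows.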


  • From Rich Ulrich@21:1/5 to All on Sat Apr 4 00:03:33 2020
    On Fri, 3 Apr 2020 08:53:32 -0700 (PDT), Joy Nico <joy.tieg@gmail.com>
    wrote:

    Thank you for the advice! I will keep that in mind in case I have to come back again. It seems that I have found a solution: I needed to restructure my dataset so that 1 person had several lines. In that changed dataset I had to compute a new variable.... Thank you all for the help!

    The big problem with that solution is that you have 5
    lines for every person, so every tabulation has 5 times
    the count for the total number of people. You can
    ask for statistical tests, which will highlight which
    differences are largest, but they are not reportable
    as valid tests because the 5 lines from one person are
    not independent observations.

    That also does not get you any "combination" of domains
    which you were interested in.

    I'll repeat what I suggested -
    If you use "Mult response", you can get the fraction
    of responses "per person" as well as "per total responses."
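
    Roughly, the crosstab by age group could be requested as in the sketch
    below. It assumes age_gr is coded 1 to 3 and that the five aspect
    variables are adjacent in the file; the group name aspects is a
    placeholder:

    ```spss
    * Column percentages based on the number of people in each age group.
    mult response groups= aspects 'Chosen life domains'
        (Aspect_1_t19 to Aspect_5_t19 (1,19))
      /variables= age_gr (1,3)
      /tables= aspects by age_gr
      /cells= count column
      /base= cases.
    ```

    Changing the last line to /base= responses gives percentages of the
    total number of responses instead.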

    I've looked at the source Bruce cited. The "most frequently
    used" seems to offer a start for looking at combinations of
    domains. Computing combinations will also be made
    easier if you follow what I suggested by creating 19
    new variables, No/Yes for each of the 19 domains.

    Combine just 3 (say) of them at a time. Use the most
    frequently mentioned domains.

    If a particular domain is relevant to age, you might
    use that, instead. Or start with the 3 most relevant to
    age.
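
    A sketch of combining three of the 0/1 variables into one pattern
    variable. It assumes Dom1 to Dom3 exist from the loop suggested earlier
    in this thread and stand in for whichever three domains are chosen;
    Combo is a made-up name:

    ```spss
    * Each digit of Combo is one domain: 1 = chosen, 0 = not chosen.
    compute Combo = 100*Dom1 + 10*Dom2 + Dom3.
    formats Combo (N3).
    frequencies variables= Combo.
    ```

    A value of 101 then means Dom1 and Dom3 were chosen but Dom2 was not;
    the N3 format keeps the leading zeros, so 011 stays readable.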

    --
    Rich Ulrich
