• Finding a number in a string in SPSS

    From erin.psychology@gmail.com@21:1/5 to All on Thu Jan 30 19:56:37 2020
    I have a two string variables (DiagnosisPrimary & DiagnosisOther) that have diagnosis coded into numerical values and delineated with '~' .
    So person one was diagnosed with '20' on their first visit and then nothing on their second visit.

    DiagnosisPrimary
    20~
    12~~~
    20~20~
    12~12~
    12~~
    ~~~~~
    12~12

    If 20 represents having been diagnoised with depression, I want to see if there is a '20' among the string. Basically, find out who was diagnosed with dpression at any of their visits. In the past I have used the formula below, but now the data is a mix
    of numerical and text it is no longer working.

    COMPUTE Dx_Depression =ANY(20, DiagnosisPrimary) OR ANY(20, DiagnosisOther).

    I want to create a new variable with a true/false response.

    Any help much appreicated!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to don't remember if the documentation on Fri Jan 31 00:59:31 2020
    On Thu, 30 Jan 2020 19:56:37 -0800 (PST), erin.psychology@gmail.com
    wrote:

    I have a two string variables (DiagnosisPrimary & DiagnosisOther) that have diagnosis coded into numerical values and delineated with '~' .
    So person one was diagnosed with '20' on their first visit and then nothing on their second visit.

    DiagnosisPrimary
    20~
    12~~~
    20~20~
    12~12~
    12~~
    ~~~~~
    12~12

    If 20 represents having been diagnoised with depression, I want to see if there is a '20' among the string. Basically, find out who was diagnosed with dpression at any of their visits. In the past I have used the formula below, but now the data is a mix
    of numerical and text it is no longer working.

    COMPUTE Dx_Depression =ANY(20, DiagnosisPrimary) OR ANY(20, DiagnosisOther). Why not two logical comparisons where you use ANY( ) ?
    COMPUTE Dx_Depression=
    (20 eq DiagnosisPrimary) or (20 eq DiagnosisOther).

    The original also could be simplified by writing
    COMPUTE Dx_Depression = ANY(20, DiagnosisPrimary, DiagnosisOther).

    ANY( ) tests if the first item matches whatever follows.
    One value followed by several variables is common.
    One variable followed by several values is common.

    I've never had cause to check other possibilities, and I
    don't remember if the documentation says anything.


    I want to create a new variable with a true/false response.

    Any help much appreicated!

    It sounds like the only difference is that you now have
    strings instead of numbers.

    So you would write ANY("20", ...) .

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From erin.psychology@gmail.com@21:1/5 to All on Thu Feb 6 13:58:37 2020
    Thanks Rich,
    Tried:
    COMPUTE Dx_Depression =ANY("20", DiagnosisPrimary, DiagnosisOther).

    It did run fine, but did not find the 20's from within the string.

    Any other tricks? Do you think I will need to remove the "~"?

    Erin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From erin.psychology@gmail.com@21:1/5 to All on Thu Feb 6 14:02:35 2020
    This also works, however it will take far to long to run on the big data set...

    COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or (CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or (CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
    (CHAR.INDEX(DiagnosisOther, '~20~') >0) or
    (CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or (CHAR.INDEX(DiagnosisOther, '20~')=1).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to erin.ps...@gmail.com on Thu Feb 6 14:41:55 2020
    On Thursday, February 6, 2020 at 5:02:37 PM UTC-5, erin.ps...@gmail.com wrote:
    This also works, however it will take far to long to run on the big data set...

    COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or (CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or (CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
    (CHAR.INDEX(DiagnosisOther, '~20~') >0) or
    (CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or (CHAR.INDEX(DiagnosisOther, '20~')=1).

    Does this work?

    COMPUTE Dx_Depression = SUM(CHAR.INDEX(DiagnosisPrimary,'20'),CHAR.INDEX(DiagnosisOther,'20')) GT 0.
    FORMATS Dx_Depression (F1).
    FREQUENCIES Dx_Depression.

    Or are there cases you have not shown us where '20' appears in ways that do not indicate depression. E.g., are there codes like ~201~ or ~120~ that would louse things up? If so, how about this variation on the theme?

    STRING NewDx (A50).
    COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~").
    COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
    FORMATS Dx_Depression (F1).
    FREQUENCIES Dx_Depression.

    When you are done with variable NewDx, delete it if you wish.

    HTH.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From erin.psychology@gmail.com@21:1/5 to Bruce Weaver on Thu Feb 6 16:04:11 2020
    Ah thats great Bruce!
    Also cutes down the code I need for the other diagnoses.
    Thanks all




    On Friday, February 7, 2020 at 9:41:57 AM UTC+11, Bruce Weaver wrote:
    On Thursday, February 6, 2020 at 5:02:37 PM UTC-5, erin.ps...@gmail.com wrote:
    This also works, however it will take far to long to run on the big data set...

    COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or (CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or (CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
    (CHAR.INDEX(DiagnosisOther, '~20~') >0) or
    (CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or (CHAR.INDEX(DiagnosisOther, '20~')=1).

    Does this work?

    COMPUTE Dx_Depression = SUM(CHAR.INDEX(DiagnosisPrimary,'20'),CHAR.INDEX(DiagnosisOther,'20')) GT 0.
    FORMATS Dx_Depression (F1).
    FREQUENCIES Dx_Depression.

    Or are there cases you have not shown us where '20' appears in ways that do not indicate depression. E.g., are there codes like ~201~ or ~120~ that would louse things up? If so, how about this variation on the theme?

    STRING NewDx (A50).
    COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~").
    COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
    FORMATS Dx_Depression (F1).
    FREQUENCIES Dx_Depression.

    When you are done with variable NewDx, delete it if you wish.

    HTH.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to bweaver@lakeheadu.ca on Fri Feb 7 01:27:46 2020
    On Thu, 6 Feb 2020 14:41:55 -0800 (PST), Bruce Weaver
    <bweaver@lakeheadu.ca> wrote:

    On Thursday, February 6, 2020 at 5:02:37 PM UTC-5, erin.ps...@gmail.com wrote: >> This also works, however it will take far to long to run on the big data set...

    COMPUTE Dx_Depression = (CHAR.INDEX(DiagnosisPrimary, '~20~') >0) or
    (CHAR.INDEX(DiagnosisPrimary, '~20')=char.len(DiagnosisPrimary)-3) or
    (CHAR.INDEX(DiagnosisPrimary, '20~')=1) or
    (CHAR.INDEX(DiagnosisOther, '~20~') >0) or
    (CHAR.INDEX(DiagnosisOther, '~20')=char.len(DiagnosisOther)-3) or
    (CHAR.INDEX(DiagnosisOther, '20~')=1).

    Does this work?

    COMPUTE Dx_Depression = SUM(CHAR.INDEX(DiagnosisPrimary,'20'),CHAR.INDEX(DiagnosisOther,'20')) GT 0.
    FORMATS Dx_Depression (F1).
    FREQUENCIES Dx_Depression.

    Or are there cases you have not shown us where '20' appears in ways that do not indicate depression. E.g., are there codes like ~201~ or ~120~ that would louse things up? If so, how about this variation on the theme?

    I think I was distracted by the strange use of ANY( ) , so I
    failed to respond to the example of the STRINGs to be searched.
    - Sorry -


    STRING NewDx (A50).
    COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~").
    COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
    FORMATS Dx_Depression (F1).
    FREQUENCIES Dx_Depression.

    When you are done with variable NewDx, delete it if you wish.

    I like this solution because I figure that concatenating and
    using one CHAR.INDEX( ) search is both easier to read and
    faster to execute that doing two searches.

    I'd probably use the name TempDx for NewDx, to reduce
    the chance that it gets included in other places.

    --
    Rich Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruce Weaver@21:1/5 to Rich Ulrich on Fri Feb 7 11:46:29 2020
    On Friday, February 7, 2020 at 1:27:52 AM UTC-5, Rich Ulrich wrote:

    --- snip ---
    STRING NewDx (A50).
    COMPUTE NewDX = CONCAT("~",DiagnosisPrimary,"~",DiagnosisOther,"~"). >COMPUTE Dx_Depression = CHAR.INDEX(NewDx, '~20~') GT 0.
    FORMATS Dx_Depression (F1).
    FREQUENCIES Dx_Depression.

    When you are done with variable NewDx, delete it if you wish.

    I like this solution because I figure that concatenating and
    using one CHAR.INDEX( ) search is both easier to read and
    faster to execute that doing two searches.


    Good point about speed, Rich. I suspect you're right.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)