• RTRIM - Simplifying syntax so my computer doesn't crash

    From Erin Holloway@21:1/5 to All on Wed Jul 29 21:30:24 2020
    My data looks a bit like this:
    sp_counts_1
    1|2|4
    1
    4
    5|7
    8|9|2|4

    I just need the first number and want to delete the rest. I have been running this syntax to extract the number before the vertical slash '|'. Whilst it works, it is crashing my computer due to the large dataset (500,000 cases).

    Due to the high number of variables I prefer to replace the existing variables (1000's, but split into files) rather than make new ones.

    It is getting stuck on the 'list' and never makes it through.

    Any ideas on how I could simplify the syntax so it's happier to run?
    Thanks in advance

    STRING new1 to new10 (A10).
    DO REPEAT old = sp_counts_1 TO sp_counts_10 / new = new1 TO new10.
    COMPUTE #L = CHAR.INDEX(old,"|") - 1.
    IF #L EQ -1 #L = LENGTH(RTRIM(old)).
    COMPUTE new = CHAR.SUBSTR(old,1,#L).
    END REPEAT.
    LIST.
    Delete variables sp_counts_1 TO sp_counts_10.
    RENAME VARIABLES (new1 to new10 = sp_counts_1 TO sp_counts_10).
    Execute.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Ulrich@21:1/5 to erin.psychology@gmail.com on Thu Jul 30 03:07:11 2020
    On Wed, 29 Jul 2020 21:30:24 -0700 (PDT), Erin Holloway <erin.psychology@gmail.com> wrote:

    > My data looks a bit like this:
    > sp_counts_1
    > 1|2|4
    > 1
    > 4
    > 5|7
    > 8|9|2|4
    >
    > I just need the first number and want to delete the rest. I have been running this syntax to extract the number before the vertical slash '|'. Whilst it works it is crashing my computer due to the large dataset (500,000 cases).

    Are you saying that yes, it does work when there
    are few lines, or when you specify LIST CASES TO 100?

    The first legitimate failure I can think of is running out of disk space.
    That would seem abnormal and wrong; 500,000 cases is no longer
    too huge to process.

    What are the symptoms of your crashes?


    > Due to the high number of variables I prefer to replace the existing variables (1000's, but split into files) rather than make new ones.

    Thousands of variables to list on one line? That might cause
    some upset if you are writing a HUGE format across. The
    default used to be to WRAP, which I learned to avoid.
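
    A minimal sketch of the kind of restricted listing asked about above, assuming the new variables are named new1 TO new10 as in the original syntax. It prints only the first 100 cases of those ten variables, which is usually enough to check the result without writing 500,000 rows to the Viewer:

    ```spss
    * Sketch only: list the first 100 cases of the new variables.
    LIST VARIABLES=new1 TO new10
      /CASES=FROM 1 TO 100.
    ```

    If the restricted LIST finishes quickly while the full LIST hangs, the size of the listing output, rather than the string handling, is the more likely cause of the problem.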


    > It is getting stuck on the 'list' and never makes it through.


    > Any ideas on how I could simplify the syntax so it's happier to run?
    > Thanks in advance
    >
    > STRING new1 to new10 (A10).
    > DO REPEAT old = sp_counts_1 TO sp_counts_10 / new = new1 TO new10.
    > COMPUTE #L = CHAR.INDEX(old,"|") - 1.
    > IF #L EQ -1 #L = LENGTH(RTRIM(old)).
    > COMPUTE new = CHAR.SUBSTR(old,1,#L).
    > END REPEAT.
    > LIST.
    > Delete variables sp_counts_1 TO sp_counts_10.
    > RENAME VARIABLES (new1 to new10 = sp_counts_1 TO sp_counts_10).
    > Execute.

    I wonder why you are showing us the Delete vars and Rename.
    Once upon a time, LIST was not a procedure; it set a switch to LIST
    after a procedure caused cases to be read. If your SPSS is that old,
    then just running the syntax down through LIST will do nothing --
    SPSS will sit there waiting for a procedure or EXE.

    If your SPSS is that old, I don't know what happens when variables
    are renamed between LIST and EXE, since LIST in that case would not
    be "performed" in the order written in syntax.

    For a newer SPSS, please describe the symptoms of "crash".


    I find the variation of the syntax below easier to read.
    * append | to the value so that one is always found.
    STRING #temp(A11) , new1 to new10(A10).
    DO REPEAT old = sp_counts_1 TO sp_counts_10 / new = new1 TO new10.
    COMPUTE #temp= CONCAT( RTRIM(old), '|' ) .
    COMPUTE #L = CHAR.INDEX(#temp,"|") - 1.
    COMPUTE new = CHAR.SUBSTR(#temp,1,#L).
    END REPEAT.

    COMMENT + will concatenate in file names, maybe not in COMPUTE.
    COMMENT If not, use the concatenation function to combine.
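
    Since the stated goal is to replace the existing variables rather than create new ones, the same append-a-delimiter idea can be sketched so that it overwrites each variable in place. This is an untested sketch, not part of the syntax above: it assumes sp_counts_1 TO sp_counts_10 are adjacent string variables in the file, and the stand-in name v is arbitrary. It needs no STRING declaration, DELETE VARIABLES, or RENAME VARIABLES:

    ```spss
    * Sketch only: keep the text before the first | in each variable, in place.
    * Assumes sp_counts_1 TO sp_counts_10 are adjacent string variables.
    DO REPEAT v = sp_counts_1 TO sp_counts_10.
    COMPUTE #L = CHAR.INDEX(CONCAT(RTRIM(v), '|'), '|') - 1.
    COMPUTE v = CHAR.SUBSTR(v, 1, #L).
    END REPEAT.
    EXECUTE.
    ```

    Because the original values are overwritten, it would be prudent to try this on a copy of the data first.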

    --
    Rich Ulrich

  • From Bruce Weaver@21:1/5 to Erin Holloway on Thu Jul 30 06:12:22 2020

    Hi Erin. Why would you not make your new variable numeric rather than string? E.g.,

    NEW FILE.
    DATASET CLOSE ALL.
    DATA LIST LIST / sp_counts_1 sp_counts_2 (2A10).
    BEGIN DATA
    "1|2|4" "2|3|5"
    "1" "2"
    "4" "5"
    "5|7" "6|8"
    "8|9|2|4" "9|0|3|5"
    "10|11|12" "13"
    END DATA.
    LIST.

    RENAME VARIABLES (sp_counts_1 TO sp_counts_2 = old1 to old2).
    DO REPEAT old = old1 TO old2 / new = sp_counts_1 TO sp_counts_2.
    COMPUTE #L = CHAR.INDEX(old,"|") - 1.
    IF #L EQ -1 #L = LENGTH(RTRIM(old)).
    COMPUTE new = NUMBER(CHAR.SUBSTR(old,1,#L),F5.0).
    END REPEAT.
    FORMATS sp_counts_1 TO sp_counts_2 (F5.0).
    LIST.
    DELETE VARIABLES old1 to old2.

    Output from the final LIST command:

    old1       old2       sp_counts_1 sp_counts_2

    1|2|4      2|3|5                1           2
    1          2                    1           2
    4          5                    4           5
    5|7        6|8                  5           6
    8|9|2|4    9|0|3|5              8           9
    10|11|12   13                  10          13


    Number of cases read:  6    Number of cases listed:  6

  • From Erin Holloway@21:1/5 to Rich Ulrich on Wed Aug 19 22:17:58 2020
    Sorry for my late response; I have had some time off.
    Thanks for that, Rich. That works perfectly.

    With regard to your questions: the 1000’s of variables are split into files, so that won’t have an impact, but I just don’t want to rename them all. That is also why I included the delete/rename syntax.

    Bruce, it can be numeric; there is no need for the new variable to be a string.

    I have another one you guys might be able to help with, but I’ll post a new thread.

    Erin



