Forum: >>> Magnum BBS <<<

SPSS equivalent of bysort and _n in Stata

From aisha.siddiqui@berkeley.edu@21:1/5 to All on Tue Mar 10 11:05:09 2020

Hi, I am having a similar issue, I think. I would like to remove cases where there is not a matched ID for both pre and post. Each respondent has a unique ID. If a respondent has completed both a pre and a post, there are two lines, both with the same IDCODE, plus another variable indicating pre or post (pre=1 and post=2). How would I remove those who don't have both a pre and a post with a matched ID? Thank you! New to SPSS.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Rich Ulrich@21:1/5 to All on Tue Mar 10 16:54:19 2020

Do you know for sure that no ID has more than one Pre
or Post record, duplicated or otherwise?

For a relatively well-composed file, the simplest
cleaning I can think of is to use AGGREGATE on ID,
ADDing (MODE=ADDVARIABLES) a new variable to each line
that holds the number of records for that ID; then SELECT
to keep only those with exactly 2 records. Before selecting,
you can run a FREQUENCIES on that count to check whether
every ID really has either 1 or 2 records, and never more.
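A minimal sketch of that count-based cleaning, written in plain Python rather than SPSS syntax, assuming each record is a hypothetical (ID, PrePost) tuple with pre=1 and post=2. The Counter stands in for AGGREGATE, the second Counter for the FREQUENCIES check, and the list comprehension for SELECT IF; the data are made up for illustration.

```python
from collections import Counter

# Hypothetical sample records: (ID, PrePost), with pre=1 and post=2.
records = [(1, 1), (1, 2), (2, 1), (4, 1), (4, 1), (7, 1), (7, 1), (7, 2)]

# Like AGGREGATE with MODE=ADDVARIABLES: number of records for each ID.
n_per_id = Counter(rec_id for rec_id, _ in records)

# Like a FREQUENCIES on that count: how many IDs have 1, 2, 3, ... records
# (each ID is counted once, not once per record).
freq_of_counts = Counter(n_per_id.values())

# Like SELECT IF NumRecs = 2: keep records whose ID has exactly two records.
kept = [rec for rec in records if n_per_id[rec[0]] == 2]

print(sorted(freq_of_counts.items()))  # [(1, 1), (2, 2), (3, 1)]
print(kept)                            # [(1, 1), (1, 2), (4, 1), (4, 1)]
```

Note that ID 4 (two pre records, no post) survives the exactly-2 filter, which is why this shortcut is only safe for a well-composed file.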

For a messier file, you can use the LEAD( ) function as I will
describe; it quietly skips over the messiness without reporting it.
LEAD(PrePost) returns the value of PrePost in the next record,
in contrast to the LAG( ) function, which returns the value of
a variable from the previous record.
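To make the LEAD/LAG distinction concrete, here is a small plain-Python sketch (not SPSS) with two hypothetical helpers, `lag` and `lead`. None stands in for the system-missing value that LAG returns when there is no earlier record; assuming LEAD behaves symmetrically at the end of the file is part of the sketch.

```python
def lag(values, n=1):
    """Value n records back; None (like system-missing) where none exists."""
    return [values[i - n] if i - n >= 0 else None for i in range(len(values))]


def lead(values, n=1):
    """Value n records ahead; None where there is no later record."""
    return [values[i + n] if i + n < len(values) else None
            for i in range(len(values))]


pre_post = [1, 2, 1, 1, 2]
print(lag(pre_post))   # [None, 1, 2, 1, 1]
print(lead(pre_post))  # [2, 1, 1, 2, None]
```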

SORT CASES BY ID PrePost.
COMMENT find a pre line followed by a post line for the same ID; flag both.
COMPUTE ToUse= 0.
IF (ID EQ LEAD(ID)) AND (PrePost = 1) AND (LEAD(PrePost) = 2) ToUse= 1.
IF (ID EQ LAG(ID)) AND (PrePost = 2) AND (LAG(PrePost) = 1) ToUse= 1.
SELECT IF ToUse = 1.
Save Outfile blah blah blah.
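The same pairing logic, sketched in plain Python for illustration (not SPSS), assuming hypothetical (ID, PrePost) tuples. A record is flagged when it is a pre immediately followed by a post for the same ID, or a post immediately preceded by a pre for the same ID; at the file boundaries there is no neighbour, so nothing is flagged, much as a missing LEAD/LAG value leaves ToUse at 0.

```python
# Hypothetical records (ID, PrePost); pre=1, post=2. ID 7 has an extra pre.
records = [(1, 1), (1, 2), (2, 1), (3, 2), (4, 1), (4, 1), (7, 1), (7, 1), (7, 2)]

records.sort()  # equivalent of sorting by ID and PrePost

to_use = [0] * len(records)  # COMPUTE ToUse = 0
for i, (rec_id, pre_post) in enumerate(records):
    nxt = records[i + 1] if i + 1 < len(records) else None
    prev = records[i - 1] if i > 0 else None
    # The "LEAD" line: a pre immediately followed by a post for the same ID.
    if nxt is not None and nxt[0] == rec_id and pre_post == 1 and nxt[1] == 2:
        to_use[i] = 1
    # The "LAG" line: a post immediately preceded by a pre for the same ID.
    if prev is not None and prev[0] == rec_id and pre_post == 2 and prev[1] == 1:
        to_use[i] = 1

kept = [rec for rec, flag in zip(records, to_use) if flag == 1]  # SELECT IF
print(kept)  # [(1, 1), (1, 2), (7, 1), (7, 2)]
```

Unlike a count-based rule, this keeps one usable pre/post pair for ID 7 instead of dropping that ID, and it drops ID 4 (two pre records) entirely.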

If you are using a really old version of SPSS, LEAD( ) might not
be available. In that case, you could run the LAG( ) line as above,
then re-sort the file by ID and then by PrePost in descending order,
then run the LEAD( ) line with each LEAD rewritten as LAG( ).
Don't repeat the COMPUTE ToUse= 0, so the flags from the first
pass survive. Then do the Select.
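A plain-Python sketch of that two-pass workaround (not SPSS syntax), assuming hypothetical (row, ID, PrePost) tuples, where the row number stands in for each case's identity. The flags are set once and never reset between passes, and Python's sort is stable, which fixes which of two identical pre records gets kept; whether SPSS breaks ties the same way is not assumed here.

```python
# Hypothetical records: (row number, ID, PrePost); pre=1, post=2.
records = [(0, 1, 1), (1, 1, 2), (2, 2, 1), (3, 7, 1), (4, 7, 1), (5, 7, 2)]
to_use = {row: 0 for row, _, _ in records}  # COMPUTE ToUse = 0, done only once

# Pass 1: sort by ID, PrePost ascending; the LAG line flags a post whose
# previous record is a pre for the same ID.
records.sort(key=lambda r: (r[1], r[2]))
for prev, cur in zip(records, records[1:]):
    if cur[1] == prev[1] and cur[2] == 2 and prev[2] == 1:
        to_use[cur[0]] = 1

# Pass 2: re-sort by ID, then PrePost descending; the former LEAD line,
# rewritten with LAG, flags a pre whose previous record is now a post.
records.sort(key=lambda r: (r[1], -r[2]))
for prev, cur in zip(records, records[1:]):
    if cur[1] == prev[1] and cur[2] == 1 and prev[2] == 2:
        to_use[cur[0]] = 1

kept = sorted(row for row, flag in to_use.items() if flag == 1)
print(kept)  # [0, 1, 3, 5]
```

ID 7 still ends up with exactly one pre (row 3) and one post (row 5), even though the pre row differs from the one adjacent to the post in the first sort order.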

Totally untested.

--
Rich Ulrich

From Bruce Weaver@21:1/5 to aisha....@berkeley.edu on Tue Mar 10 15:34:18 2020

You didn't show any sample data, so it's difficult to know what the possibilities are. E.g., might someone have more than 2 lines? Here's a fairly general approach that should work under most circumstances. You'll have to change variable names as needed.

* Generate some data to illustrate.
DATA LIST list / ID Time (2F1).
BEGIN DATA
1 1
1 2
2 1
3 2
4 1
4 1
5 2
5 2
6 1
6 2
7 1
7 1
7 2
END DATA.

AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=ID
/Time_min=MIN(Time)
/Time_max=MAX(Time)
/NumRecs=NU.

COMPUTE ToUse = (NumRecs EQ 2) AND (Time_min EQ 1) and (Time_max EQ 2).
FORMATS ToUse(F1).
LIST ID Time ToUse.

SELECT IF ToUse.
LIST ID Time ToUse.

Output from first LIST command:

ID Time ToUse

1 1 1
1 2 1
2 1 0
3 2 0
4 1 0
4 1 0
5 2 0
5 2 0
6 1 1
6 2 1
7 1 0
7 1 0
7 2 0

Number of cases read: 13 Number of cases listed: 13

Output from second LIST command:

ID Time ToUse

1 1 1
1 2 1
6 1 1
6 2 1

Number of cases read: 4 Number of cases listed: 4
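For readers more comfortable outside SPSS, here is a plain-Python sketch of the same AGGREGATE / COMPUTE / SELECT IF logic on the sample data above (not SPSS syntax; the helper `to_use` is a hypothetical name). It reproduces the ToUse column from the first listing and the four cases from the second.

```python
# The sample data above as (ID, Time) pairs; Time 1 = pre, 2 = post.
data = [(1, 1), (1, 2), (2, 1), (3, 2), (4, 1), (4, 1), (5, 2), (5, 2),
        (6, 1), (6, 2), (7, 1), (7, 1), (7, 2)]

# AGGREGATE /BREAK=ID: per-ID minimum Time, maximum Time and record count.
summary = {}
for rec_id, time in data:
    t_min, t_max, n = summary.get(rec_id, (time, time, 0))
    summary[rec_id] = (min(t_min, time), max(t_max, time), n + 1)


def to_use(rec_id):
    """COMPUTE ToUse: exactly two records, one with Time 1 and one with Time 2."""
    t_min, t_max, n = summary[rec_id]
    return int(n == 2 and t_min == 1 and t_max == 2)


# MODE=ADDVARIABLES: attach the per-ID result to every row.
flagged = [(rec_id, time, to_use(rec_id)) for rec_id, time in data]

# SELECT IF ToUse.
selected = [row for row in flagged if row[2] == 1]
print(selected)  # [(1, 1, 1), (1, 2, 1), (6, 1, 1), (6, 2, 1)]
```

Unlike the LEAD/LAG pairing suggested earlier in the thread, this rule drops ID 7 completely rather than keeping one of its pre/post pairs; which behaviour is wanted depends on how extra records should be treated.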
