Forum: >>> Magnum BBS <<<

Differences between R and SPSS on probit analysis

From bmf2doa@gmail.com@21:1/5 to All on Thu Feb 23 12:12:41 2017

Hi!!!
I'm working on the effects of alternative larvicides on Aedes aegypti. Right now, I am doing a binary mortality response with a single explanatory variable (dose) on 4 concentrations of one larvicide (+ control). Our university is fond of SPSS, and I
have learned to conduct the basic probit model with it, including a natural logarithm transformation on my dosis data.
Not so long ago, I've started working with R, and through a combination of the 'glm' and 'dose.p' functions, I get the same slope and intercept, as well as LD50 calculations. Nevertheless, the standard errors and Z-scores calculated through the Probit
model in SPSS comes out completely different in R. Additionally, the 95% confidence intervals for the LD50 come out differently between the two programs. I really don't have a clue on how I am getting the same slopes, intercepts and LD50's, but totally
different SE, Z, and 95% CI. Can anybody help me so I can get the same results in R??
I'll pass you the script and hypothetical data:
dose <- c(6000, 4500, 3000, 1500, 0)
total <- c(100, 100, 100, 100, 100)
affected <- c(91, 82, 69, 49, 0)
finney71 <- data.frame(dose, total, affected)
fm1 <- glm(affected/total ~ log(dose),
family=binomial(link = probit), data=finney71[finney71$dose != 0, ])
xp1 <- dose.p(fm1, p=c(0.5,0.9))
xp.ci <- xp1 + attr(xp1, "SE") %*% matrix(qnorm(1 - 0.05/2)*c(-1,1), nrow=1) EAUS.Aa <- exp(cbind(xp1, attr(xp1, "SE"), xp.ci[,1], xp.ci[,2])) dimnames(EAUS.Aa)[[2]] <- c("LD", "SE", "LCL","UCL")
So, this is the regression results I get with R:
summary(fm1)
Deviance Residuals:
1 2 3 4
0.06655 -0.02814 -0.06268 0.03474
Coefficients:
Estimate Std. Error z value
(Intercept) -6.8940 10.7802 -0.640
log(dose) 0.9333 1.3441 0.694

|z|)

(Intercept) 0.522
log(dose) 0.487
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.513878 on 3 degrees of freedom
Residual deviance: 0.010356 on 2 degrees of freedom
AIC: 6.5458
Number of Fisher Scoring iterations: 5
And the LD50 and CI transformed:
print(EAUS.Aa)
LD SE LCL UCL
p = 0.5: 1614.444 3.207876 164.3822 15855.91
p = 0.9: 6373.473 3.764879 474.1600 85669.72
These are the values I get on SPSS (just replacing the values on R output) : Coefficients:
Estimate Std. Error z value
(Intercept) -6.8940 1.082 -6.373
(dose) 2.149 0.311 6.918
And the LD50 and CI transformed:
LD LCL UCL
p = 0.5: 1614.444 1198.932 1953.120
p = 0.9: 6373.473 5145.767 9013.354
So, please if somebody can help me with this, I'd be grateful. If working with those functions won't do it, I'll use another, the one you recommend.
Thank you very much!
PD. I've already googled it but there's no satisfactory answer.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Duffy@21:1/5 to All on Fri Feb 24 23:40:55 2017

bmf2doa@gmail.com wrote:
(+ control). Our university is fond of SPSS, and I have learned to
conduct the basic probit model with it

I'll pass you the script and hypothetical data:
dose <- c(6000, 4500, 3000, 1500, 0)
total <- c(100, 100, 100, 100, 100)
affected <- c(91, 82, 69, 49, 0)

R MASS::dose.p

LD SE LCL UCL
p = 0.5: 1614.444 3.207876 164.3822 15855.91

SPSS

LD LCL UCL
p = 0.5: 1614.444 1198.932 1953.120

I'm too lazy to work out where your SPSS programming went wrong, but
with such a small dataset, don't you think the CIs should be wide rather
than narrow?

predict(fm1, newdata=data.frame(dose=1614.444), type="response", se.fit=TRUE) $fit 0.5 $se.fit 0.4339901

or, using some Monte Carlo based CI estimate,
library("HelpersMG")
LD50(finney71[1:4,], equation="probit")
The LD50 is 7.387 SE 1.231
The lower limit of transitional range of doses is 5.625 SE 0.929
The higher limit of transitional range of doses is 9.149 SE 1.278

Incidentally, when we are log transforming a range of values including 0,
it is common to try and include this data point (fudge up to 0.5 or 1).

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Rich Ulrich@21:1/5 to davidD@qimr.edu.au on Fri Feb 24 21:11:47 2017

On Fri, 24 Feb 2017 23:40:55 +0000 (UTC), David Duffy
<davidD@qimr.edu.au> wrote:

bmf2doa@gmail.com wrote:
(+ control). Our university is fond of SPSS, and I have learned to
conduct the basic probit model with it

I'll pass you the script and hypothetical data:
dose <- c(6000, 4500, 3000, 1500, 0)
total <- c(100, 100, 100, 100, 100)
affected <- c(91, 82, 69, 49, 0)

R MASS::dose.p

LD SE LCL UCL
p = 0.5: 1614.444 3.207876 164.3822 15855.91

SPSS

LD LCL UCL
p = 0.5: 1614.444 1198.932 1953.120

I'm too lazy to work out where your SPSS programming went wrong, but
with such a small dataset, don't you think the CIs should be wide rather
than narrow?

predict(fm1, newdata=data.frame(dose=1614.444), type="response", se.fit=TRUE) >$fit 0.5 $se.fit 0.4339901

or, using some Monte Carlo based CI estimate,
library("HelpersMG")
LD50(finney71[1:4,], equation="probit")
The LD50 is 7.387 SE 1.231
The lower limit of transitional range of doses is 5.625 SE 0.929
The higher limit of transitional range of doses is 9.149 SE 1.278

Incidentally, when we are log transforming a range of values including 0,
it is common to try and include this data point (fudge up to 0.5 or 1).

Are you saying that the 0 is being ignored in both analyses, since
log(0) is undefined?

I have an ignorant question. Or observation, leading to a question.

I've never run R, and I've never run probit. I Googled to
try to find what "finney71( ) referred to, and that led to an
ambiguous result: An example using R (which I do not use)
seemed to show, in a plot, that the "0" was scored as 0 while
the logarithm was used for the other values. - and there was
a warning that "outlying" values could create non-robust SEs
and models. - and the model would depend on the scaling
of the dose variable, you might note, which /should/ be
a very dubious feature.

Well, for the data above, that would lead to a strong effect:
all the log(dose) values are in a narrow range, compared to 0;
and the 0-dose result is 0 affected, compared to 49%-91%
for the rest. So, by chance, it might fit a fairly linear model.
But, still....

So, question: What does the "Finney71( )" mean? - beyond
the fact that it refers to /something/ in Finney's fine 1971 book
on Probit analysis.

--
Rich Ulrich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Duffy@21:1/5 to Rich Ulrich on Sat Feb 25 06:34:03 2017

Rich Ulrich <rich.ulrich@comcast.net> wrote:

On Fri, 24 Feb 2017 23:40:55 +0000 (UTC), David Duffy
<davidD@qimr.edu.au> wrote:

bmf2doa@gmail.com wrote:
(+ control). Our university is fond of SPSS, and I have learned to
conduct the basic probit model with it

I'll pass you the script and hypothetical data:
dose <- c(6000, 4500, 3000, 1500, 0)
total <- c(100, 100, 100, 100, 100)
affected <- c(91, 82, 69, 49, 0)

R MASS::dose.p

LD SE LCL UCL
p = 0.5: 1614.444 3.207876 164.3822 15855.91

SPSS

LD LCL UCL
p = 0.5: 1614.444 1198.932 1953.120

Are you saying that the 0 is being ignored in both analyses, since
log(0) is undefined?
So, question: What does the "Finney71( )" mean? - beyond

I presume it's an example.

Nothing like Usenet to parade one's mistakes (meaning mine)! I should
downvote myself. The correct R code should have been:
finney71$ldose <- log(dose)
finney71$ldose[dose==0] <- 0

fm1 <- glm(cbind(affected, total-affected) ~ ldose,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
family=binomial(link=probit),data=finney71)
ld50 <- dose.p(fm1)
exp(ld50 + c(0,-1.96,1.96)*attr(ld50,"SE"))
[1] 1614.444 1284.708 2028.811

Closer to the SPSS result. Again, showing my ignorance,
including/excluding the zero dose made no difference if logdose was set
to zero (recall 0/100 affected, so more important if nonzero base rate).

Cheers, David Duffy.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Smith@21:1/5 to All on Tue Feb 28 08:30:39 2017

I notice that the results from R are for log-dose. Results from SPSS are for dose. The results have to be different.

Results from R
Estimate Std. Error z value
(Intercept) -6.8940 10.7802 -0.640
log(dose) 0.9333 1.3441 0.694

Results from SPSS
Estimate Std. Error z value
(Intercept) -6.8940 1.082 -6.373
(dose) 2.149 0.311 6.918

Results from R give the same coefficients that I get from Stata (see below) but the standard errors are bigger in R by a factor of 10 and the z values are smaller proportionally.

I believe that the reason is that R is not taking into account the sample sizes that you have for each rate. If R is treating the sample size as 1 instead of 100 then this would result in a standard error bigger by a factor of 10, which is about what you
have compared with Stata.

For the intercept your results show a similar difference between R and SPSS. This suggests that SPSS is accounting for the frequency counts. I was not able to reproduce your results for SPSS.

If you use the proper frequency weights then all your confidence intervals will be shorter including those for the LD-50 estimate.

Does R permit the specification of a frequency weight? If not you will need a data set with 500 observations and lots of duplicates.

Results from Stata for log-dose using logistic regression (400 observations)

Coef. Std. Err. z
constant -6.893981 1.08167 -6.37
lndose .9332907 .1349154 6.92

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Duffy@21:1/5 to David Smith on Tue Feb 28 22:59:02 2017

David Smith <davidwsmithphd@gmail.com> wrote:

I notice that the results from R are for log-dose. Results from SPSS are for dose. The results have to be different.

Results from Stata for log-dose using logistic regression (400 observations)

Coef. Std. Err. z
constant -6.893981 1.08167 -6.37
lndose .9332907 .1349154 6.92

As I pointed out above, the R syntax for the model was incorrect. Once corrected, one obtains for the probit link:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.8940 1.0780 -6.395 1.60e-10 ***
log(dose) 0.9333 0.1344 6.944 3.82e-12 ***

I am guessing you meant probit rather than logistic regression.
The logit link gives
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.4797 1.8312 -6.269 3.64e-10 ***
log(dose) 1.5539 0.2301 6.753 1.45e-11 ***

If the dose are not logged, the coefficients are wildly different from this,
so I agree I don't know exactly what model was fitted in SPSS, and given the
OP never responded...

Cheers, David Duffy.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From erburgess83@gmail.com@21:1/5 to bmf...@gmail.com on Thu Jun 22 10:12:15 2017

On Thursday, February 23, 2017 at 2:12:42 PM UTC-6, bmf...@gmail.com wrote:

Hi!!!
I'm working on the effects of alternative larvicides on Aedes aegypti. Right now, I am doing a binary mortality response with a single explanatory variable (dose) on 4 concentrations of one larvicide (+ control). Our university is fond of SPSS, and I

have learned to conduct the basic probit model with it, including a natural logarithm transformation on my dosis data.

Not so long ago, I've started working with R, and through a combination of the 'glm' and 'dose.p' functions, I get the same slope and intercept, as well as LD50 calculations. Nevertheless, the standard errors and Z-scores calculated through the Probit

model in SPSS comes out completely different in R. Additionally, the 95% confidence intervals for the LD50 come out differently between the two programs. I really don't have a clue on how I am getting the same slopes, intercepts and LD50's, but totally
different SE, Z, and 95% CI. Can anybody help me so I can get the same results in R??

I'll pass you the script and hypothetical data:
dose <- c(6000, 4500, 3000, 1500, 0)
total <- c(100, 100, 100, 100, 100)
affected <- c(91, 82, 69, 49, 0)
finney71 <- data.frame(dose, total, affected)
fm1 <- glm(affected/total ~ log(dose),
family=binomial(link = probit), data=finney71[finney71$dose != 0, ])
xp1 <- dose.p(fm1, p=c(0.5,0.9))
xp.ci <- xp1 + attr(xp1, "SE") %*% matrix(qnorm(1 - 0.05/2)*c(-1,1), nrow=1) EAUS.Aa <- exp(cbind(xp1, attr(xp1, "SE"), xp.ci[,1], xp.ci[,2])) dimnames(EAUS.Aa)[[2]] <- c("LD", "SE", "LCL","UCL")
So, this is the regression results I get with R:
summary(fm1)
Deviance Residuals:
1 2 3 4
0.06655 -0.02814 -0.06268 0.03474
Coefficients:
Estimate Std. Error z value
(Intercept) -6.8940 10.7802 -0.640
log(dose) 0.9333 1.3441 0.694

|z|)

(Intercept) 0.522
log(dose) 0.487
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.513878 on 3 degrees of freedom
Residual deviance: 0.010356 on 2 degrees of freedom
AIC: 6.5458
Number of Fisher Scoring iterations: 5
And the LD50 and CI transformed:
print(EAUS.Aa)
LD SE LCL UCL
p = 0.5: 1614.444 3.207876 164.3822 15855.91
p = 0.9: 6373.473 3.764879 474.1600 85669.72
These are the values I get on SPSS (just replacing the values on R output) : Coefficients:
Estimate Std. Error z value
(Intercept) -6.8940 1.082 -6.373
(dose) 2.149 0.311 6.918
And the LD50 and CI transformed:
LD LCL UCL
p = 0.5: 1614.444 1198.932 1953.120
p = 0.9: 6373.473 5145.767 9013.354
So, please if somebody can help me with this, I'd be grateful. If working with those functions won't do it, I'll use another, the one you recommend.
Thank you very much!
PD. I've already googled it but there's no satisfactory answer.

Hopefully you've solved this problem already, but if not, here is the solution.

You've used family=binomial in your model, which fixes the dispersion parameter at 1. Based on your ratio of residual deviance to residual deviance df, you are under-dispersed. The assumption that the dispersion ratio is 1 is what it throwing your
results off. Instead, use family=quasibinomial. This allows the dispersion parameter to be estimated, which in turn affects the variance estimate, SE, etc. Give it a try by running one model with family=binomial and one model with family = quasibinomial.
Notice the SE for the slopes and intercepts. This in turn should change your goodness of fit, and 95% CI's to almost exactly align with the SPSS results, as well as any other pertinent result SPSS spits out. Any minor differences in numbers are likely
due to rounding under the hood of each piece of software and are of no theoretical concern.

Hope that helps you and anybody else having this issue!

Sincerely,

Edwin R. Burgess IV, PhD

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Briels
  Tue Apr 23 20:54:03 2024
  from Uk via SSH
- Cronus
  Tue Apr 23 19:46:51 2024
  from Provo, Ut via SSH
- Keyop
  Tue Apr 23 19:40:37 2024
  from Huddersfield, West Yorkshire via SSH
- Guest
  Wed Apr 24 01:40:10 2024
  from A via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	296
Nodes:	16 (2 / 14)
Uptime:	26:46:25
Calls:	6,646
Calls today:	1
Files:	12,193
Messages:	5,328,054

Differences between R and SPSS on probit analysis

Who's Online

Recent Visitors

System Info