Rich Ulrich:
Glad to find you here! I vaguely remember you were present
in a statistics newsgroup, but I can't find it now. Would
you be interested in discussing Tom Roberts's statistical
analysis of the Dayton Miller aether-drift experiments? It
requires some light preparatory reading, but the analysis
itself occupies about two pages in Section IV of this
article:
https://arxiv.org/vc/physics/papers/0608/0608238v2.pdf
Since Roberts did not publish his data and code, his
conclusions have zero reproducibility, but I need help in
understanding the procedure and validity of this analysis as
described. If you are interested, could we continue in a
more appropriate newsgroup?
Cross-posted to sci.stat.math
to see if anyone has comments.
David Duffy:
This is a quick and dirty analysis in the R stats package.
Approximate significance of smooth terms:
edf Ref.df F p-value
s(dirs) 4.206 4.206 9.572 1.92e-07
Here dirs is the 16 directions, the edf is the fitted degree of
spline, which when you plot it peaks at 0 and 180 degrees, and
the random effect is a separate intercept for each of the 20
rotations.
Rich Ulrich:
I've cross-posted to a .stat group that has a few readers left.
I read the citation, and I'm not very interested; I know
too little about the device, etc., or about the ongoing
arguments that apparently exist.
I can say a few things about the paper and the analyses.
Modern statistical analysis and design sophistication were
barely being born in 1933, when the Miller experiment was
published. In regard to complications and pitfalls, time
series are worse than analysis of independent points; and
what I think of as 'circular series' (0-360 degrees) is
worse than time series. I once had a passing acquaintance
with time series (no data experience) but I've never
touched circular data.
Also, 'messy data' (with big sources of random error) remains a
problem with solutions that are mainly ad hoc (such as when
Roberts offers analyses that drop large fractions of the data).
Roberts shows me that these data are so messy that it is hard
to imagine Miller retrieving a tiny signal from the noise, if Miller
did nothing more than remove linear trends from each cycle. I
would want to know how the DEVICE made all those errors possible,
as a clue to how to exclude their influence on an analysis. If
Miller's data has something, Miller didn't show it right. Roberts
offers an alternative analysis, one that I'm too ignorant to fault.
If you are wondering about how he fit his model, I can say a
little bit. The usual fitting in clinical research (my
area) is with least-squares multiple regression, which
minimizes the squared residuals of a fit. The main
alternative is Maximum Likelihood, which finds the
parameter values that maximize a likelihood equation.
That is evaluated by chi-squared
(chi-squared = -2*log(likelihood)).
Roberts seems to be using some version of that, though I didn't
yet figure out what he is fitting.
I thought it was appropriate that he took the consecutive
differences as the main unit of analysis, given how much noise
there was in general. From what I understood of the apparatus,
those are the numbers that are apt to be somewhat usable.
Ending up with a chi-squared value of around 300 for around
300 d.f. is appropriate for showing a suitably fitted model -- the
expected value of X2 by chance for large d.f. is the d.f. A value
much larger indicates poor fit; much smaller indicates over-fit.
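Rich's rule of thumb (expected chi-squared roughly equals the degrees of freedom, with a standard deviation of about sqrt(2*d.f.)) can be checked with a small simulation. This is a generic Python sketch, not Roberts's computation; the 300 d.f. figure is simply taken from the discussion above:

```python
import random
import statistics

random.seed(42)

def chi2_draw(df):
    """One draw from a chi-squared distribution with df degrees
    of freedom, built as a sum of df squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

df = 300
draws = [chi2_draw(df) for _ in range(2000)]

mean = statistics.mean(draws)
sd = statistics.stdev(draws)

# Theory: mean = df, sd = sqrt(2*df) ~ 24.5.  So X2 ~ 300 on
# 300 d.f. is unremarkable, while X2 of, say, 400 would signal
# a poor fit and X2 of 200 would suggest over-fitting.
print(f"mean ~ {mean:.1f}, sd ~ {sd:.1f}")
```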
David Jones:
The paper is extremely difficult to understand, and I have
tried very hard.
There seems to be a possibility that you are
over-interpreting what the author means by "chi-squared".
I have heard some non-statistical experts in other fields
just using "chi-squared" to mean a sum of squared errors.
So not a formal test-statistic for comparing two models?
The various data-manipulations, both in the original paper
and this one, are difficult to follow.
My guess is that some of the stuff in this paper is
throwing out some information about variability in
whatever "errors" are here.
If this were a simple time series, one mainstream approach
from "time-series analysis" would be to present a spectral
analysis of a detrended and prefiltered version of the
complete timeseries, to try to highlight any remaining
periodicities. There would seem to be a possibility of
extending this to remove other systematic effects.
I think the key point here is to try to separate-out any
isolated frequencies that may be of interest, rather than
to average across a range of neighbouring frequencies, as
may be going on in this paper.
To go any further in understanding this one would need to
have a mathematical description of whatever model is being
used for the full data-set, together with a proper
description of what the various parameters and error-terms
are supposed to mean.
One wonders if an attempt has been made to contact the
author of the Roberts paper, for better information. A
straightforward search in a few steps finds:
Furthermore, any decent scientific article should be
understandable without additional help from the author, and
contrary to J.J. Lodder -- who absurdly forbids me to
discuss this paper "behind the author's back" -- everyone is
entitled and encouraged to discuss published scientific
articles without the biasing presence of their authors.
I intended to contact Roberts again, after I had acquired a
better understanding of his model, to be better armed. If we
invite Roberts now, I fear there is going to be much flame
and little argument. I am going to be labeled a "relativity
crank" &c. My honest intent now is to forget about
relativity and discuss statistics.
In sci.stat.math David Duffy <davidd02@tpg.com.au> wrote:
This is a quick and dirty analysis in the R stats package.
I was too quick in writing this - I needed to unpack those
degrees of freedom into a linear decline over the rotation, due
to the overall drift, which explains most of that signal,
and the actual bump at 180 degrees. If I instead fit a polynomial term,
For one formal test, I have fitted a random intercept
model for the (20) rotations, along with a fixed effects
linear decline within the rotation, and then added higher
degree polynomials to show a weakly significant non-linear
term.
I have put the resulting plots up at
http://users.tpg.com.au/davidd02/
I smoothed the trends in the data using localized
regression separately for each time the interferometer was
readjusted.
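The model described above (an intercept per rotation plus a common linear decline) can be illustrated with a pure-Python "within" (fixed-effects) estimator on invented data. This is only a sketch of the idea, not Duffy's mgcv fit, and it treats the rotation intercepts as fixed rather than random:

```python
import random

random.seed(7)

# Synthetic stand-in: 20 rotations x 16 markers, each rotation
# with its own baseline, a common linear decline across the
# rotation, and noise.  All numbers are invented.
ROTS, MARKS = 20, 16
true_slope = -0.3
data = []  # (rotation, marker, reading)
for r in range(ROTS):
    base = random.gauss(10.0, 2.0)      # per-rotation intercept
    for m in range(MARKS):
        y = base + true_slope * m + random.gauss(0.0, 0.5)
        data.append((r, m, y))

# "Within" estimator: demean marker and reading inside each
# rotation, then pool.  This is the fixed-effects analogue of
# a random-intercept model with a common slope.
num = den = 0.0
for r in range(ROTS):
    grp = [(m, y) for (g, m, y) in data if g == r]
    mbar = sum(m for m, _ in grp) / len(grp)
    ybar = sum(y for _, y in grp) / len(grp)
    num += sum((m - mbar) * (y - ybar) for m, y in grp)
    den += sum((m - mbar) ** 2 for m, _ in grp)

slope = num / den
print(f"estimated common slope: {slope:.3f}")  # ~ -0.3
```

Demeaning within each rotation removes the per-rotation baselines exactly, so the pooled slope is unaffected by how the intercepts were drawn.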
David Duffy:
[http://users.tpg.com.au/davidd02/]
For one formal test, I have fitted a random intercept
model for the (20) rotations, along with a fixed effects
linear decline within the rotation, and then added higher
degree polynomials to show a weakly significant non-linear
term.
I have a question about your plot of detrended data: why do
some rotations start at marker 1 and some at marker 0? This
may have to do with adjustment rotations, and marker 0 is
David Jones:
If this were a simple time series, one mainstream approach
from "time-series analysis" would be to present a spectral
analysis of a detrended and prefiltered version of the
complete timeseries, to try to highlight any remaining
periodicities.
The Miller data are a time series in a way,
with the
readings as uniform as the rotation of the device. Would it
be possible to analyse it using the SigSpec algorithm:
https://en.wikipedia.org/wiki/SigSpec
using the eponymous program:
https://arxiv.org/pdf/1006.5081.pdf
The Miller data are a time series in a way,
They are only a time-series because they have been
manipulated into the form of a time-series.
You should not remove real structure in the form of groups
of data
unless you can be sure that doing so
(a) does not remove or mask effects you are looking for,
(b) does not introduce effects of the kind you are
looking for.
with the
readings as uniform as the rotation of the device. Would it
be possible to analyse it using the SigSpec algorithm:
https://en.wikipedia.org/wiki/SigSpec
using the eponymous program:
https://arxiv.org/pdf/1006.5081.pdf
There may be better/more-capable packages available from
time-series analysis specialists.
But you should be aware that any statistical tests would
depend on the validity of the usual assumptions which
would need to be given serious consideration. If you were
planning on doing something depending in a simple way on
FFTs you would need to consider that there is an inherent
assumption that the series being analysed is a good
representative of a stationary process (in terms of the
length of the series being analysed).
Loosely speaking, can you imagine in a general way how the
observed time-series would have behaved before and after
the period supposedly observed? But if we assume the
instrumental drift to be free of any periodicity in a turn,
we may discard spectral components whose frequencies are
not multiples of 1/turn.
The "time-series" in the 2006 paper seems to show a
distinct change in behaviour part way through.
One might consider that a logical way forward which doesn't
place heavy reliance on assumptions would be to show that
the apparent peak in the FFT, such as shown in the 2006
paper, is or is not removed when any explanatory effects
are removed, perhaps leaving this to be judged on an
informal basis.
Even if this can be done, you could still be left with the
problem that you are looking for an effect whose cause is
indistinguishable from the effects of other causes, as
previously identified in other literature.
David Jones:
The Miller data are a time series in a way,
They are only a time-series because they have been
manipulated in the form of a time-series.
Not at all: each run, consisting of 20 consecutive turns
with perhaps a few "adjustment" turns, spans about 20
minutes and represents a single measurement of the
aether-drift. The manipulation comprises reinstating the
adjustments and unrolling the 20 turns of 16 observations
into a sequence of 320 observations.
You should not remove real structure in the form of groups
of data
No such structure was removed. The periodicity of individual
turns is preserved in the observation indices and time
markings. The data of a "run" is physically a time-series.
unless you can be sure that doing so
(a) does not remove or mask effects you are looking for,
(b) does not introduce effects of the kind you are
looking for.
I am sure the serialisation in question does neither.
with the
readings as uniform as the rotation of the device. Would it
be possible to analyse it using the SigSpec algorithm:
https://en.wikipedia.org/wiki/SigSpec
using the eponymous program:
https://arxiv.org/pdf/1006.5081.pdf
There may be better/more-capable packages available from
time-series analysis specialists.
There may be, but SigSpec seems one of the very best, and
specifically designed to detect significant spectral
components in time series. It will no doubt find significant
high-magnitude and low-frequency components in the Miller
signal, but we are interested in whether full- and half-
period components are prominent above the others, and by how
much. "Multisine" analysis (like SigSpec) seems more
unbiased in this case than the standard Fourier, with its
fixed set of harmonics.
But you should be aware that any statistical tests would
depend on the validity of the usual assumptions which
would need to be given serious consideration. If you were
planning on doing something depending in a simple way on
FFTs you would need to consider that there is an inherent
assumption that the series being analysed is a good
representative of a stationary process (in terms of the
length of the series being analysed).
The signal sought is stationary within each run, the noise
is also stationary, whereas the instrumental drift is
probably not.
Loosely speaking, can you imagine in a general way how the
observed time-series would have behaved before and after
the period supposedly observed? But if we assume the
instrumental drift to be free of any periodicity in a turn,
we may discard spectral components whose frequencies are
not multiples of 1/turn.
I can imagine that about the hypothetical signal and noise,
but not about the instrumental drift. The assumption,
however, that it is of lower frequency than the signal, may
help to separate one from the other.
The "time-series" in the 2006 paper seems to show a
distinct change in behaviour part way through.
It does.
One might consider that a logical way forward which doesn't
place heavy reliance on assumptions would be to show that
the apparent peak in the FFT, such as shown in the 2006
paper, is or is not removed when any explanatory effects
are removed, perhaps leaving this to be judged on an
informal basis.
I thought that the spectral-significance (SigSpec) measure
was made to answer such questions.
Even if this can be done, you could still be left with the
problem that you are looking for an effect whose cause is
indistinguishable from the effects of other causes, as
previously identified in other literature.
Yes, the instrumental error itself could be periodic, but
then it would be present with similar parameters in all
"runs", which is not the case. Mr. Roberts made the same
assumtion -- that the instrumental error is not periodic in
a turn.
Anton Shepelev wrote:
One might consider that a logical way forward which doesn't
place heavy reliance on assumptions would be to show that
the apparent peak in the FFT, such as shown in the 2006
paper, is or is not removed when any explanatory effects
are removed, perhaps leaving this to be judged on an
informal basis.
I thought that the spectral-significance (SigSpec) measure
was made to answer such questions.
You will need to get someone competent to check all the assumptions
involved.
David Jones to Anton Shepelev:
The data of a "run" is physically a time-series.
unless you can be sure that doing so
(a) does not remove or mask effects you are looking for,
(b) does not introduce effects of the kind you are
looking for.
I am sure the serialisation in question does neither.
The question will be: will anyone else be sure?
I think I see a common approach between you and Prof.
Roberts: "I think I see a problem, this is what I think
will solve the problem, this is what I have done,
therefore I have solved the problem"
I thought that the spectral-significance (SigSpec)
measure was made to answer such questions.
You will need to get someone competent to check all the
assumptions involved.
Let me expand on that. It seems that the "statistical
tests" are based on asymptotic properties/results that are
only valid if there is a stationary process to be
analysed. You agreed that the observed series looks non-
stationary. So the basic results cannot be used. However
the package might contain something to allow some version
to be applied.
You may be hoping that a spectral analysis package will
provide all your answers, but recall that results of the
FFT are just a sophisticated version of regression
analysis, and you may be better off looking to that for a
way to proceed.... provided that you don't apply the parts
of the theory of regression that are not valid here.
But it is not just "instrumental errors" that need to be
thought about; it is all the possible explanations put
forward by people like Shankland.
David Jones to Anton Shepelev:
The data of a "run" is physically a time-series.
unless you can be sure that doing so
(a) does not remove or mask effects you are looking for,
(b) does not introduce effects of the kind you are
looking for.
I am sure the serialisation in question does neither.
The question will be: will anyone else be sure?
Indeed, but it is hard to prove the absence of either loss
you mention.
If you, or anybody else, think that the serialisation of the
turns of a run may introduce some distortion or make the
signal otherwise less noticeable, then please share your
specific concerns, that we may discuss whether they are
justified.
For my part, I can only repeat that each "run" represents
twenty or more consecutive turns of the interferometer within
a space of 15-20 minutes. It contains 20*16+1=321
observations made over twenty "observation turns",
occasionally interrupted by "adjustment turns", during which
no observations were recorded. The data, therefore, is a
physical time series with gaps. You can view them in this
form in the seq_t directory in this archive:
http://freeshell.de/~antonius/file_host/RobertsMillerData.7z
Since the signal we seek is half-periodic in a turn,
adjustment turns do not disrupt it (in any way that I can
think of).
I think I see a common approach between you and Prof.
Roberts: "I think I see a problem, this is what I think
will solve the problem, this is what I have done,
therefore I have solved the problem"
Please note that I initiated discussion of the statistical
analysis of the Miller experiments in this group,
specifically because I needed your help and advice as expert
statisticians. Mr. Roberts, on the other hand, professes no
such desire...
I thought that the spectral-significance (SigSpec)
measure was made to answer such questions.
You will need to get someone competent to check all the
assumptions involved.
Can you help me first to identify those assumptions? That
the signal sought is stationary and periodic in a half-turn
is a fact. Noise is not periodic. The instrumental drift may
be assumed to be aperiodic from looking at the measurements,
but a specific physical or statistical justification is
welcome. The key point is to determine whether it may pose
as signal or not.
Let me expand on that. It seems that the "statistical
tests" are based on asymptotic properties/results that are
only valid if there is a stationary process to be
analysed. You agreed that the observed series looks non-
stationary. So the basic results cannot be used. However
the package might contain something to allow some version
to be applied.
How does one determine whether the instrumental drift is a
stationary process? What do you think can make that process
non-stationary? The dominance of the basic linear drift
during the entire run seems to indicate that it is
stationary within the period of the run. After consulting
the definition of a stationary process, I retract my
previous statement to the contrary.
The SigSpec program performs a multisine analysis of a time
series, finding its most significant spectral components (in
no way limited to multiples of a fundamental frequency),
their respective significance, and the residual data. This
should work as well if the signal is stationary and the
error is not.
You may be hoping that a spectral analysis package will
provide all your answers, but recall that results of the
FFT are just a sophisticated version of regression
analysis, and you may be better off looking to that for a
way to proceed.... provided that you don't apply the parts
of the theory of regression that are not valid here.
With FFT, we know our basis beforehand. With multisine, we
do not, which makes it less "prejudiced" to what is sought.
If a significant half-period component appears in multisine,
it will indicate much more than such a component in the FFT,
where it is mathematically bound to appear, as Mr. Roberts
correctly observes. Thank you for the advice about
regression. I will think how I can apply it to the data in a
way different from that of Mr. Roberts.
Obviously we can't hope to deal with the whole of
statistical theory here. But we can look, in some simple
cases, at the effects of dealing or not dealing with pre-
analysis data-manipulations within the data analysis.
Even the most basic statistics work relates to dealing
with within-analysis manipulations.
For example the usual formula for the estimated variance
contains the divisor (n-1) instead of the divisor n, and
this can be considered to be an adjustment to take account
of the fact that you subtract off the sample mean within
the analysis.
Similarly, in regression, the sum-of-squares is divided by
(n-p) to take account of fitting a total of p parameters.
In both cases the adjustment is made to get an unbiased
estimate of the variance.
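A quick simulation makes the (n-1) adjustment concrete; this is a generic illustration, unrelated to the Miller data:

```python
import random

random.seed(3)

n, reps = 4, 20000  # true observation variance is 1

sum_div_n = sum_div_nm1 = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_div_n += ss / n          # biased: expectation (n-1)/n = 0.75
    sum_div_nm1 += ss / (n - 1)  # unbiased: expectation 1.0

print(f"divisor n:   {sum_div_n / reps:.3f}")
print(f"divisor n-1: {sum_div_nm1 / reps:.3f}")
```

Subtracting the sample mean within the analysis removes one degree of freedom, which is exactly what the (n-1) divisor pays back.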
So, let's consider some pre-analysis data manipulations.
Let's assume you have two pairs of observations (X1,X2)
and (Y1,Y2), with statistical independence within and
between pairs. Let the theoretical mean of each
observation in the first pair be M1, and let the
theoretical mean of each observation in the second pair be
M2. Suppose it is assumed the theoretical variance for
each of the four observations is the same, and consider
two cases where this is either known to be 1, or else it
needs to be estimated. Then consider four versions of
analyses with different pre-analysis manipulations as
follows.
(a) Separate analysis. Here the data being analysed
consists of the two pairs (X1,X2) and (Y1,Y2). Then the
sample means within each pair provide unbiased estimates of
the two values M1 and M2, and the theoretical variance of
each estimate is 1/2 if the variance of the observations
is assumed known at 1. If the variance of observations is
unknown, one could get and use two different estimates of
that variance from the sample variance applied within each
pair. Each such estimate would be unbiased.
(b) Separate analysis, but pooled. This is the same as for
(a), above, except that the variance of the observations
is estimated by the average of the sample variances from
the two pairs. The theoretical variances of the means
remain the same as in (a), but one gets better estimates
of those variances. This is achieved by making use of an
assumed structure across the pairs (that the variances are
the same).
(c) Subtraction of means. To yield a special case of what
might be done for longer series, suppose that a single
dataset of 4 values (Q1,Q2,Q3,Q4) is constructed from the
two pairs by subtracting the two sample means, giving
Q1=(X1-X2)/2, Q2=(X2-X1)/2, Q3=(Y1-Y2)/2, Q4=(Y2-Y1)/2
Obviously doing this prevents any estimation of the means
M1 and M2. Applying the usual formula to get a sample
variance from (Q1,Q2,Q3,Q4) gives an estimate that has a
mean value of 2/3 when the true observation variance is
known to be 1.
To get a good (unbiased) estimate you have to know the
structure of the pre-analysis data manipulation that
yielded the data-to-be-analysed (Q1,Q2,Q3,Q4).
In fact this turns out to be the pooled sample variance
from the original pairs as in (b).
Thus, not all is necessarily lost in doing pre-analysis
data-manipulations, provided that the actual analysis
takes account of those manipulations.
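Case (c) is easy to verify numerically. In the sketch below (Python, with arbitrary M1 and M2), the usual sample variance of the Q's averages about 2/3, and multiplying by 3/2, which uses knowledge of how the Q's were constructed, recovers an unbiased estimate:

```python
import random

random.seed(5)

M1, M2, reps = 3.0, -1.0, 20000  # means arbitrary; variance is 1

naive = adjusted = 0.0
for _ in range(reps):
    x1 = random.gauss(M1, 1.0)
    x2 = random.gauss(M1, 1.0)
    y1 = random.gauss(M2, 1.0)
    y2 = random.gauss(M2, 1.0)
    # Subtract the pair means, as in case (c)
    q = [(x1 - x2) / 2, (x2 - x1) / 2, (y1 - y2) / 2, (y2 - y1) / 2]
    qbar = sum(q) / 4
    s2 = sum((v - qbar) ** 2 for v in q) / 3   # usual (n-1) formula
    naive += s2
    adjusted += 1.5 * s2   # correction knowing the construction

print(f"usual sample variance of the Q's: {naive / reps:.3f}")     # ~ 2/3
print(f"adjusted for the construction:    {adjusted / reps:.3f}")  # ~ 1
```

The 3/2 factor is exactly what turns the sample variance of the Q's back into the pooled pair variance of case (b).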
(d) Joining of data. To emulate the data-joining of the
paper and of your proposed analysis, we can consider
dealing with a revised dataset (Z1,Z2,Z3), where
Z1=X1, Z2=X2, Z3=Y2+X2-Y1
Then the mean of each value is M1, and it is clear that M1
can be estimated but not M2. One might use the sample mean
of (Z1,Z2,Z3) to estimate M1: this estimate has a
theoretical variance of 7/9. Thus this estimate is worse
than the sample mean of just (Z1,Z2), which is the same as
the sample mean of (X1,X2), whose variance is 1/2. The
usual sample variance obtained from (Z1,Z2,Z3) has an
expected value of 4/3 when the theoretical observation
variance is 1. If the usual sample variance obtained from
(Z1,Z2,Z3) is used to estimate the variance of the sample
mean of (Z1,Z2,Z3), this would have an expected value of
4/9 rather than the true variance of this sample mean,
which is 7/9. So here, if one ignores the way in which
(Z1,Z2,Z3) were obtained and just uses the usual sample
estimates, we get an estimate for M1 which is worse (in
terms of variance) than what might have been obtained by
just using the one sample pair (X1,X2). Moreover the
usual formula would give estimated variances which are
biased in either case of trying to estimate the
observation variance or the variance of the sample mean.
One might consider other estimates here, derived from
(Z1,Z2,Z3), but whether or not one looked for optimal
estimates this would involve taking into account the
structure by which the dataset was created. To summarise,
poor performance will arise from any attempt to analyse
the constructed dataset without taking into account the
details of how it was constructed.
In this example, the data-manipulation throws away any
ability to estimate an important property (M2) of one part
of the original dataset, whereas retaining all the original
data and the structure therein allows everything to be
estimated.
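Case (d) can be checked the same way. The sketch below estimates the variance of the sample mean of (Z1,Z2,Z3) by simulation and compares it with the naive estimate s2/3, which understates it:

```python
import random

random.seed(9)

M1, M2, reps = 3.0, -1.0, 20000  # means arbitrary; variance is 1

zbars, s2s = [], []
for _ in range(reps):
    x1 = random.gauss(M1, 1.0)
    x2 = random.gauss(M1, 1.0)
    y1 = random.gauss(M2, 1.0)
    y2 = random.gauss(M2, 1.0)
    z = [x1, x2, y2 + x2 - y1]          # the joined dataset
    zbar = sum(z) / 3
    zbars.append(zbar)
    s2s.append(sum((v - zbar) ** 2 for v in z) / 2)  # (n-1) formula

mean_zbar = sum(zbars) / reps
var_zbar = sum((v - mean_zbar) ** 2 for v in zbars) / (reps - 1)
mean_s2 = sum(s2s) / reps

print(f"variance of the sample mean of Z: {var_zbar:.3f}")  # ~ 7/9
print(f"naive estimate via s2/3:          {mean_s2 / 3:.3f}")
```

The joining induces a correlation between Z2 and Z3, so the usual independent-sample formulas no longer apply.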
So my conclusion is that you should not try to merge
groups of data into one supposedly-continuous time-series
as you don't have to do so.
It is possible to do a combined analysis of all groups
without joining them.
Since there is just one pre-specified frequency there is
no need to do a spectral analysis.
But, if you really wanted to do a spectral analysis
combining all groups without joining them together, this
is certainly possible ... you just have to understand the
meaning of the quantities produced in the analysis of a
single series.
Thank you for the answer, David Jones:
But, if you really wanted to do a spectral analysis
combining all groups without joining them together, this
is certainly possible ... you just have to understand the
meaning of the quantities produced in the analysis of a
single series.
So, you propose to amend my analysis by performing 20
separate spectral analyses?