Hi All,
I'm trying to figure out a formula or method to organize a list of numbers into a normal distribution shaped like a bell curve.
I'm working in Excel...
Currently I have a list of text representing store locations in column A and sales figures in column C. In column B I'm trying to generate a sales tier or bin. I want to organize the bins into a normal data distribution shaped like a bell curve. So the middle sales tier would have the highest count of locations.
I've come up with a few statistical methods which generate (1) the size of each bin, and (2) the number of bins, but I can't seem to get it quite into a normal distribution.
Is there any way to write a formula to generate the number of bins and bin size to organize the data into a normal distribution? Any help would be appreciated.
Thank you!
One should never use a transformation without having a good reason
not depending on the observed distribution to do it. Transforming to
a normal distribution makes ALL subsequent analyses suspect.
On Mon, 29 Feb 2016 13:07:04 -0800 (PST), marklukawsky@gmail.com
wrote:
Hi All,
I'm trying to figure out a formula or method to organize a list of numbers into a normal distribution shaped like a bell curve.
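I'm working in Excel...
Currently I have a list of text representing store locations in column A and sales figures in column C. In column B I'm trying to generate a sales tier or bin. I want to organize the bins into a normal data distribution shaped like a bell curve. So the middle sales tier would have the highest count of locations.
I've come up with a few statistical methods which generate (1) the size of each bin, and (2) the number of bins, but I can't seem to get it quite into a normal distribution.
Is there any way to write a formula to generate the number of bins and bin size to organize the data into a normal distribution? Any help would be appreciated.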
Thank you!
My! Arbitrary widths? That sounds like someone is trying
for an entry in some new edition of "How to Lie with Statistics".
Don't do it.
I Googled, and I have just glanced at this: Wikipedia has a nice
article under "Histograms". It has a good discussion of algorithms
for "number of bins".
However, I don't see that it mentions what to do with data that
cover a huge range and deserve to be transformed .... Maybe
I should add that, or suggest it on the discussion page ....
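For what it's worth, two of the "number of bins" rules covered in that article (Sturges' rule and the Freedman-Diaconis rule) can be sketched in a few lines. This is only an illustration in Python rather than Excel; the function names and the sales figures are made up for the example, and both rules produce equal-width bins.

```python
import math
import statistics


def sturges_bins(values):
    """Sturges' rule: k = ceil(log2(n)) + 1 equal-width bins."""
    n = len(values)
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width h = 2 * IQR / n^(1/3)."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate case: no spread in the middle half, so use a single bin.
        return 1
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))


# Hypothetical sales figures, invented for illustration only.
sales = [12, 15, 18, 21, 22, 24, 25, 25, 27, 28, 30, 31, 33, 36, 40, 55]
print("Sturges:", sturges_bins(sales))
print("Freedman-Diaconis:", freedman_diaconis_bins(sales))
```

Neither rule tries to make the histogram look normal; they only pick how many equal-width bins to draw.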
If you have a wide range, it could be natural to use transformation.
Keep it a simple one, and one that is fairly natural for whatever is measured. I would have to say that the first choice is always "logs".
Second, consider inverting the measure, such as Miles-per-gallon
becoming Gallons-per-100 miles. - I once saw a very nice data
presentation concerning the records at various distances for
track meets, which achieved fine unification by converting all
"times" for the records into "speed". They plotted 100 years of world-records, all distances, for males and females.
For random counts of events, it could be that the square-root
will be the "natural" transformation. What would inhibit me from
using it is that it is seldom used by others. However, it is another
one to consider.
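As a rough sketch of the three transformations mentioned above (logs, inverting the measure, square root), here is how each might look in Python. The function names and sample numbers are assumptions made for illustration; which transformation is "natural" still depends on what is being measured.

```python
import math


def log_transform(values):
    """Base-10 logs; only valid for strictly positive values."""
    return [math.log10(v) for v in values]


def gallons_per_100_miles(mpg_values):
    """Invert miles-per-gallon into gallons per 100 miles."""
    return [100.0 / mpg for mpg in mpg_values]


def sqrt_transform(counts):
    """Square root, sometimes used for counts of random events."""
    return [math.sqrt(c) for c in counts]


# Hypothetical values spanning a wide range.
print(log_transform([1, 10, 100, 1000]))
print(gallons_per_100_miles([20, 25, 50]))
print(sqrt_transform([0, 4, 9, 100]))
```

After a log transform, values that differ by a factor of ten end up one unit apart, which is why it helps when the data cover a huge range.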
On Tue, 1 Mar 2016 21:07:00 -0000 (UTC), Herman Rubin
<hrubin@skew.stat.purdue.edu> wrote:
One should never use a transformation without having a good reason
not depending on the observed distribution to do it. Transforming to
a normal distribution makes ALL subsequent analyses suspect.
I would ask that the second sentence be started with some
qualifier like "Arbitrarily..."
Trying to analyze a distribution with tests that have normality
assumptions, when you can be sure that the metric in use
has great heterogeneity of variance, is a way to assure
that your statistical tests are wrong.
Thanks Herman, but I think we are getting slightly off topic here :).
I have a list of locations and their related sales volumes.
Assuming the data is normally distributed, all I'm trying to do is determine a formula that gives me:
(1) the optimal number of bins and
(2) bin ranges which create a normal distribution
so that the locations fall into the bin ranges in a normal distribution pattern, i.e. the highest count of locations will be in the middle tier.
Thanks for the interesting commentary though!
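One possible reading of that request, sketched in Python rather than Excel: if the sales really are roughly normal, equal-width bins measured in standard deviations around the mean will show the bell shape on their own. The function names and the choice of cutting at 1 and 2 standard deviations are assumptions for illustration, not something stated in the thread.

```python
import statistics


def sd_bin_edges(values, max_sd=2):
    """Bin edges at mean - max_sd*sd, ..., mean, ..., mean + max_sd*sd."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [mean + k * sd for k in range(-max_sd, max_sd + 1)]


def sd_bin_index(value, edges):
    """0 for values at or below the lowest edge, len(edges) above the highest."""
    return sum(value > edge for edge in edges)


# Hypothetical sales figures, invented for illustration only.
sales = [12, 15, 18, 21, 22, 24, 25, 25, 27, 28, 30, 31, 33, 36, 40, 55]
edges = sd_bin_edges(sales)
counts = [0] * (len(edges) + 1)
for s in sales:
    counts[sd_bin_index(s, edges)] += 1
print(edges)
print(counts)
```

With `max_sd=2` this gives six bins. If the counts do not come out bell-shaped, that is information about the data, not a problem with the bins.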
What are your thoughts on using percentiles to set my ranges? I would group locations and their sales into tiers based on percentiles from the bell curve. Example:
Tier A Upper Bound: Top 100% or Max of Data Set
Tier B Upper Bound: Top 97.5%
Tier C Upper Bound: Top 84%
Tier D Upper Bound: Top 50%
Tier E Upper Bound: Bottom 16%
Tier F Upper Bound: Bottom 2.5%
That way it forces it into a normal distribution.
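A minimal sketch of this percentile scheme, in Python rather than Excel, reading each bound as a cumulative share of the ranked locations (2.5%, 16%, 50%, 84%, 97.5%, 100%). That reading, the function name, and the sample data are assumptions made for illustration.

```python
from bisect import bisect_left

# Cumulative upper bound of each tier as a fraction of the ranked list,
# lowest tier first (assumed reading of the tier list above).
TIER_BOUNDS = [
    (0.025, "F"),
    (0.16, "E"),
    (0.50, "D"),
    (0.84, "C"),
    (0.975, "B"),
    (1.0, "A"),
]


def assign_tiers(sales_by_store):
    """Rank stores by sales and assign a tier from each store's rank fraction."""
    ranked = sorted(sales_by_store, key=sales_by_store.get)
    n = len(ranked)
    cutoffs = [bound for bound, _ in TIER_BOUNDS]
    tiers = {}
    for i, store in enumerate(ranked):
        frac = (i + 1) / n  # share of stores at or below this one
        tiers[store] = TIER_BOUNDS[bisect_left(cutoffs, frac)][1]
    return tiers


# Hypothetical, deliberately skewed sales (cubes), invented for illustration.
stores = {f"Store {i:03d}": i ** 3 for i in range(1, 201)}
tiers = assign_tiers(stores)
counts = {t: list(tiers.values()).count(t) for t in "ABCDEF"}
print(counts)  # {'A': 5, 'B': 27, 'C': 68, 'D': 68, 'E': 27, 'F': 5}
```

Note that the tier counts come out bell-shaped even though the example sales are heavily skewed: the shape is produced by the percentile cutoffs, not by the data, so the tiers say nothing about whether sales are actually normally distributed.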