Does anyone know what the "denormals are zeros" flag of the
x86 MXCSR is good for?
Or more precisely: I know what it does, but I don't know why
it should make sense to consider denormal values as zeros.
On Fri, 17 Jun 2016 18:11:28 +0200, Bonita Montero
<Bonita.Montero@gmail.com> wrote:
> Does anyone know what the "denormals are zeros" flag of the
> x86 MXCSR is good for?
> Or more precisely: I know what it does, but I don't know why
> it should make sense to consider denormal values as zeros.
Mainly performance - denormals tend to be slow (although less so on
recent x86s). Some codes do things like converge to zero, but end up
passing through the denormal range first - just skipping that can
sometimes be a considerable performance improvement. There are some
downsides to disabling gradual underflow, but in practice, in many of
the cases where you get denormals you're on your way to zero anyway,
and in most cases the advantages of gradual underflow are very small.
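For anyone who wants to see the flag in action, here is a minimal sketch, assuming an x86 CPU with DAZ support and a compiler that does double arithmetic in SSE registers (the default on x86-64). The helper names add_to_self and show_daz are made up for illustration; the MXCSR macros come from the compiler's own intrinsics headers.

```c
#include <assert.h>
#include <float.h>      /* DBL_MIN */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE and its constants */
#include <stdio.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* Hypothetical helper: add x to itself with DAZ switched on or off. */
double add_to_self(double x, int daz_on)
{
    unsigned int saved = _mm_getcsr();  /* remember the caller's MXCSR */
    volatile double in = x;             /* volatile blocks constant folding */
    volatile double out;

    _MM_SET_DENORMALS_ZERO_MODE(daz_on ? _MM_DENORMALS_ZERO_ON
                                       : _MM_DENORMALS_ZERO_OFF);
    out = in + in;                      /* the addition runs under the chosen mode */
    _mm_setcsr(saved);                  /* restore the previous mode */
    return out;
}

/* Hypothetical demo: print the result of the same addition in both modes. */
void show_daz(void)
{
    double d = DBL_MIN / 4.0;           /* below DBL_MIN, so denormal */
    assert(d > 0.0 && d < DBL_MIN);     /* confirm the input really is denormal */
    printf("DAZ off: %g\n", add_to_self(d, 0));
    printf("DAZ on:  %g\n", add_to_self(d, 1));
}
```

Calling show_daz() from main should print a small nonzero value with DAZ off and 0 with DAZ on, since the denormal operands are read as zero. Note that DAZ only affects inputs; the separate flush-to-zero flag (_MM_SET_FLUSH_ZERO_MODE in xmmintrin.h) handles results that would underflow into the denormal range.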
I printed the sums only to prevent the compiler from optimizing away
the summation. The result is that on my Xeon E3-1240 (Skylake) each
iteration takes four clock cycles when "d" is non-denormal. When "d"
is a denormal, each iteration takes about 150 clock cycles! I'd never
have believed denormals could have such a huge performance impact if
I hadn't seen it myself.
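The test program itself is not included in this post, so the following is only a rough sketch of this kind of measurement and not the code that produced the numbers above. It uses a volatile accumulator rather than printing to keep the loop alive, and it reports CPU seconds from clock() instead of clock cycles. The names time_sum and compare_normal_and_denormal, and the loop count, are made up for illustration.

```c
#include <assert.h>
#include <float.h>   /* DBL_MIN */
#include <stdio.h>
#include <time.h>    /* clock, CLOCKS_PER_SEC */

/* Hypothetical helper: add d to a running sum n times.
   Stores the final sum in *sum_out and returns the CPU time in seconds. */
double time_sum(double d, long n, double *sum_out)
{
    assert(n > 0);
    volatile double sum = 0.0;  /* volatile keeps the loop from being removed */
    clock_t start = clock();
    for (long i = 0; i < n; ++i)
        sum += d;
    clock_t stop = clock();
    *sum_out = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Hypothetical driver: time the same loop with a normal and a denormal "d". */
void compare_normal_and_denormal(void)
{
    const long n = 10000000;
    double sum;

    double t_normal = time_sum(1.0, n, &sum);
    printf("normal:   %.3f s (sum %g)\n", t_normal, sum);

    /* d is small enough that the running sum stays denormal for all n steps. */
    double t_denormal = time_sum(DBL_MIN / 1e9, n, &sum);
    printf("denormal: %.3f s (sum %g)\n", t_denormal, sum);
}
```

How large the gap is depends on the CPU, the compiler flags, and whether DAZ or flush-to-zero is already enabled (some fast-math options turn them on at startup), so treat any numbers from this as indicative only.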
And what about GPUs? I suppose they don't support denormals.
Is this right?