Quadibloc <quadibloc@servername.invalid> wrote:
Above a certain performance level, _all_ cores are out-of-order.
That is true for general-purpose CPUs, but not for GPUs - these
are in-order. I think AMD and NVIDIA differ in their handling
of register hazards - AMD handles them, NVIDIA depends on the
compiler (well, whatever you want to call the piece of software
that translates the intermediate PTX into whatever the graphics
card itself understands) to do this.
Thomas Koenig wrote:
Quadibloc <quadibloc@servername.invalid> wrote:
Above a certain performance level, _all_ cores are out-of-order.
That is true for general-purpose CPUs, but not for GPUs - these
are in-order. I think AMD and NVIDIA differ in their handling
of register hazards - AMD handles them, NVIDIA depends on the
compiler (well, whatever you want to call the piece of software
that translates the intermediate PTX into whatever the graphics
card itself understands) to do this.
In general, GPUs handle hazards by context switching to a different
WARP. In some weird sense this is maximally OoO; in another sense it
is completely in order.
GPUs context switch between instructions, and any instruction which
is not 1-cycle* uses the context switch mechanism. (*) instructions
which are 1 cycle can still context switch to a different WARP every
cycle.
Also, all 32 (or 64) threads per WARP execute the same instruction.
And all of this stuff only works efficiently when there is an
embarrassingly large amount of parallelism.
MitchAlsup1 wrote:
Thomas Koenig wrote:
Quadibloc <quadibloc@servername.invalid> wrote:
Above a certain performance level, _all_ cores are out-of-order.
That is true for general-purpose CPUs, but not for GPUs - these
are in-order. I think AMD and NVIDIA differ in their handling
of register hazards - AMD handles them, NVIDIA depends on the
compiler (well, whatever you want to call the piece of software
that translates the intermediate PTX into whatever the graphics
card itself understands) to do this.
In general, GPUs handle hazards by context switching to a different
WARP. In some weird sense this is maximally OoO; in another sense it
is completely in order.
It's barrel switching.
A rose by any other name would smell as sweet.
GPUs context switch between instructions, and any instruction which
is not 1-cycle* uses the context switch mechanism. (*) instructions
which are 1 cycle can still context switch to a different WARP every
cycle.
Also, all 32 (or 64) threads per WARP execute the same instruction.
Ok, it's Illiac IV with barrel switching.
And all of this stuff only works efficiently when there is an
embarrassingly large amount of parallelism.
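
The latency-hiding argument above can be sketched with a toy model. This is
only an illustration under stated assumptions, not how any particular GPU
schedules: one issue slot per cycle, round-robin selection among warps, every
instruction depends on the previous one in the same warp, and every
instruction has the same fixed latency. The name `simulate` and its
parameters are made up for this example.

```python
def simulate(num_warps, instrs_per_warp, latency):
    """Toy barrel-style issue model (hypothetical, for illustration only).

    Each warp issues its instructions strictly in order and must wait
    `latency` cycles after issuing before it may issue again. At most one
    warp issues per cycle. Returns (cycles, utilization of the issue slot).
    """
    ready_at = [0] * num_warps                # earliest cycle each warp may issue
    remaining = [instrs_per_warp] * num_warps
    total = num_warps * instrs_per_warp
    issued = 0
    cycle = 0
    next_warp = 0                             # round-robin starting point

    while issued < total:
        # Look for the first warp that is ready, starting after the last issuer.
        for k in range(num_warps):
            w = (next_warp + k) % num_warps
            if remaining[w] > 0 and ready_at[w] <= cycle:
                remaining[w] -= 1
                ready_at[w] = cycle + latency  # the hazard stalls only this warp
                issued += 1
                next_warp = (w + 1) % num_warps
                break
        # If no warp was ready, the issue slot simply idles this cycle.
        cycle += 1

    return cycle, issued / cycle


if __name__ == "__main__":
    for warps in (1, 2, 4, 8):
        cycles, util = simulate(warps, instrs_per_warp=10, latency=4)
        print(f"{warps} warps: {cycles} cycles, utilization {util:.2f}")
```

In this model a single warp with a 4-cycle latency keeps the issue slot busy
only about 27% of the time, while 4 or more warps keep it busy every cycle.
Each warp stays strictly in order throughout; only the interleaving between
warps changes.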