Forum: >>> Magnum BBS <<<

Re: Graphics cards are different

From Stefan Monnier@21:1/5 to All on Wed Jan 24 18:38:46 2024

MitchAlsup1 [2024-01-24 23:18:56] wrote:

And all of this stuff only works efficiently when there is an embarrassingly large amount of parallelism.

Not only that: this parallelism needs to be "static".
An important part of the scheduling needs to be done at compilation time.

Stefan

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MitchAlsup1@21:1/5 to Thomas Koenig on Wed Jan 24 23:18:56 2024

Thomas Koenig wrote:

Quadibloc <quadibloc@servername.invalid> schrieb:

Above a certain performance level, _all_ cores are out-of-order.

That is true for general-purpose CPUs, but not for GPUs - these
are in-order. I think AMD and NVIDIA differ in their handling
of register hazards - AMD handles them, NVIDIA depends on the
compiler (well, whatever you want to call the piece of software
that translates the intermediate PTX into whatever the graphics
card itself understands) to do this.

In general, GPUs handle hazards by context switching to a different
WARP. In some weird sense, this is maximally OoO; in another sense it
is completely in order.

GPUs context switch between instructions, and any instruction which
is not 1-cycle* uses the context switch mechanism. (*) instructions
which are 1 cycle can still context switch to a different WARP every
cycle.

Also, all 32 (or 64) threads per WARP execute the same instruction.

And all of this stuff only works efficiently when there is an
embarrassingly large amount of parallelism.
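
A toy model makes the last point concrete. The sketch below is a deliberate simplification, not a description of any real GPU: it assumes a single-issue in-order core, a fixed latency for every instruction, and that each WARP's next instruction depends on its previous one. The function name `simulate` and its parameters are made up for illustration. Each cycle the scheduler issues from the next ready WARP in round-robin order, and the return value is the fraction of cycles in which something was issued.

```python
def simulate(num_warps, latency, instrs_per_warp):
    """Toy barrel scheduler: return the fraction of cycles that issued work.

    Assumptions (illustrative only): one issue slot per cycle, every
    instruction takes `latency` cycles, and a warp cannot issue again
    until its previous instruction has completed.
    """
    ready_at = [0] * num_warps               # cycle when each warp may issue again
    remaining = [instrs_per_warp] * num_warps
    total = num_warps * instrs_per_warp
    cycle = 0
    issued = 0
    next_warp = 0                            # round-robin pointer

    while issued < total:
        # Look for a ready warp, starting from the round-robin pointer.
        for offset in range(num_warps):
            w = (next_warp + offset) % num_warps
            if remaining[w] > 0 and ready_at[w] <= cycle:
                remaining[w] -= 1
                ready_at[w] = cycle + latency  # warp stalls until the result is back
                issued += 1
                next_warp = (w + 1) % num_warps
                break
        # If no warp was ready, this cycle is simply lost.
        cycle += 1

    return issued / cycle


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "warps:", round(simulate(n, latency=4, instrs_per_warp=100), 3))
```

In this model the issue slot stays busy only once there are at least as many ready WARPs as the instruction latency in cycles; with fewer, the core idles while it waits. Real memory latencies run to hundreds of cycles, which is why the amount of parallelism needed is so large.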

From EricP@21:1/5 to All on Wed Jan 24 23:42:55 2024

MitchAlsup1 wrote:

Thomas Koenig wrote:

Quadibloc <quadibloc@servername.invalid> schrieb:

Above a certain performance level, _all_ cores are out-of-order.

That is true for general-purpose CPUs, but not for GPUs - these
are in-order. I think AMD and NVIDIA differ in their handling
of register hazards - AMD handles them, NVIDIA depends on the
compiler (well, whatever you want to call the piece of software
that translates the intermediate PTX into whatever the graphics
card itself understands) to do this.

In general, GPUs handle hazards by context switching to a different
WARP. In some weird sense, this is maximally OoO; in another sense it
is completely in order.

It's barrel switching.
A rose by any other name would smell as sweet.

GPUs context switch between instructions, and any instruction which
is not 1-cycle* uses the context switch mechanism. (*) instructions
which are 1 cycle can still context switch to a different WARP every
cycle.

Also, all 32 (or 64) threads per WARP execute the same instruction.

Ok, it's Illiac IV with barrel switching.

And all of this stuff only works efficiently when there is an embarrassingly large amount of parallelism.

From MitchAlsup1@21:1/5 to EricP on Thu Jan 25 16:40:02 2024

EricP wrote:

MitchAlsup1 wrote:

Thomas Koenig wrote:

Quadibloc <quadibloc@servername.invalid> schrieb:

Above a certain performance level, _all_ cores are out-of-order.

That is true for general-purpose CPUs, but not for GPUs - these
are in-order. I think AMD and NVIDIA differ in their handling
of register hazards - AMD handles them, NVIDIA depends on the
compiler (well, whatever you want to call the piece of software
that translates the intermediate PTX into whatever the graphics
card itself understands) to do this.

In general, GPUs handle hazards by context switching to a different
WARP. In some weird sense, this is maximally OoO; in another sense it
is completely in order.

It's barrel switching.
A rose by any other name would smell as sweet.

GPUs context switch between instructions, and any instruction which
is not 1-cycle* uses the context switch mechanism. (*) instructions
which are 1 cycle can still context switch to a different WARP every
cycle.

Also, all 32 (or 64) threads per WARP execute the same instruction.

Ok, it's Illiac IV with barrel switching.

Or BSP with barrel switching.

And all of this stuff only works efficiently when there is an
embarrassingly large amount of parallelism.

