Unfortunately, the code is 1 byte longer for each controller access, because ldl 20 requires a prefix: a single instruction byte only holds operands from 0 to 15.
I don't understand why icc did not generate the more direct variant:
ldl 2 -- val
ldc 16
stnl 2
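The cost of a prefix can be estimated with a small helper. This is only a sketch, not something icc provides: the name encoded_len is made up for illustration, and it assumes the usual transputer encoding in which each byte carries 4 bits of operand and every further nibble needs one pfix byte. Negative operands (which use nfix) are deliberately left out.

```c
#include <assert.h>

/* Hypothetical helper: number of bytes needed to encode a direct
 * transputer instruction (ldl, ldc, stnl, ...) with a non-negative
 * operand. Each byte holds a 4-bit function code and 4 bits of
 * operand; every additional operand nibble costs one pfix byte. */
static int encoded_len(long operand)
{
    int bytes = 1;            /* the instruction byte itself */

    assert(operand >= 0);     /* assumption: nfix case not handled */
    while (operand > 15) {    /* one pfix per extra nibble */
        operand >>= 4;
        bytes++;
    }
    return bytes;
}
```

With this, encoded_len(2) gives 1 and encoded_len(20) gives 2, which is the extra byte per access mentioned above.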
To get rid of the pfix I changed the code again to:
#define CONTROLLER(r) (((unsigned int volatile *)(0x00))[(r)+8])
icc 3.01.41 is not able to generate code for T2 but optimizes test3 to the same code as test2.
Generating suboptimal code is frustrating but generating wrong code is unforgivable.
As this will remain unfixed, always keep an eye on the generated assembler code when things don't work as expected.