20150624 - AMD Fury X (aka Fiji) is a Beast of a GPU Compute Platform
Compute Raw Specs
Raw specs from Wikipedia, converted to per-millisecond rates, comparing what the two vendors built from roughly 600 mm^2 of 28 nm silicon:
AMD FURY X: 8.6 GFlop/ms, 0.5 GB/ms, 0.27 GTex/ms
NV TITAN X: 6.1 GFlop/ms, 0.3 GB/ms, 0.19 GTex/ms
Or the same numbers as operations per pixel per frame at 1920x1080 and 60 Hz:
AMD FURY X: 69 KFlop/pix, 4.0 KB/pix, 2.2 KTex/pix
NV TITAN X: 49 KFlop/pix, 2.4 KB/pix, 1.5 KTex/pix
Think about what is possible with 69 thousand flops per pixel per frame.
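The conversions above are simple arithmetic: 1 TFlop/s is 1 GFlop/ms, and 1080p at 60 Hz is 1920 * 1080 * 60 = 124,416,000 pixels per second. Below is a minimal sketch that reproduces the per-pixel figures from the per-millisecond ones. The helper name `per_pixel_k` and the `specs` table layout are illustrative only.

```python
# Hypothetical helper: turn the per-millisecond rates listed above into
# thousands of operations per pixel per frame at 1920x1080, 60 Hz.
PIXELS_PER_SECOND = 1920 * 1080 * 60  # 124,416,000 pixels/s

# Per-millisecond figures as listed above (Giga-units per millisecond).
specs = {
    "AMD FURY X": {"GFlop/ms": 8.6, "GB/ms": 0.5, "GTex/ms": 0.27},
    "NV TITAN X": {"GFlop/ms": 6.1, "GB/ms": 0.3, "GTex/ms": 0.19},
}


def per_pixel_k(giga_per_ms: float) -> float:
    """Convert a Giga-units-per-ms rate into K units per pixel per frame."""
    units_per_second = giga_per_ms * 1e9 * 1000  # G/ms -> units/s
    return units_per_second / PIXELS_PER_SECOND / 1000  # -> K units/pixel


for name, s in specs.items():
    flop = per_pixel_k(s["GFlop/ms"])
    byte = per_pixel_k(s["GB/ms"])
    tex = per_pixel_k(s["GTex/ms"])
    print(f"{name}: {flop:.0f} KFlop/pix, {byte:.1f} KB/pix, {tex:.1f} KTex/pix")
```

Because the per-pixel numbers are derived from the rounded per-millisecond values, they carry the same rounding.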
HBM
HBM definitely represents the future of bandwidth scaling for GPUs:
a change that brings memory clocks down and bus width up (512 bytes wide on Fury X vs 48 bytes wide on Titan X).
This has side effects on ideal algorithm design: the ideal access granularity gets larger.
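Here is a back-of-envelope check of the "clocks down, width up" trade: dividing peak bandwidth by bus width in bytes gives the per-pin data rate. The bus widths come from the text above. The bandwidths are assumed published peak figures (512 GB/s for Fury X, 336.5 GB/s for Titan X), not numbers from this post. The function names are hypothetical.

```python
# Bus widths from the text (4096-bit HBM vs 384-bit GDDR5).
# Bandwidths are ASSUMED published peak figures, not taken from this post.
gpus = {
    "AMD FURY X (HBM)": {"bus_bits": 4096, "gb_per_s": 512.0},
    "NV TITAN X (GDDR5)": {"bus_bits": 384, "gb_per_s": 336.5},
}


def bus_bytes(bus_bits: int) -> int:
    """Width of the whole memory bus in bytes."""
    return bus_bits // 8


def transfer_rate_gt_s(gb_per_s: float, bus_bits: int) -> float:
    """Per-pin data rate (GT/s) implied by peak bandwidth and bus width."""
    return gb_per_s / bus_bytes(bus_bits)


for name, g in gpus.items():
    width = bus_bytes(g["bus_bits"])
    rate = transfer_rate_gt_s(g["gb_per_s"], g["bus_bits"])
    print(f"{name}: {width} bytes wide, {rate:.2f} GT/s per pin")
```

Roughly 1 GT/s per pin on a 512-byte bus versus roughly 7 GT/s on a 48-byte bus: about the same order of bandwidth, delivered through a much wider and slower interface.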
Things like random-access global atomics and random-access 16-byte vector load/store operations
become much less attractive (a bad idea before, a worse idea now).
Working in LDS with shared atomics, staying in cache, etc., becomes more rewarding.
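A histogram is one common illustration of moving atomics out of global memory. Each workgroup accumulates into a private, LDS-resident histogram and then flushes only its non-zero bins with global atomics. The post does not prescribe this technique; it is an assumed example. The sketch below is a CPU-side Python model that only counts global atomic operations. It is not GPU code, and the function names and the `group_size` parameter are hypothetical.

```python
import random
from collections import Counter


def global_atomics_histogram(data, num_bins):
    """Every element performs one atomic increment on the global histogram."""
    hist = [0] * num_bins
    global_ops = 0
    for x in data:
        hist[x % num_bins] += 1  # models one global atomicAdd
        global_ops += 1
    return hist, global_ops


def privatized_histogram(data, num_bins, group_size):
    """Each workgroup bins into a private (LDS-like) histogram using cheap local
    increments, then flushes only its non-zero bins to the global histogram."""
    hist = [0] * num_bins
    global_ops = 0
    for start in range(0, len(data), group_size):
        group = data[start:start + group_size]
        local = Counter(x % num_bins for x in group)  # models LDS atomics
        for b, count in local.items():
            hist[b] += count  # models one global atomicAdd per non-zero local bin
            global_ops += 1
    return hist, global_ops


random.seed(0)
data = [random.randrange(1 << 20) for _ in range(100_000)]
h_direct, ops_direct = global_atomics_histogram(data, 64)
h_private, ops_private = privatized_histogram(data, 64, 256)
print(ops_direct, ops_private)
```

Both versions produce the same histogram. The privatized one issues at most one global atomic per bin per workgroup instead of one per element, and it keeps the high-frequency updates in on-chip memory.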