20160905 - GPU Parking Lot
Push Model
Perhaps the GPU parking lot, aka the register file holding state while waiting on long-latency returns,
is a side effect of not having the ability to issue a load which pushes data to a different SIMD unit's register file?
If loads could be issued and have their data returned somewhere else, one could possibly split a problem into two components:
the part figuring out how to route memory traffic, and the part consuming the memory traffic.
No call and return, thus no parking of state after loads.
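
A toy CPU-side simulation of the split, just to make the idea concrete. Nothing here maps to real hardware or any real API: `run_push_model`, `LATENCY`, the slot numbering, and the one-load-per-tick issue rate are all made up for illustration, and flow control between the two units is ignored. The router names a destination slot in the consumer's register file and then forgets about the load; only the consumer's registers ever receive the data.

```python
from collections import deque

LATENCY = 4  # hypothetical fixed memory latency, in ticks


def run_push_model(memory, addresses, num_slots=4):
    """Simulate a router unit pushing load results into a consumer unit's registers.

    Returns (values in the order the consumer saw them, total ticks elapsed).
    """
    in_flight = deque()                  # (arrival_tick, dest_slot, value), owned by the memory system
    consumer_regs = [None] * num_slots   # the consumer's register file
    consumed = []
    next_issue = 0
    tick = 0

    while len(consumed) < len(addresses):
        # Router: issue one load per tick, tagged with a destination slot.
        # It keeps no per-load state afterwards, so nothing is parked on its side.
        if next_issue < len(addresses):
            slot = next_issue % num_slots
            value = memory[addresses[next_issue]]
            in_flight.append((tick + LATENCY, slot, value))
            next_issue += 1

        # Memory system: deliver every load whose latency has elapsed
        # straight into the consumer's register file.
        while in_flight and in_flight[0][0] <= tick:
            _, slot, value = in_flight.popleft()
            consumer_regs[slot] = value

        # Consumer: use whatever has arrived, then free the slot.
        for slot in range(num_slots):
            if consumer_regs[slot] is not None:
                consumed.append(consumer_regs[slot])
                consumer_regs[slot] = None

        tick += 1

    return consumed, tick


if __name__ == "__main__":
    memory = [10, 20, 30, 40, 50]
    values, ticks = run_push_model(memory, addresses=[4, 0, 2])
    print(values, ticks)
```

In this sketch the only storage tied up during the latency window is the in-flight queue inside the memory system and the destination slot on the consumer. A real design would still need some way for the consumer to know a slot is valid and for the router to avoid overwriting a slot that has not been consumed yet.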