20160905 - GPU Parking Lot

Push Model
Perhaps the GPU parking lot, aka the register file waiting on long-latency returns, is a side effect of not having the ability to issue a load which pushes data into a different SIMD unit's register file? If a load could be issued in one place and return somewhere else, one could possibly split a problem into 2 components: the part figuring out how to route memory traffic, and the part consuming the memory traffic. No call and return, thus no parking of state after loads.
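
To make the split concrete, here is a toy software model of the idea, not GPU code and not any real hardware interface. Everything in it is a hypothetical stand-in: MEMORY plays global memory, the deque "mailbox" plays the consuming SIMD unit's register file, and route/consume are the two halves of the problem. pull_model is the usual call-and-return shape for comparison.

```python
from collections import deque

# Hypothetical stand-in for global memory.
MEMORY = [10, 20, 30, 40, 50, 60, 70, 80]


def pull_model(indices):
    """Call-and-return shape: the same unit issues each load and consumes it.

    On real hardware, this unit's register state would stay parked while
    each load is in flight; here the load is just a list lookup.
    """
    total = 0
    for i in indices:
        value = MEMORY[i]  # issue load, wait for it to return to *this* unit
        total += value     # resume with the state that was parked
    return total


def route(indices, mailbox):
    """Router half: decides what to fetch and where the result goes.

    It pushes each loaded value into the consumer's mailbox and keeps no
    state of its own waiting for the data to come back.
    """
    for i in indices:
        mailbox.append(MEMORY[i])  # load "returns" into someone else's storage


def consume(mailbox):
    """Consumer half: only ever sees data that has already arrived."""
    total = 0
    while mailbox:
        total += mailbox.popleft()
    return total


def push_model(indices):
    """Run router then consumer; the mailbox is the only link between them."""
    mailbox = deque()  # assumed stand-in for the consumer's register file
    route(indices, mailbox)
    return consume(mailbox)
```

Both shapes compute the same sum; the difference is only in who holds state across the load. This sketch runs the two halves one after the other, so it says nothing about scheduling, ordering, or how a real consumer would know its data has landed.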