DirectX 12 is Coming to Hitman 2 - Will this address the game's CPU woes?

Well, so far every DX12 game I play suffers from terribly annoying microstutter and random FPS drops.

Division 2 is the worst of all. I'm not sure it will help the CPU troubles at all.
 
DX12 is really hardware-dependent in terms of who it helps; I find it's often more beneficial to APUs or Piledriver/older-GCN kinds of setups than to modern high-spec PCs, besides the performance issues with a couple of features on (mostly older) Nvidia cards.
Technically, until Turing, async compute still had a measurable latency penalty and could cause stuttering when the card switched between graphics and compute workloads, which AMD-optimised games would do quite often, so it's not really recommended on Pascal or Maxwell even though the feature is technically there.
 
AMD-optimized games have nothing to do with asynchronous compute, and it has nothing to do with Piledriver or any CPU.

Asynchronous compute is just part of the DX12 feature set. Its goal is to REDUCE latency from context switching. It does introduce some latency of its own, but less than a full context switch, so it's a net benefit.
AMD doesn't suffer from this because it has a dedicated cache to store the task during the switch, which lets the new tasks complete before the previous task gets restored from the cache. It's faster this way.
Nvidia has no dedicated cache; the state gets stored in VRAM, which is a slow process. This is why they see performance losses. Turing solves this by adding a cache for context switching.
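As a back-of-the-envelope illustration of why the storage location matters, here's a toy Python model. All the latency figures and the switch count are invented for illustration, not measured:

```python
# Toy model of graphics<->compute context-switch overhead per frame.
# Every number here is hypothetical, chosen only to show the arithmetic.

ON_CHIP_CACHE_NS = 50      # hypothetical: save/restore state via a dedicated cache
VRAM_ROUNDTRIP_NS = 1000   # hypothetical: save/restore state by spilling to VRAM

def switch_cost(save_ns, restore_ns, switches_per_frame):
    """Total time per frame spent just moving pipeline state around."""
    return (save_ns + restore_ns) * switches_per_frame

switches = 200  # hypothetical number of graphics<->compute switches per frame

cached = switch_cost(ON_CHIP_CACHE_NS, ON_CHIP_CACHE_NS, switches)
spilled = switch_cost(VRAM_ROUNDTRIP_NS, VRAM_ROUNDTRIP_NS, switches)

print(f"dedicated cache: {cached / 1e6:.3f} ms per frame")
print(f"VRAM spill:      {spilled / 1e6:.3f} ms per frame")
```

With these made-up numbers the VRAM path costs 20x more per frame, which is the shape of the penalty being described, even if the real magnitudes differ.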

Asynchronous compute is great for consoles, less so for PC. It only gives anywhere from a 5-10% performance uplift, and on PC we are generally not constrained by maximum performance. On consoles, with their limited power, it has far more potential, for example to maintain steady framerates.
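A rough way to see where a 5-10% figure can come from: overlapping compute with graphics only saves the slice of the frame that can actually hide under the other queue. A toy Python model, with hypothetical millisecond figures and assuming perfect overlap:

```python
# Toy frame-time model: serial vs. asynchronous (overlapped) execution.
# The millisecond figures are hypothetical, chosen only to illustrate the maths.

graphics_ms = 14.0   # time the graphics queue needs for the frame
compute_ms = 1.5     # compute work (e.g. post-processing) that could overlap

serial_frame = graphics_ms + compute_ms       # compute runs after graphics
async_frame = max(graphics_ms, compute_ms)    # compute hides under graphics
                                              # (perfect overlap assumed)

uplift = (serial_frame - async_frame) / serial_frame * 100
print(f"serial: {serial_frame} ms, async: {async_frame} ms, uplift: {uplift:.1f}%")
```

With these numbers the uplift lands just under 10%; in practice overlap is never perfect, since the queues contend for the same compute units, so real gains tend to be smaller.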
 
Actually, there are quite a few factors behind Nvidia's pre-Turing asynchronous compute performance; the cache issue you're describing specifically concerns the asynchronous integer and floating point pipelines. Later Maxwell had basic async compute support because it could run a graphics queue alongside compute queues, but it lacked dynamic resource allocation, which was a major hit when attempting to use DX12's asynchronous compute capabilities. Pascal fixed this by adding dynamic load balancing between all the queues, but the scheduler was still pretty weak: it couldn't handle individual thread scheduling and used a per-warp system instead, it didn't allow concurrent execution of the integer and floating point pipelines, and, as you mentioned, it required the results of memory address calculations to be stored in a different cache from the one used for addressing, which meant a transfer and a further penalty. Volta's scheduler and execution model fixed many of these problems.

AMD, meanwhile, had multiple dedicated Asynchronous Compute Engine (ACE) blocks on their cards from GCN's start, essentially graphics command controllers for compute pipelines, and they already did per-thread scheduling almost as if each were a CPU, so most of these issues never really existed for them; they only added concurrent FP+INT execution with Vega (and the PS4 Pro).

Concurrent FP+INT calculation is sometimes also referred to as asynchronous compute, but this is a very different thing from the DX12 concept of asynchronous compute; when used in games, INT+FP concurrency is what AMD refers to as Rapid Packed Math, though the former does impact the performance of some implementations of the latter.

Games using DX12 should have separate pipelines and code paths for different architectures to properly make use of the API; in fact, some in the industry say that if you don't have the resources for that, a developer should just stick with DX11. If a game runs badly on an architecture, it's possible the developers just didn't bother to implement a well-optimised path for it, or it's going through a fallback layer, or there's only one code path, a modified version of an existing console one, which is best suited to similar architectures.

There's basically no benefit to removing abstraction if you're not going to make use of the extra ability to fine-tune to the hardware, and if you remove that abstraction and then try to run code created for a different architecture, it's obviously going to run into issues. The key technological benefit of abstraction is hardware agnosticism, which is one reason (besides the abstraction itself) why DX11 is so much easier to work with: a single piece of code should, in theory, work mostly the same way across different architectures.
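The per-architecture code path idea can be sketched like this. The PCI vendor IDs below are real values, but the renderer functions and their names are hypothetical placeholders, not anything from an actual engine:

```python
# Sketch of per-architecture code-path selection in an explicit API,
# versus the single abstracted path a DX11-style driver effectively gives you.
# The render_path_* functions are hypothetical placeholders.

VENDOR_AMD = 0x1002      # real PCI vendor IDs
VENDOR_NVIDIA = 0x10DE

def render_path_gcn(scene):
    return f"GCN path: heavy async compute for {scene}"

def render_path_pascal(scene):
    return f"Pascal path: conservative async usage for {scene}"

def render_path_generic(scene):
    # Fallback when no tuned path exists; likely to perform poorly,
    # which is the situation described in the post above.
    return f"generic fallback for {scene}"

RENDER_PATHS = {
    VENDOR_AMD: render_path_gcn,
    VENDOR_NVIDIA: render_path_pascal,
}

def render(vendor_id, scene):
    """Dispatch to a tuned path if one exists, else the generic fallback."""
    return RENDER_PATHS.get(vendor_id, render_path_generic)(scene)

print(render(VENDOR_AMD, "frame 0"))
print(render(0x8086, "frame 0"))  # Intel's ID lands on the generic fallback
```

The cost of this flexibility is exactly what the post describes: every tuned path is code someone has to write and maintain, which a DX11-style abstraction would otherwise absorb into the driver.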
 
Asynchronous compute allows you to run jobs on a compute pipe that can run parallel to the graphics pipes sharing a few compute units, it's also not exclusive to DX12.
 
Asynchronous compute allows you to run jobs on a compute pipe that can run parallel to the graphics pipes sharing a few compute units, it's also not exclusive to DX12.

Exactly. Although I didn't know it wasn't exclusive. What else supports this? Or would it be up to game engine devs to just implement it?
 
Yeah, that is the functionality from an abstract software perspective, but the underlying hardware implementations (and the implications their downsides have when you try to apply this concept) vary significantly across Nvidia hardware generations, ever since Maxwell 2 first added "support" by simply allowing multiple queues/pipes without any management of them; it also differs notably from AMD's ACE-based implementation.

Vulkan, and of course Mantle, support this too; Mantle's origins mostly lie in exposing the functionality of GCN's ACEs. Referring to it in the DX12 sense was just to distinguish it from concurrent INT+FP pipelines, which some people also call async compute, since the feature wasn't really accessible for widespread use before these APIs.
 