FSR, XeSS, and the Latency Problem
Cross-vendor reconstruction, and why a 240 FPS game can feel like 60.
This chapter has two halves. First, the cross-vendor alternatives to DLSS (AMD's FSR family and Intel's XeSS), because their design constraints are instructive. Then the latency problem: why a game running at "240 FPS" with frame generation does not respond to your input like a 240 FPS game would, and what Reflex-class technologies do about it.
FSR: the open alternative
AMD's FidelityFX Super Resolution is open-source and cross-vendor: it runs on NVIDIA, Intel, and console GPUs. It has gone through four major revisions:
- FSR 1 (2021): spatial-only. Discussed in chapter 6. Mostly obsolete.
- FSR 2 (2022): temporal, hand-tuned (no neural network). Broadly comparable to DLSS 2 in static shots, but noticeably worse in motion. Runs on essentially any modern GPU.
- FSR 3 (2023): adds frame generation as a separate, hand-tuned (not neural) algorithm built on optical flow plus motion vectors, running on shader cores. Image quality of the generated frames is below DLSS 3.
- FSR 4 (2025): finally a neural network, trained for RDNA 4's new ML units. Image quality leaps much closer to DLSS in independent comparisons. AMD has committed to eventually making FSR 4 work on older AMD GPUs, with reduced features.
The interesting tension here: FSR 1–3 were deliberately not neural networks, because AMD chose to ship a technique that works on every GPU. The cost of that decision was image quality. FSR 4 abandoned that constraint and immediately became substantially better, at the price of being effectively RDNA-4-tier-only.

Why FSR 2/3 looks worse despite the same idea
DLSS and FSR 2/3 have nearly identical architectural shape: both are temporal upscalers consuming jittered color, motion vectors, depth, and a history buffer. The difference is what happens inside the box.
DLSS uses a neural network with millions of parameters trained on millions of examples. FSR 2/3 uses a small number of hand-tuned operations: color clamping in a custom color space, depth-based history validation, reactive masks for transparency, and a custom upsampling kernel (a minimal sketch follows the list below). Both can produce excellent stills; in motion, the neural network handles edge cases the hand-tuned filter misses, especially:
- Thin-feature handling (wires, hair, antennae). DLSS holds them better.
- Disocclusion fill. DLSS regenerates faster; FSR shows fizzle for longer.
- Reactive surfaces (water, specular). DLSS is less prone to oily smearing.
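To make "hand-tuned operations" concrete, here is a minimal sketch of one temporal-resolve step in the FSR 2 style: reproject history through the motion vectors, clamp it to the current frame's local color range, then blend. All names and the 3x3 neighborhood clamp are illustrative assumptions, not AMD's actual kernel.

```python
import numpy as np

def temporal_resolve(current: np.ndarray, history: np.ndarray,
                     motion: np.ndarray, blend: float = 0.9) -> np.ndarray:
    """current/history: HxWx3 float images; motion: HxWx2 pixel offsets."""
    h, w, _ = current.shape
    out = np.empty_like(current)
    for y in range(h):
        for x in range(w):
            # Reproject: fetch history where this pixel came from last frame.
            px = int(np.clip(x - motion[y, x, 0], 0, w - 1))
            py = int(np.clip(y - motion[y, x, 1], 0, h - 1))
            hist = history[py, px]
            # Neighborhood clamp: history may not stray outside the min/max
            # of the current 3x3 neighborhood. This is the hand-tuned
            # heuristic that a trained network effectively replaces.
            nb = current[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2].reshape(-1, 3)
            clamped = np.clip(hist, nb.min(axis=0), nb.max(axis=0))
            out[y, x] = blend * clamped + (1 - blend) * current[y, x]
    return out
```

A network like DLSS replaces the clamp-and-blend with a learned function of the same inputs, which is exactly where the motion-quality gap comes from: the clamp has no idea whether a thin wire or a disoccluded patch is noise or signal.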
FSR 4 closes most of this gap by, predictably, replacing the hand-tuned heuristics with a learned function: the exact same transition DLSS made between versions 1 and 2.
XeSS: Intel's two-tier design
Intel's XeSS is a temporal neural upscaler in the DLSS family. What makes it distinctive: Intel ships it in two versions from the same codebase.
- XeSS XMX path: runs on Intel Arc GPUs' XMX matrix units with the full network at full quality. Comparable to DLSS 2/3.
- XeSS DP4a path: the same network, compiled to use the DP4a (8-bit integer dot product) instruction supported on essentially every modern GPU. Lower quality (smaller network, less precision), but cross-vendor.
This is engineering pragmatism. By shipping the same network in two precisions/sizes, Intel can claim "best quality" on its own hardware while still being a real cross-vendor option. AMD has, with FSR 4, moved closer to this same model.
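For a rough feel of the XMX-versus-DP4a precision tradeoff, here is a sketch: the same four-element dot product computed in float32 and then through an emulated DP4a with int8 quantization. The scale factors and toy values are assumptions for illustration; XeSS's actual quantization scheme is not being claimed here.

```python
import numpy as np

def dp4a(a8: np.ndarray, b8: np.ndarray, acc: int = 0) -> int:
    """Emulates DP4a: a 4-way int8 dot product accumulated into int32."""
    return acc + int(np.dot(a8.astype(np.int32), b8.astype(np.int32)))

rng = np.random.default_rng(0)
act = rng.uniform(-1, 1, 4).astype(np.float32)   # toy activations
wgt = rng.uniform(-1, 1, 4).astype(np.float32)   # toy weights

exact = float(np.dot(act, wgt))                  # what full-precision math gets

# Quantize to int8 with per-tensor scales, as a DP4a network must.
s_a, s_w = np.abs(act).max() / 127, np.abs(wgt).max() / 127
a8 = np.round(act / s_a).astype(np.int8)
w8 = np.round(wgt / s_w).astype(np.int8)
approx = dp4a(a8, w8) * s_a * s_w                # dequantized result

print(f"float32: {exact:.6f}  dp4a: {approx:.6f}  err: {abs(exact - approx):.6f}")
```

One dot product loses little; a whole network of them, plus the smaller model that fits the slower path, is where the visible quality gap comes from.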

The cross-vendor lesson
The trajectory is consistent across all three vendors:
Hand-tuned heuristics → small neural network on general hardware → bigger neural network on specialised hardware.
The economics push everyone in the same direction. Specialised matrix-math hardware is cheap to add to a GPU because game rendering already wants it for ray-tracing denoisers, lighting probes, and asset compression. Once you have it, you may as well use it for upscaling. Once you use it for upscaling, you tend to win on quality.
This is why every new GPU architecture in 2025–2026 has dedicated ML acceleration: NVIDIA Tensor Cores, AMD AI Accelerators, Intel XMX, Apple Neural Engine, Sony's PS5 Pro custom block, and Microsoft's "AI capability" for the next Xbox. Reconstruction is the killer app that justifies all of them.
Part two: the latency problem
Now to the part that ships as fine print on every frame generation marketing slide.
What latency means here
When you press a key, several things happen before the resulting frame reaches your eye:
- Input poll: the game samples the keyboard/mouse/controller.
- CPU simulation: physics, animation, AI, and gameplay logic update the world state.
- CPU render command buffer: the engine builds the GPU draw list for the new state.
- GPU render: the GPU executes the draw list.
- Display scan-out: the finished frame is sent to the monitor.
The end-to-end latency is the sum of all of this: typically 30–60 ms in a normal game pipeline, low enough to feel responsive. Frame generation adds to this number because of how it has to be sequenced.
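A back-of-envelope version of that sum, with per-stage times that are illustrative assumptions rather than measurements:

```python
# Input-to-photon budget for the five stages above (all numbers assumed).
stages_ms = {
    "input poll":        2.0,   # polling interval / 2, on average
    "CPU simulation":    8.0,
    "CPU render cmds":   4.0,
    "GPU render":       16.6,   # one 60 FPS frame
    "display scan-out":  8.0,   # half a 60 Hz refresh, on average
}
total = sum(stages_ms.values())
print(f"end-to-end: {total:.1f} ms")  # ~38.6 ms, inside the 30-60 ms band
```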
Why FG adds latency
Recall from the previous chapter: to produce generated frame N.5, you need both rendered frame N and rendered frame N+1. Which means rendered frame N+1 must already be available before you can show N.5. Which means rendered frame N is held back by the engine until N+1 is ready; otherwise it would be displayed before N.5, breaking the pacing.
Result: every rendered frame is delayed by one render interval before display. At 60 rendered FPS that's 16.6 ms of added input-to-photon latency. The displayed frame rate is 120, but the responsiveness is worse than if you had simply played at 60 with no FG.
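The hold-back is easier to see on a timeline. A sketch assuming a fixed 60 FPS render cadence:

```python
RENDER_MS = 1000 / 60  # one rendered frame interval at 60 FPS

# Rendered frame N finishes at (N + 1) * RENDER_MS, but it can only be
# displayed once N+1 exists, with generated frame N.5 paced midway after it.
for n in range(3):
    finished  = (n + 1) * RENDER_MS
    displayed = (n + 2) * RENDER_MS        # held back one full render interval
    generated = displayed + RENDER_MS / 2  # frame N.5 lands half an interval later
    print(f"frame {n}: rendered {finished:5.1f} ms, shown {displayed:5.1f} ms "
          f"(+{displayed - finished:.1f} ms); frame {n}.5 shown {generated:5.1f} ms")
```

Every rendered frame carries the same one-interval delay; that is the 16.6 ms in the paragraph above, while the displayed frames tick along 8.3 ms apart.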

NVIDIA Reflex (and equivalents)
Reflex is NVIDIA's solution. It is not a single technique but a bundle of latency-reduction mechanisms:
- Reduced render queue. Normally the CPU runs 1–3 frames ahead of the GPU to keep the GPU busy. Reflex pegs this at 0–1 frames. Throughput drops slightly, latency drops a lot (see the sketch after this list).
- GPU-paced CPU. The CPU starts simulating a new frame just in time for the GPU to start rendering, not earlier. The simulated state therefore reflects the most recent input.
- Driver-level coordination. The driver communicates frame timing back to the engine so the simulation tick aligns with display scan-out, not arbitrary clock time.
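A toy model of the first bullet, assuming the CPU and GPU each take one 60 FPS frame per stage; input sampled at simulation start waits behind every frame already in the queue:

```python
FRAME_MS = 1000 / 60

def input_to_photon(queue_depth: int) -> float:
    # Simulate + render the sampled frame, plus the queued frames ahead of it.
    return (2 + queue_depth) * FRAME_MS

for depth in (3, 1, 0):
    print(f"queue depth {depth}: ~{input_to_photon(depth):.1f} ms input-to-photon")
# depth 3 -> ~83 ms, depth 0 -> ~33 ms: why capping the queue matters
```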
Reflex by itself (in a non-FG game) reduces end-to-end latency by 10–40 ms. With FG enabled, Reflex is mandatory: NVIDIA's runtime refuses to enable FG without Reflex, because Reflex cancels out most of the latency penalty FG adds.
AMD's equivalent is Anti-Lag 2 (formerly Anti-Lag+), integrated into the engine rather than the driver after the original driver-level Anti-Lag had compatibility problems. Sony's PS5 platform has similar in-OS scheduling controls. Intel's equivalent is less developed.
The honest accounting
With Reflex and frame generation together:
| Setting | Render FPS | Display FPS | Input latency (equivalent feel) |
|---|---|---|---|
| Native, no Reflex | 60 | 60 | 60 FPS feel |
| Native + Reflex | 60 | 60 | ~80 FPS feel |
| FG + Reflex | 60 | 120 | ~50 FPS feel |
| FG + Reflex (well above CPU bottleneck) | 120 | 240 | ~100 FPS feel |
The pattern: FG always raises the display rate, but it never improves input latency. Reflex separately reduces latency, and the combination of the two is roughly latency-neutral compared to a native configuration without Reflex (~50 vs 60 FPS feel in the table). So FG's honest benefit is motion smoothness, not responsiveness.
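How a table like this can be produced: convert each configuration's end-to-end latency into the native frame rate whose latency it matches. The baseline constants below are illustrative assumptions, not NVIDIA's methodology.

```python
BASE_MS = 20.0  # fixed overhead: input poll, scan-out, ... (assumed)
QUEUE   = 2.0   # frames in flight; latency scales as QUEUE * frame time

def latency_ms(fps: float, extra_ms: float = 0.0) -> float:
    return BASE_MS + QUEUE * 1000 / fps + extra_ms

def equivalent_fps(lat: float) -> float:
    """Invert latency_ms: the native fps whose latency matches `lat`."""
    return QUEUE * 1000 / (lat - BASE_MS)

native    = latency_ms(60)           # plain 60 FPS
reflex    = native - 12              # Reflex saving (assumed 12 ms)
fg_reflex = reflex + 1000 / 60       # FG adds one held 60 FPS frame
for name, lat in [("native", native), ("+Reflex", reflex),
                  ("FG+Reflex", fg_reflex)]:
    print(f"{name:>9}: {lat:5.1f} ms ~ {equivalent_fps(lat):3.0f} FPS feel")
```

With these assumed constants the model lands near the table's rows (~60, ~90, and ~50 FPS feel); the exact figures depend entirely on the game's real pipeline.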
When to use FG, when not to
A reasonable rule:
- Single-player, visually rich games: FG is a clear win. Cyberpunk, Alan Wake, and Hogwarts Legacy all feel better at 120 displayed FPS, even with the slight latency penalty.
- Competitive multiplayer: FG is rarely worth it. Counter-Strike, Valorant, Apex players care about latency, not motion clarity.
- Below ~40 rendered FPS: don't. The interpolation artifacts become visible and the latency added is intolerable.

In the next chapter we'll look at the artifacts themselves: what causes ghosting, smearing, disocclusion fizzle, and "shimmer" at the algorithmic level, because that vocabulary lets you read any DF-style image comparison correctly.