Motion Vectors: The Most Important Texture in Modern Games

Per-pixel arrows pointing into the past. Every temporal technique depends on them.

If TAA, DLSS, FSR, XeSS, and PSSR have one thing in common, it is that they would all completely fall apart without motion vectors. A motion vector is a per-pixel 2D vector that says: "this pixel was at position (x', y') in the previous frame." It is the bridge between today's frame and yesterday's frame, and the entire industry has spent ten years getting better at producing it.

Definition

For each pixel on screen with current coordinates (x, y), the motion vector MV(x, y) is a 2D offset such that:

previous_position = (x, y) + MV(x, y)

Some engines store it the other way around (current = previous + MV); the convention varies. The offset is expressed in pixel units, or sometimes in normalized units (NDC or UV) so that it is resolution-independent.

A motion vector texture is usually stored as an R16G16_FLOAT or R16G16_SNORM two-channel texture at full screen resolution. Red is the X offset, green is the Y offset, scaled to fit the format.
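As a minimal sketch of the definition and storage described above, the following Python converts a pixel-space motion vector to UV units, quantizes it the way a 16-bit SNORM channel would, and recovers the previous position. The function names and the choice to map the UV range [-1, 1] directly onto the SNORM range are assumptions for illustration; real engines pick their own scale and sign convention.

```python
def pixels_to_uv(mv_px, width, height):
    """Convert a motion vector from pixel units to resolution-independent UV units."""
    return (mv_px[0] / width, mv_px[1] / height)

def encode_snorm16(v):
    """Quantize a value in [-1, 1] to a signed 16-bit integer (assumed scale)."""
    v = max(-1.0, min(1.0, v))
    return round(v * 32767)

def decode_snorm16(i):
    """Map the stored integer back to a float in [-1, 1]."""
    return max(-1.0, i / 32767)

def previous_position(xy, mv):
    """Apply the convention previous_position = (x, y) + MV(x, y)."""
    return (xy[0] + mv[0], xy[1] + mv[1])

width, height = 1920, 1080
mv_px = (-12.5, 3.0)                      # the pixel came from up and to the left... in pixel units
mv_uv = pixels_to_uv(mv_px, width, height)
stored = tuple(encode_snorm16(c) for c in mv_uv)

# Decode and convert back to pixels, as a consumer of the texture would.
restored_px = (decode_snorm16(stored[0]) * width,
               decode_snorm16(stored[1]) * height)
prev = previous_position((960.0, 540.0), restored_px)
print(stored, restored_px, prev)
```

With this particular scale, one SNORM step corresponds to width / 32767 pixels, roughly 0.06 pixels at 1920 wide, so the round trip is close but not exact.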

A game-like scene with a moving car. Overlaid on top, a sparse grid of small arrows: long arrows pointing left where the car is moving relative to the camera, short arrows on the static background where parallax causes mild flow. Below, the same scene shown as a false-color flow visualization where red is X and green is Y, like an optical flow visualization. Two panels side by side. Clean technical diagram.
Each pixel carries a 2D vector pointing back to where it was in the previous frame.

Where motion vectors come from

There are two sources, and a real engine combines both.

1. Camera motion (analytical, per-pixel)

If you know the camera's view-projection matrix for this frame and last frame, and you know the pixel's world position (from depth), you can compute exactly where that pixel was in the previous frame. This is pure math, dead accurate, free.

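// Reconstruct the world-space position of this pixel from its UV and depth.
vec4 worldPos = reconstructWorld(uv, depth, invViewProj);
// Project that position with last frame's view-projection matrix.
vec4 prevClip = prevViewProj * worldPos;
// Perspective divide, then remap NDC [-1, 1] to UV [0, 1].
vec2 prevUV   = (prevClip.xy / prevClip.w) * 0.5 + 0.5;
// Motion in UV units: previous position minus current position.
vec2 motion   = prevUV - uv;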

This handles camera rotation, translation, and FOV change perfectly. It does not handle moving objects, because it assumes the world is static.
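The shader snippet above can be checked on the CPU. The following Python sketch is a simplified model under stated assumptions: it starts from a known world position instead of reconstructing one from depth, it uses an OpenGL-style projection with an unrotated camera, and every helper name (`perspective`, `view_from_camera_position`, `camera_motion_vector`) is made up for illustration.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style projection matrix (camera looks down -Z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def view_from_camera_position(cam):
    """View matrix for an unrotated camera at `cam`: translate the world by -cam."""
    return [
        [1.0, 0.0, 0.0, -cam[0]],
        [0.0, 1.0, 0.0, -cam[1]],
        [0.0, 0.0, 1.0, -cam[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

def clip_to_uv(clip):
    """Perspective divide, then remap NDC [-1, 1] to UV [0, 1]."""
    return (clip[0] / clip[3] * 0.5 + 0.5, clip[1] / clip[3] * 0.5 + 0.5)

def camera_motion_vector(world_pos, view_proj, prev_view_proj):
    """UV-space motion: previous screen position minus current screen position."""
    uv = clip_to_uv(transform(view_proj, world_pos))
    prev_uv = clip_to_uv(transform(prev_view_proj, world_pos))
    return (prev_uv[0] - uv[0], prev_uv[1] - uv[1])

proj = perspective(90.0, 16.0 / 9.0, 0.1, 100.0)
prev_vp = mat_mul(proj, view_from_camera_position((0.0, 0.0, 0.0)))
curr_vp = mat_mul(proj, view_from_camera_position((1.0, 0.0, 0.0)))  # camera moved right

near_mv = camera_motion_vector([0.0, 0.0, -2.0, 1.0], curr_vp, prev_vp)
far_mv = camera_motion_vector([0.0, 0.0, -20.0, 1.0], curr_vp, prev_vp)
print(near_mv, far_mv)
```

In this setup the point 2 units away gets a motion vector ten times longer than the point 20 units away, which is the parallax that camera-only motion vectors encode.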

2. Object motion (rendered per-object)

For every moving object (characters, vehicles, projectiles, swaying foliage), the engine renders an additional pass into the motion vector buffer. For each vertex it computes the current clip-space position and the previous-frame clip-space position (using the previous bone matrices for skinned meshes), and the fragment shader writes the screen-space delta.

This combined motion vector buffer is one of the channels of the G-buffer we mentioned earlier. Producing it correctly for every moving thing in the scene is one of the most error-prone parts of an engine, and bad motion vectors are behind a large share of upscaler artifacts.
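To make the per-object pass concrete, here is a small Python sketch of the per-vertex computation. It assumes a fixed pinhole camera and an object that only translates between frames; `project_to_uv`, `model_prev`, and `model_curr` are hypothetical stand-ins for the engine's projection and for the previous and current model (or bone) transforms.

```python
def translate(p, t):
    """Apply a translation to a 3D point."""
    return (p[0] + t[0], p[1] + t[1], p[2] + t[2])

def project_to_uv(world):
    """Fixed pinhole camera at the origin looking down -Z (assumed 90-degree FOV, square aspect)."""
    x, y, z = world
    w = -z                                  # clip-space w for this simple camera
    return (x / w * 0.5 + 0.5, y / w * 0.5 + 0.5)

def object_motion_vector(local_vertex, model_curr, model_prev):
    """Screen-space delta (previous minus current) for one vertex."""
    uv_curr = project_to_uv(model_curr(local_vertex))
    uv_prev = project_to_uv(model_prev(local_vertex))
    return (uv_prev[0] - uv_curr[0], uv_prev[1] - uv_curr[1])

# Hypothetical object 4 units in front of the camera, sliding left by 0.2 units per frame.
def model_prev(p):
    return translate(p, (0.5, 0.0, -4.0))

def model_curr(p):
    return translate(p, (0.3, 0.0, -4.0))

mv = object_motion_vector((0.0, 0.0, 0.0), model_curr, model_prev)
print(mv)
```

The camera never moves here, so camera-only reprojection would report zero motion for this pixel; only the comparison against the previous transform reveals that the surface was further right one frame ago.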

A two-column diagram. Left column 'Camera Motion' shows a static scene with the camera moving, illustrating how parallax produces motion vectors via matrix math, with the matrix equation shown. Right column 'Object Motion' shows a static camera with a moving character, illustrating per-vertex previous-frame positions producing per-pixel motion. Both feed into a single combined motion vector texture shown at the bottom. Technical infographic.
Engine motion vectors combine analytical camera math with per-object screen-space deltas into a single texture.

What motion vectors do NOT capture

This is critical, and it is where every modern upscaler bleeds artifacts:

Phenomenon | Captured by MVs? | Why
Camera pan | Yes | Pure analytical math
Walking character | Yes | Per-bone object motion
Particles | Sometimes | Engine must explicitly write them
Shadows | No | A shadow is a result, not geometry
Reflections | No | The reflected object moves in mirror-space
Specular highlights | No | They slide across the surface as the camera moves
Transparency | Partial | Only the closest layer's motion is recorded
Caustics, refraction | No | Light moves, geometry doesn't
Procedural animation in shaders | No | The shader changes between frames, but the geometry doesn't move

So when you look at a TAA or DLSS output and see a reflection that ghosts, a shadow that smears, or a fire that leaves trails, that is the upscaler failing to invalidate history: the motion vector said the pixel didn't move, but the appearance changed anyway.

This is why DLSS, XeSS, PSSR, and (from version 4 onward) FSR rely on neural networks, whereas FSR 2 and FSR 3 use hand-tuned heuristics: a network can learn to detect these "the motion vector lied" cases and reject history in a smarter way than a hand-tuned heuristic usually manages.
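For contrast, here is what a hand-tuned heuristic can look like. The sketch below shows neighborhood clamping, a widely used TAA technique that limits the reprojected history to the range of the current frame's local neighborhood. It works on single grayscale values, and the 0.1 blend weight and the sample numbers are assumptions chosen only for illustration.

```python
def clamp_history(history, neighborhood):
    """Clamp the reprojected history value to the min/max of the current 3x3 neighborhood."""
    lo, hi = min(neighborhood), max(neighborhood)
    return max(lo, min(hi, history))

def resolve(current, history, neighborhood, blend=0.1):
    """Blend the current sample with the clamped history (exponential accumulation)."""
    clamped = clamp_history(history, neighborhood)
    return blend * current + (1.0 - blend) * clamped

# A shadow just moved onto this pixel: the current neighborhood is dark, but the
# reprojected history is still bright because the motion vector reported no motion.
neighborhood = [0.18, 0.20, 0.22, 0.19, 0.21, 0.20, 0.18, 0.22, 0.20]
current = 0.20
stale_history = 0.90

without_clamp = 0.1 * current + 0.9 * stale_history
with_clamp = resolve(current, stale_history, neighborhood)
print(without_clamp, with_clamp)
```

Without the clamp the pixel stays near 0.83 and the old bright value lingers as a ghost; with it the result lands near 0.22. The heuristic has a cost, though: the same clamp can also discard legitimate history on thin or high-contrast detail.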

Sub-pixel precision and resolution

Motion vectors are normally rendered at the internal (low) resolution of the renderer, not the output resolution. This is fine when the upscaler is sample-based (it samples motion at the internal resolution and works there), but it means motion-vector quantization can cause visible jitter for slowly moving objects, where the MV rounds to zero one frame and to one pixel the next.

Some engines now render motion vectors at higher resolution than color, or at full output resolution even when color is at internal resolution, just to give the upscaler clean motion data.
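The alternating pattern described above is easy to reproduce in a toy model. The Python sketch below assumes, purely for illustration, that an object's position is snapped to whole pixels each frame and that the motion is the difference between consecutive snapped positions; `quantized_motion` is a hypothetical helper, not an engine function.

```python
def quantized_motion(speed_px, frames):
    """Per-frame motion obtained by differencing whole-pixel (rounded) positions."""
    positions = [round(speed_px * f) for f in range(frames + 1)]
    return [positions[f + 1] - positions[f] for f in range(frames)]

speed = 0.4                        # true motion: 0.4 pixels per frame
quantized = quantized_motion(speed, 10)
fractional = [speed] * 10          # what a sub-pixel-accurate motion vector would report

print(quantized)                   # a mix of 0s and 1s
print(sum(quantized), sum(fractional))
```

Both sequences cover the same total distance over ten frames, but the quantized one gets there in stop-and-go steps of 0 and 1 pixel, which is the stutter a temporal accumulator turns into visible jitter.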

A close-up showing a slow-moving thin object such as a wire or pole. Top: motion vectors rendered at 1080p, quantized to whole-pixel arrows, alternating between 0 and 1 pixel. Bottom: motion vectors rendered at 4K, smooth and fractional. The 1080p version causes visible jitter in the reconstructed output, shown as a stuttering trail. Clean technical diagram.
Low-res motion vectors quantize to whole pixels, causing visible jitter on slow-moving thin objects.

Reverse reprojection vs forward reprojection

Two camps:

  • Reverse reprojection (what TAA and DLSS Super Resolution do): for each output pixel, look up where it came from in the previous frame. Easy, deterministic, never produces holes.
  • Forward reprojection (what some frame-generation techniques do): for each input pixel, scatter it to where it will be in the next frame. Can produce holes and overlap, but is essential when you need to predict the future, not reproject the past.
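The difference between the two camps shows up even in one dimension. The following Python sketch treats a frame as a row of six labeled pixels; the integer motion values, the depth list, and both function names are assumptions made for the example.

```python
def reverse_reproject(prev_frame, motion):
    """Gather: each output pixel x reads the previous frame at x + motion[x]."""
    out = []
    for x, mv in enumerate(motion):
        src = min(max(x + mv, 0), len(prev_frame) - 1)   # clamp to the image edge
        out.append(prev_frame[src])
    return out

def forward_reproject(frame, motion, depth):
    """Scatter: each input pixel x is written to x + motion[x]; the nearest surface wins."""
    out = [None] * len(frame)
    out_depth = [float("inf")] * len(frame)
    for x, mv in enumerate(motion):
        dst = x + mv
        if 0 <= dst < len(out) and depth[x] < out_depth[dst]:
            out[dst] = frame[x]
            out_depth[dst] = depth[x]
    return out

frame = ["a", "b", "c", "d", "e", "f"]
depth = [9, 9, 1, 1, 9, 9]              # pixels 2 and 3 are a near object

# Forward: the object (pixels 2 and 3) moves two pixels to the right.
scattered = forward_reproject(frame, [0, 0, 2, 2, 0, 0], depth)

# Reverse: output pixels 4 and 5 say "I came from two pixels to the left".
gathered = reverse_reproject(frame, [0, 0, 0, 0, -2, -2])

print(scattered)   # holes (None) where the object used to be
print(gathered)    # every pixel has a value
```

The scatter leaves holes at positions 2 and 3 and needs a depth test to resolve the overlap at 4 and 5. The gather fills every pixel, but positions 2 and 3 now hold stale object colors: no hole, yet the content is wrong, which is the disocclusion case that history rejection has to catch.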

DLSS 3 Frame Generation uses both: it has the motion vectors for the past (the rendered "real" frame N and "real" frame N+1), but it must forward-warp halfway between them to produce the generated frame N.5. We will look at exactly how in chapter 9.

A pure-vision alternative: optical flow

What if you don't have motion vectors at all? What if you only have two images and need to figure out the motion between them? That problem is optical flow, a classical computer vision problem going back to the Horn–Schunck (1981) and Lucas–Kanade (1981) algorithms. Modern hardware optical flow accelerators (like the Optical Flow Accelerator on NVIDIA Ada/Blackwell GPUs) compute per-pixel motion between two arbitrary images in under a millisecond.
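To give a feel for the classical approach, here is a one-dimensional sketch in the spirit of Lucas–Kanade: it assumes brightness constancy and solves a least-squares problem for a single shift over the whole signal. It illustrates the idea only and says nothing about how hardware accelerators work internally; the test signal and the function name are made up for the example.

```python
import math

def lucas_kanade_1d(prev, curr):
    """Least-squares estimate of one shift d such that curr(x) is roughly prev(x - d)."""
    num = 0.0
    den = 0.0
    for x in range(1, len(prev) - 1):
        ix = (prev[x + 1] - prev[x - 1]) / 2.0   # spatial gradient (central difference)
        it = curr[x] - prev[x]                   # temporal difference
        num += ix * it
        den += ix * ix
    # Brightness constancy gives it = -d * ix, so d = -sum(ix * it) / sum(ix * ix).
    return -num / den

true_shift = 0.5                                  # half a pixel to the right
prev = [math.sin(0.2 * x) for x in range(50)]
curr = [math.sin(0.2 * (x - true_shift)) for x in range(50)]

estimate = lucas_kanade_1d(prev, curr)
print(estimate)                                   # close to 0.5
```

The estimate is only approximate because the method linearizes the image around each pixel; that is why practical implementations work on small windows, iterate, and use image pyramids for larger motions.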

This is the second pillar of DLSS 3 Frame Generation: engine motion vectors plus hardware optical flow, fused together, give the network a better motion field than either source alone could.

A side-by-side comparison. Left panel 'Engine motion vectors' is accurate for camera and rigid objects but blank or zero for shadow, reflection, and particles. Right panel 'Optical flow (vision-based)' captures everything that visually moves: shadow movement, specular sliding, reflection. Below, a fusion arrow combines both into a single dense, complete motion field. Clean technical diagram, dark background.
Engine motion vectors and hardware optical flow each fill the gaps the other leaves.

In the next chapter we will look at upscaling itself: spatial vs temporal, why spatial upscaling alone is doomed, and what neural reconstruction actually adds on top of TAA upsampling.