Where the GPU Spends Its Time

If you want to skip work, first you have to know where the work is.

Frame generation only makes sense if you understand which parts of a frame are expensive and which are cheap. The whole strategy of DLSS, FSR, XeSS, and PSSR is to skip the expensive parts and reconstruct what's missing. So this chapter is a detailed look at the cost structure of a modern frame.

A typical AAA frame breakdown

For a current-generation game like Cyberpunk 2077 with path tracing enabled, running at 4K on a high-end GPU, the per-frame timing roughly looks like this:

Pass	Approx. time	Notes
Shadow maps	1–2 ms	Re-render geometry from each light's view
G-buffer (geometry)	2–4 ms	Vertex + fragment work, fills depth/normal/albedo
Ray traced lighting	8–25 ms	The dominant cost in path-traced games
Denoising	2–4 ms	Cleans up noisy ray-traced output
Reflections / SSR / SSAO	1–3 ms	Screen-space approximations
Post-processing	1–2 ms	Tonemapping, bloom, motion blur, DOF
UI / HUD	<1 ms	Cheap

At native 4K with path tracing, the ray tracing and shading passes dominate everything else. That is why upscaling running those passes at, say, 1080p and reconstructing to 4K gives an enormous win. You are slashing the most expensive part of the frame by 4×.

A horizontal Gantt-style timeline showing a single 16.6 ms frame budget broken into colored segments: shadow maps in yellow, G-buffer in cyan, ray tracing in red and by far the largest, denoising in orange, reflections in purple, post-processing in green, UI in gray. Times labeled in milliseconds. Titled 'Where 16.6 ms goes 4K, path-traced'. Clean infographic, dark background. — Where 16.6 ms goes in a 4K path-traced frame ray tracing and shading dominate everything else.

The cost scales with pixel count

Almost every pass listed above scales linearly with the number of pixels. Double the resolution and you roughly double the cost and 4K is four times the pixels of 1080p. This is the central insight that upscaling exploits.

Going from 1080p to 4K means:

1920×1080 = 2,073,600 pixels
3840×2160 = 8,294,400 pixels
4× more pixels to shade, light, and trace rays for.

If your "Performance" DLSS mode internally renders at 1080p and outputs 4K, the engine is doing roughly a quarter of the per-pixel work of native 4K. The remaining cost reconstructing the missing pixels is performed by a small neural network that runs in a fraction of a millisecond on dedicated hardware.

A nested rectangle diagram showing pixel counts to scale: 720p smallest, then 1080p, 1440p, 4K, 8K largest, each labeled with its resolution and pixel count. Each rectangle has the cost shown as a multiplier of 1080p (1×, 1.78×, 4×, 16×). Clean technical infographic, dark theme, neon outlines. — 4K is 4× the pixels of 1080p and per-pixel shading cost scales right along with it.

Which passes scale with resolution, and which don't

This matters because frame generation is selective:

Per-pixel passes (G-buffer, shading, ray tracing, post-processing) scale with resolution. Upscaling helps here.
Per-vertex passes (vertex shaders, skinning, geometry processing) scale with triangle count, not pixel count. Upscaling does not help here.
Setup and draw call overhead scale with the number of draw calls. Upscaling does not help here either.
Memory bandwidth is shared by everyone and is often the real bottleneck on high-end GPUs.

This is why DLSS "Quality" mode (which internally renders at ~67% of output resolution) sometimes only gives you a ~20% performance gain in CPU-bound or geometry-bound games. The renderer wasn't actually limited by pixels.

Where ray tracing fits in

Hardware ray tracing made the pixel-cost problem dramatically worse. A single ray-traced "primary" hit can spawn 2–10 secondary rays for reflections, shadows, and global illumination. Each ray walks a BVH (bounding volume hierarchy) a tree of axis-aligned boxes looking for the first triangle it hits. Modern GPUs have a fixed-function RT core that accelerates BVH traversal and ray-triangle intersection.

Even with hardware acceleration, a path-traced game at 4K can fire hundreds of millions of rays per frame. The output is intensely noisy only one or two samples per pixel and must be cleaned up by a denoiser (often itself a small neural network or a hand-tuned spatio-temporal filter).

A schematic side view of a 3D scene (a room with a mirror and a window). From the camera, primary rays shoot into the scene. Each primary hit spawns shadow rays toward the light, reflection rays toward the mirror, and indirect bounce rays into the room. Each ray is drawn as a thin colored line: primary white, shadow yellow, reflection cyan, indirect magenta. Annotated. Clean technical diagram, dark background. — Every primary ray spawns secondary rays for shadows, reflections and indirect bounces and every ray costs.

The shading cost is the real target

Add it all up and the picture is consistent: the most expensive thing a modern GPU does is evaluate per-pixel shading equations, especially when those equations involve ray tracing. Geometry, shadows, and post-processing are smaller costs. CPU work animation, physics, AI, draw call submission is essentially independent of resolution.

So if you want a magic switch that doubles or triples your frame rate, the place to look is the per-pixel shading cost. Upscaling attacks that cost by rendering fewer pixels in the first place. Frame generation attacks it differently: by rendering fewer entire frames and reconstructing the gaps.

In the next chapter we'll look at the technique that made both possible temporal anti-aliasing and how it taught engines to mix information across time.