Deep learning super sampling

Deep learning super sampling (DLSS) is a family of real-time deep learning image enhancement and upscaling technologies developed by Nvidia. They are exclusive to its RTX line of graphics processors^[1] and available in select video games. The goal of these technologies is to run the majority of the graphics pipeline at a lower resolution for increased performance, and then infer from that a higher-resolution image containing the same level of detail as if it had been rendered natively at the higher resolution. This allows for higher graphical settings, higher frame rates, or both at a given output resolution, depending on user preference.^[2]

As of June 2021, this technology is available exclusively on GeForce RTX 20 and GeForce RTX 30 series GPUs.

History

Nvidia advertised DLSS as a key feature of the GeForce RTX 20 series GPUs when they launched in September 2018.^[3] At that time, support was limited to a few video games (namely Battlefield V^[4] and Metro Exodus) because the algorithm had to be trained specifically for each game to which it was applied, and the results were usually not as good as simple resolution upscaling.^[5]^[6]

In 2019, the video game Control shipped with ray tracing and an improved version of DLSS, which did not use the Tensor Cores.^[7]^[8]

In April 2020, Nvidia advertised and shipped an improved version of DLSS, named DLSS 2.0, with driver version 445.75. DLSS 2.0 was available for a few existing games, including Control and Wolfenstein: Youngblood, and was later added to many newly released games and to game engines such as Unreal Engine^[9] and Unity^[10]. This time Nvidia said that it used the Tensor Cores again, and that the AI did not need to be trained specifically on each game.^[3]^[11] Despite sharing the DLSS branding, the two iterations of DLSS differ significantly and are not backwards-compatible.^[12]^[13]

Release history

Release	Release date	Highlights
1.0	February 2019	First version, using AI and specifically trained for certain games, including Battlefield V and Metro Exodus^[4]
2.0 (first iteration)	August 2019	First 2.0 version, also referenced as version 1.9, using an approximated AI of the in-progress version 2.0 running on the CUDA shader cores and specifically adapted for Control^[7]^[3]^[14]
2.0 (second iteration)	April 2020	Second 2.0 version, using Tensor Cores again and trained generically^[15]

Quality presets

**Standard DLSS Presets**^[16]
Quality preset^[a]	Scale factor^[b]	Render scale^[c]
Quality	1.50x	66.7%
Balanced	1.72x	58.0%
Performance	2.00x	50.0%
Ultra Performance (since v2.1)	3.00x	33.3%

[a] The algorithm does not necessarily need to be implemented using these presets; it is possible for the implementer to define custom input and output resolutions.
[b] The linear scale factor used for upsampling the input resolution to the output resolution. For example, a scene rendered at 540p with a 2.00x scale factor would have an output resolution of 1080p.
[c] The linear render scale, compared to the output resolution, that the technology uses to render scenes internally before upsampling. For example, a 1080p scene with a 50% render scale would have an internal resolution of 540p.
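
The relationship between scale factor, render scale and internal resolution can be illustrated with a short Python sketch. The scale factors are taken from the table above; the function names and the rounding to the nearest whole pixel are assumptions made for illustration, and a real integration may round differently or use custom resolutions.

```python
# Scale factors from the presets table above.
PRESET_SCALE = {
    "Quality": 1.50,
    "Balanced": 1.72,
    "Performance": 2.00,
    "Ultra Performance": 3.00,
}

def internal_resolution(out_width, out_height, preset):
    """Return the (width, height) rendered internally for a given output size."""
    scale = PRESET_SCALE[preset]
    # The scale factor is linear, so it divides each axis independently.
    # Rounding to the nearest pixel is an assumption of this sketch.
    return round(out_width / scale), round(out_height / scale)

def render_scale_percent(preset):
    """Render scale is the reciprocal of the scale factor, as a percentage."""
    return 100.0 / PRESET_SCALE[preset]

print(internal_resolution(1920, 1080, "Performance"))  # (960, 540)
print(internal_resolution(3840, 2160, "Quality"))      # (2560, 1440)
```

Because the factor applies to each axis, the Performance preset renders only a quarter of the output pixels, and Ultra Performance only a ninth.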

Algorithm

DLSS 1.0

The first iteration of DLSS is a predominantly spatial image upscaler with two stages, both relying on convolutional auto-encoder neural networks^[17]. The first stage is an image enhancement network which uses the current frame and motion vectors to perform edge enhancement and spatial anti-aliasing. The second stage is an image upscaling step which uses only the current frame to upscale the image to the desired output resolution. The neural networks are trained on a per-game basis by generating a "perfect frame" using traditional supersampling to 64 samples per pixel, as well as the motion vectors for each frame. The data collected must be as comprehensive as possible, covering as many levels, times of day, graphical settings, resolutions, etc. as possible. This data is also augmented using common augmentations such as rotations, colour changes, and random noise to help the trained network generalise. Training is performed on Nvidia's Saturn V supercomputer.^[13]^[18]
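
As a rough illustration of what a 64-samples-per-pixel "perfect frame" means, the Python sketch below averages an 8 x 8 grid of samples inside one pixel and compares it with a single centre sample. The `scene` function, the regular grid pattern and all names are hypothetical; the sources do not describe the sampling pattern Nvidia actually uses.

```python
def scene(x, y):
    """Hypothetical scene: white above the diagonal edge y = x, black elsewhere."""
    return 1.0 if y > x else 0.0

def render_pixel(px, py, grid=1):
    """Average grid * grid evenly spaced samples inside pixel (px, py)."""
    total = 0.0
    for j in range(grid):
        for i in range(grid):
            # Sample at the centre of each sub-cell of the pixel.
            sx = px + (i + 0.5) / grid
            sy = py + (j + 0.5) / grid
            total += scene(sx, sy)
    return total / (grid * grid)

aliased = render_pixel(0, 0, grid=1)  # one sample per pixel
perfect = render_pixel(0, 0, grid=8)  # 8 x 8 = 64 samples per pixel
print(aliased, perfect)  # 0.0 0.4375
```

The single sample reports the pixel as fully black, while the 64-sample average approaches the true edge coverage of one half. Pairs of this kind (aliased input, supersampled target) are what the training process described above relies on.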

This first iteration received a mixed response, with many criticising the often soft appearance and artifacting in certain situations^[19]^[5]^[4]. This was likely a side effect of the heavy reliance on neural networks to produce the output image, as they could not be trained to perform optimally in all scenarios and edge cases.^[13]

Nvidia also demonstrated that the auto-encoder networks could learn to recreate depth-of-field and motion blur^[13], although this functionality has never been included in a publicly released product.^[citation needed]

DLSS 2.0

DLSS 2.0 works as follows:^[20]

The neural network is trained by Nvidia on supercomputers, using "ideal" ultra-high-resolution images of video games together with low-resolution images of the same games. The result is stored in the video card driver. Nvidia reportedly uses DGX-1 servers to train the network.^[21]
The neural network stored in the driver compares the actual low-resolution image with the reference and produces a full high-resolution result. The inputs used by the trained neural network are the low-resolution aliased images rendered by the game engine and the low-resolution motion vectors for the same images, also generated by the game engine. The motion vectors tell the network which direction objects in the scene are moving from frame to frame, so that it can estimate what the next frame will look like.^[22]
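
How motion vectors relate one frame to the next can be sketched with a generic reprojection step, as used by temporal rendering techniques in general. This is not Nvidia's implementation: the function name, the whole-pixel motion vectors and the `None` marker for pixels with no history are all assumptions made for illustration.

```python
def reproject(prev_frame, motion, fallback=None):
    """Warp prev_frame into the current frame's pixel grid.

    motion[y][x] is the (dx, dy) offset, in whole pixels, by which the
    content now at pixel (x, y) has moved since the previous frame.
    """
    height, width = len(prev_frame), len(prev_frame[0])
    out = [[fallback] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = motion[y][x]
            # Look up where this pixel's content was one frame ago.
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = prev_frame[src_y][src_x]
    return out

prev = [[0, 0, 9, 0]]
motion = [[(-1, 0)] * 4]  # everything moved one pixel to the left
print(reproject(prev, motion))  # [[0, 9, 0, None]]
```

The rightmost pixel has no counterpart in the previous frame, so it receives the fallback value; a reconstruction method has to fill such newly revealed regions from the current frame alone.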

Architecture

DLSS is only available on GeForce RTX 20 and GeForce RTX 30 series GPUs, using dedicated AI accelerators called Tensor Cores.^[22]^[23]

Tensor Cores have been available since the Nvidia Volta GPU microarchitecture, which was first used on the Tesla V100 line of products.^[24] Each Tensor Core operates on 4 x 4 matrices of 16-bit floating-point values, and they appear to be designed for use at the CUDA C++ level, and even at the compiler level.^[25]
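
The kind of arithmetic involved can be emulated in plain Python. The sketch below computes D = A x B + C on 4 x 4 matrices, the multiply-accumulate form commonly described for Tensor Cores, with the inputs rounded to half precision through the standard `struct` module. The function names are hypothetical, and accumulating in Python's double precision is an assumption that does not reflect the hardware's accumulator.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def mma_4x4(a, b, c):
    """Return D = A x B + C for 4 x 4 matrices, with A and B rounded to FP16."""
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = c[i][j]  # start from the accumulator matrix
            for k in range(4):
                acc += a16[i][k] * b16[k][j]
            d[i][j] = acc
    return d

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
print(mma_4x4(identity, ones, ones)[0])  # [2.0, 2.0, 2.0, 2.0]
```

Rounding inputs to 16 bits loses precision (for example, 0.1 is not exactly representable), which is one reason reduced-precision matrix units are a better fit for neural network inference than for general-purpose numerical work.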

The Tensor Cores use CUDA warp-level primitives on 32 parallel threads to take advantage of their parallel architecture.^[26] A warp is a set of 32 threads which are configured to execute the same instruction.
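
What a warp-level primitive does can be simulated on the CPU. The Python sketch below models a 32-lane warp performing a sum reduction in the style of CUDA's `__shfl_down_sync` pattern, where every lane executes the same instruction at each step. It is a simulation for illustration only; the helper names are hypothetical and nothing here runs on a GPU.

```python
WARP_SIZE = 32

def shfl_down(values, offset):
    """Each lane reads the value held by the lane `offset` positions above it.

    Lanes whose source would fall outside the warp keep their own value.
    """
    return [
        values[lane + offset] if lane + offset < WARP_SIZE else values[lane]
        for lane in range(WARP_SIZE)
    ]

def warp_reduce_sum(values):
    """Sum 32 per-lane values in five lockstep steps; lane 0 holds the result."""
    vals = list(values)
    assert len(vals) == WARP_SIZE
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        # All lanes perform the same addition at the same time.
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]

print(warp_reduce_sum(range(32)))  # 496
```

The reduction finishes in log2(32) = 5 steps rather than 31 sequential additions, which is the benefit of having all lanes of a warp cooperate.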

Anti-Aliasing

DLSS requires and applies its own anti-aliasing method.

It operates on similar principles to TAA. Like TAA, it uses information from past frames to produce the current frame. Unlike TAA, DLSS does not sample every pixel in every frame. Instead, it samples different pixels in different frames and uses pixels sampled in past frames to fill in the unsampled pixels in the current frame. DLSS uses machine learning to combine samples in the current frame and past frames, and it can be thought of as an advanced and superior TAA implementation made possible by the available Tensor Cores.^[12]
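
The idea of sampling different pixels in different frames can be sketched for a static scene. In the Python example below, each low-resolution frame samples one of four sub-pixel positions, and four frames together fill a grid of twice the resolution. The 2 x 2 jitter pattern, the `shade` function and the absence of motion are simplifying assumptions; DLSS combines samples with a trained network rather than by direct copying.

```python
# Sub-pixel positions visited over four consecutive frames (assumed pattern).
JITTER = [(0, 0), (1, 0), (0, 1), (1, 1)]

def shade(x, y):
    """Hypothetical static scene, evaluated at high-resolution coordinates."""
    return 10 * y + x

def accumulate(low_w, low_h, frames):
    """Fill a 2x-resolution image by sampling one sub-pixel per frame."""
    high = [[None] * (2 * low_w) for _ in range(2 * low_h)]
    for frame in range(frames):
        jx, jy = JITTER[frame % len(JITTER)]
        for y in range(low_h):
            for x in range(low_w):
                hx, hy = 2 * x + jx, 2 * y + jy
                # Only one of the four high-res pixels is sampled this frame;
                # the others keep whatever earlier frames stored.
                high[hy][hx] = shade(hx, hy)
    return high

print(accumulate(2, 1, 1))  # [[0, None, 2, None], [None, None, None, None]]
print(accumulate(2, 1, 4))  # [[0, 1, 2, 3], [10, 11, 12, 13]]
```

After one frame only a quarter of the high-resolution pixels are known; after four frames the image is complete. With moving content, past samples must first be realigned using motion vectors, and deciding how much to trust them is the part that DLSS delegates to machine learning.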

References

[1] "NVIDIA DLSS Technology for Incredible Performance". NVIDIA. Retrieved 2022-02-07.
[2] "Nvidia RTX DLSS: Everything you need to know". Digital Trends. 2020-02-14. Retrieved 2020-04-05. Deep learning super sampling uses artificial intelligence and machine learning to produce an image that looks like a higher-resolution image, without the rendering overhead. Nvidia’s algorithm learns from tens of thousands of rendered sequences of images that were created using a supercomputer. That trains the algorithm to be able to produce similarly beautiful images, but without requiring the graphics card to work as hard to do it.
[3] "Nvidia DLSS in 2020: stunning results". techspot.com. 2020-02-26. Retrieved 2020-04-05.
[4] "Battlefield V DLSS Tested: Overpromised, Underdelivered". techspot.com. 2019-02-19. Retrieved 2020-04-06. Of course, this is to be expected. DLSS was never going to provide the same image quality as native 4K, while providing a 37% performance uplift. That would be black magic. But the quality difference comparing the two is almost laughable, in how far away DLSS is from the native presentation in these stressful areas.
[5] "AMD Thinks NVIDIA DLSS is not Good Enough; Calls TAA & SMAA Better Alternatives". techquila.co.in. 2019-02-15. Retrieved 2020-04-06. Recently, two big titles received NVIDIA DLSS support, namely Metro Exodus and Battlefield V. Both these games come with NVIDIA’s DXR (DirectX Raytracing) implementation that at the moment is only supported by the GeForce RTX cards. DLSS makes these games playable at higher resolutions with much better frame rates, although there is a notable decrease in image sharpness. Now, AMD has taken a jab at DLSS, saying that traditional AA methods like SMAA and TAA "offer superior combinations of image quality and performance."
[6] "Nvidia Very Quietly Made DLSS A Hell Of A Lot Better". Kotaku. 2020-02-22. Retrieved 2020-04-06. The benefit for most people is that, generally, DLSS comes with a sizeable FPS improvement. How much varies from game to game. In Metro Exodus, the FPS jump was barely there and certainly not worth the bizarre hit to image quality.
[7] "Remedy's Control vs DLSS 2.0 – AI upscaling reaches the next level". Eurogamer. 2020-04-04. Retrieved 2020-04-05. Of course, this isn't the first DLSS implementation we've seen in Control. The game shipped with a decent enough rendition of the technology that didn't actually use the machine learning
[8] "NVIDIA DLSS 2.0 Update Will Fix The GeForce RTX Cards' Big Mistake". techquila.co.in. 2020-03-24. Retrieved 2020-04-06. As promised, NVIDIA has updated the DLSS network in a new GeForce update that provides better, sharper image quality while still retaining higher framerates in raytraced games. While the feature wasn't used as well in its first iteration, NVIDIA is now confident that they have successfully fixed all the issues it had before
[9] "NVIDIA DLSS Plugin and Reflex Now Available for Unreal Engine". NVIDIA Developer Blog. 2021-02-11. Retrieved 2022-02-07.
[10] "NVIDIA DLSS Natively Supported in Unity 2021.2". NVIDIA Developer Blog. 2021-04-14. Retrieved 2022-02-07.
[11] "HW News - Crysis Remastered Ray Tracing, NVIDIA DLSS 2, Ryzen 3100 Rumors". 2020-04-19. Retrieved 2020-04-19. The original DLSS required training the AI network for each new game. DLSS 2.0 trains using non-game-specific content, delivering a generalized network that works across games. This means faster game integrations, and ultimately more DLSS games.
[12] Edward Liu, NVIDIA "DLSS 2.0 - Image Reconstruction for Real-time Rendering with Deep Learning"
[13] "Truly Next-Gen: Adding Deep Learning to Games & Graphics (Presented by NVIDIA)". www.gdcvault.com. Retrieved 2022-02-07.
[14] Edelsten, Andrew (30 August 2019). "NVIDIA DLSS: Control and Beyond". nvidia.com. Retrieved 11 August 2020. we developed a new image processing algorithm that approximated our AI research model and fit within our performance budget. This image processing approach to DLSS is integrated into Control
[15] "NVIDIA DLSS 2.0 Review with Control – Is This Magic?". techquila.co.in. 2020-04-05. Retrieved 2020-04-06.
[16] "NVIDIA preparing Ultra Quality mode for DLSS, 2.2.9.0 version spotted". VideoCardz.com. Retrieved 2021-07-06.
[17] "DLSS: What Does It Mean for Game Developers?". NVIDIA Developer Blog. 2018-09-19. Retrieved 2022-02-07.
[18] "NVIDIA DLSS: Your Questions, Answered". Nvidia. 2019-02-15. Retrieved 2020-04-19. The DLSS team first extracts many aliased frames from the target game, and then for each one we generate a matching “perfect frame” using either super-sampling or accumulation rendering. These paired frames are fed to NVIDIA’s supercomputer. The supercomputer trains the DLSS model to recognize aliased inputs and generate high quality anti-aliased images that match the “perfect frame” as closely as possible. We then repeat the process, but this time we train the model to generate additional pixels rather than applying AA. This has the effect of increasing the resolution of the input. Combining both techniques enables the GPU to render the full monitor resolution at higher frame rates.
[19] "NVIDIA DLSS 2.0: A Big Leap In AI Rendering". www.nvidia.com. Retrieved 2022-02-07.
[20] "NVIDIA's Deep Learning Super Sampling (DLSS) 2.0 Technology Is The Real Deal". Forbes. 2020-03-29. Retrieved 2020-04-07.
[21] "NVIDIA DLSS 2.0: A Big Leap In AI Rendering". Nvidia. 2020-03-23. Retrieved 2020-11-25.
[22] "NVIDIA DLSS 2.0: A Big Leap In AI Rendering". Nvidia. 2020-03-23. Retrieved 2020-04-07.
[23] "NVIDIA TENSOR CORES". Nvidia. Retrieved 2020-04-07.
[24] "On Tensors, Tensorflow, And Nvidia's Latest 'Tensor Cores'". tomshardware.com. 2017-04-11. Retrieved 2020-04-08.
[25] "The NVIDIA Titan V Deep Learning Deep Dive: It's All About The Tensor Cores". AnandTech. 2018-07-03. Retrieved 2020-04-08.
[26] "Using CUDA Warp-Level Primitives". Nvidia. 2018-01-15. Retrieved 2020-04-08. NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion
