Rendering Pipeline Study Notes (2022.06.03)

Forward Rendering

Simple pseudo-code [Ortiz18]:

// Shaders
Shader simpleShader

// Buffers:
Buffer display

for mesh in scene
    for light in scene
        display += simpleShader(mesh, light)

Single Pass [Valient07]

Multi-Pass [Valient07]

Examples

Example 1: Jurassic World Evolution [CodeCorsair0321]

Deferred Shading

  1. For each object [Valient07]
    • Render surface properties into the G-Buffer
  2. For each light and lit pixel [Valient07]
    1. Use G-Buffer to compute lighting
    2. Add result to frame buffer

Simple Z pre-pass pseudo-code [Ortiz18]:

// Buffers:
Buffer display
Buffer depthBuffer

// Shaders:
Shader simpleShader
Shader writeDepth

// Visibility
for mesh in scene
    if mesh.depth < depthBuffer.depth
        depthBuffer = writeDepth(mesh)

// Shading and lighting
for mesh in scene
    if mesh.depth == depthBuffer.depth
        for light in scene
            display += simpleShader(mesh, light)

Simple multi-pass deferred rendering pseudo-code [Ortiz18]:

// Buffers:
Buffer display
Buffer GBuffer

// Shaders:
Shader simpleShader
Shader writeShadingAttributes

// Visibility & materials
for mesh in scene
    if mesh.depth < GBuffer.depth
        GBuffer = writeShadingAttributes(mesh)

// Shading & lighting - multi-pass
for light in scene
    display += simpleShader(GBuffer, light)

Simple single-pass deferred rendering pseudo-code [Ortiz18]:

// Buffers:
Buffer display
Buffer GBuffer

// Shaders:
Shader manyLightShader
Shader writeShadingAttributes

// Visibility & materials
for mesh in scene
    if mesh.depth < GBuffer.depth
        GBuffer = writeShadingAttributes(mesh)

// Shading & lighting - single-pass
display = manyLightShader(GBuffer, scene.lights)
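
As a rough illustration of how the pseudo-code variants above differ in cost, the following Python sketch counts shader invocations for forward, Z pre-pass and deferred shading on a toy one-dimensional framebuffer. The `Mesh` type, the constant per-mesh depth and the trivial `shade` function are assumptions made for this example and are not taken from [Ortiz18].

```python
from dataclasses import dataclass

@dataclass
class Mesh:
    x0: int        # first covered pixel (inclusive)
    x1: int        # last covered pixel (exclusive)
    depth: float   # constant depth, smaller is closer
    albedo: float

def shade(albedo, light):
    return albedo * light  # hypothetical stand-in for simpleShader

def forward(meshes, lights, width):
    display, depth, calls = [0.0] * width, [float("inf")] * width, 0
    for m in meshes:
        for x in range(m.x0, m.x1):
            if m.depth < depth[x]:            # depth test, then shade right away
                depth[x], display[x] = m.depth, 0.0
                for light in lights:
                    display[x] += shade(m.albedo, light)
                    calls += 1
    return display, calls

def z_prepass(meshes, lights, width):
    display, depth, calls = [0.0] * width, [float("inf")] * width, 0
    for m in meshes:                          # visibility pass: depth only
        for x in range(m.x0, m.x1):
            depth[x] = min(depth[x], m.depth)
    for m in meshes:                          # shading pass: visible fragments only
        for x in range(m.x0, m.x1):
            if m.depth == depth[x]:
                for light in lights:
                    display[x] += shade(m.albedo, light)
                    calls += 1
    return display, calls

def deferred(meshes, lights, width):
    display, depth, calls = [0.0] * width, [float("inf")] * width, 0
    gbuffer = [None] * width
    for m in meshes:                          # geometry pass: store attributes
        for x in range(m.x0, m.x1):
            if m.depth < depth[x]:
                depth[x], gbuffer[x] = m.depth, m.albedo
    for light in lights:                      # lighting pass: no geometry involved
        for x in range(width):
            if gbuffer[x] is not None:
                display[x] += shade(gbuffer[x], light)
                calls += 1
    return display, calls

meshes = [Mesh(0, 4, 5.0, 0.5), Mesh(2, 6, 1.0, 1.0)]  # the far mesh is drawn first
lights = [1.0, 2.0]
for name, technique in (("forward", forward), ("z pre-pass", z_prepass), ("deferred", deferred)):
    image, calls = technique(meshes, lights, 8)
    print(name, calls, image)
```

In this toy scene all three produce the same image. Forward shading pays for the two overdrawn pixels, while the Z pre-pass and deferred variants shade each visible pixel once per light; the deferred variant additionally touches the geometry only once.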

Deferred shading builds an attribute buffer, also known as the G-Buffer. [Thibieroz03] [Calver03]

For each object:
    Render to MRT

For each light:
    Apply light as a 2D postprocess

Hargreaves04

Geometry Phase


G-Buffer Building Pass [Thibieroz11]

How to reduce export cost (PS to MRT)?


What to Store?

Examples

Example 1 [Calver03]
GBuffer
Pos.X Pos.Y Pos.Z ID
Norm.X Norm.Y Norm.Z MaterialIndex
Albedo.R Albedo.G Albedo.B DiffuseTerm
SpecularEmissive.R SpecularEmissive.G SpecularEmissive.B SpecularTerm
Material Lookup Texture
Kspecblend
KAmb
KEmm
Example 2: StarCraft II [FilionMcNaughton08]

| MRTs | R | G | B | A |
|---|---|---|---|---|
| MRT 0 | UnlitAndEmissive.R | UnlitAndEmissive.G | UnlitAndEmissive.B | Unused |
| MRT 1 | Normal.X | Normal.Y | Normal.Z | Depth |
| MRT 2 | Albedo.R | Albedo.G | Albedo.B | AO |
| MRT 3 (optional) | Specular.R | Specular.G | Specular.B | Unused |

Example 3: GDC 2004 [Hargreaves04]

Optimized Version:

Example 4: Killzone 2 [Valient07]

After deferred composition, post-processing (DoF, bloom, motion blur, colorize, ILR) is applied.

| MRTs | R8 | G8 | B8 | A8 |
|---|---|---|---|---|
| DS | Depth 24bpp | Depth 24bpp | Depth 24bpp | Stencil |
| RT 0 | Lighting Accumulation R | Lighting Accumulation G | Lighting Accumulation B | Intensity |
| RT 1 | Normal.X (FP16) | Normal.X (FP16) | Normal.Y (FP16) | Normal.Y (FP16) |
| RT 2 | Motion Vectors X | Motion Vectors Y | Spec-Power | Spec-Intensity |
| RT 3 | Diffuse Albedo R | Diffuse Albedo G | Diffuse Albedo B | Sun-Occlusion |

Analysis:

Light Optimization:

Example 5: Mafia: Definitive Edition [CodeCorsair0821]

| MRTs | R | G | B | A |
|---|---|---|---|---|
| RT 0 | Normal.X R16F | Normal.Y G16F | Normal.Z B16F | Roughness A16F |
| RT 1 | Albedo.R R8 | Albedo.G G8 | Albedo.B B8 | Metalness A8 |
| RT 2 | MotionVectors.X R16U | MotionVectors.Y G16U | MotionVectors.Z B16U | Encoded Vertex Normal A16U |
| RT 3 | Specular Intensity R8 | 0.5 G8 | Curvature or Thickness (for SSS) B8 | SSS Profile A8 |
| RT 4 | Emissive.R R11F | Emissive.G G11F | Emissive.B B11F | |

Example 6: Resident Evil 2 [Schreiner19]

| MRTs | R | G | B | A |
|---|---|---|---|---|
| RT 0 | Emissive.R R11 | Emissive.G G11 | Emissive.B B10 | |
| RT 1 | Albedo.R R8 | Albedo.G G8 | Albedo.B B8 | Metalness A8 |
| RT 2 | Normal.R R10 | Normal.G G10 | Roughness B10 | Miscellaneous A2 |
| RT 3 | Baked AO R16 | Velocity.Y G16 | Velocity.Z B16 | SSS A16 |

Example 7: Batman: Arkham Knight [Moradin20]

| MRTs | R | G | B | A |
|---|---|---|---|---|
| RT 0 | Emissive.R R11 | Emissive.G G11 | Emissive.B B10 | |
| RT 1 | Normal.X R10 | Normal.Y G10 | Normal.Z B10 | A2 |
| RT 2 | Albedo.R R8 | Albedo.G G8 | Albedo.B B8 | Material Masking A8 |
| RT 3 | Material Properties R8 | Material Properties G8 | Material Properties B8 | Material Properties A8 |

Lighting Phase

Full screen lighting is actually slower than forward rendering.


Shading Passes [Thibieroz11]



Tiled Shading


Olsson14

The bandwidth problem:

// Per-light loop (classic deferred shading)
for each light
    for each covered pixel
        read G-Buffer
        compute shading
        read + write frame buffer   // repeated G-Buffer reads and frame buffer read-modify-writes

// Per-pixel loop (tiled shading)
for each pixel
    read G-Buffer
    for each affecting light
        compute shading
    write frame buffer  // single write
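
To make the difference between the two loop orders above concrete, here is a small Python sketch that counts memory operations for both. The `coverage` input (one set of covered pixel indices per light) and both function names are assumptions made for this illustration, not part of [Olsson14].

```python
def per_light_traffic(coverage):
    """Per-light loop: every light touches the G-Buffer and the frame buffer
    once for each pixel it covers."""
    gbuffer_reads = sum(len(pixels) for pixels in coverage)
    framebuffer_rmw = gbuffer_reads  # one read + write per light per covered pixel
    return gbuffer_reads, framebuffer_rmw

def per_pixel_traffic(coverage):
    """Per-pixel loop: every lit pixel reads the G-Buffer once and writes once."""
    lit_pixels = set().union(*coverage)
    return len(lit_pixels), len(lit_pixels)

# Three overlapping lights on a toy framebuffer with pixel indices 0..4.
coverage = [{0, 1, 2, 3}, {2, 3, 4}, {3}]
print(per_light_traffic(coverage))  # (8, 8)
print(per_pixel_traffic(coverage))  # (5, 5)
```

The gap grows with the amount of light overlap: pixel 3 is covered by all three lights, so the per-light loop reads its G-Buffer entry three times, whereas the per-pixel loop reads it once.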

ex)

Global Light List
L0 L1 L2 L3 L4 L5 L6 L7
Tile Light Index Lists
0 0 6 3 0 6 4 4
Light Grid (per tile: offset into the tile light index lists, then size)
offset: 0 1 4 7
size: 1 3 3 1
offset: 66 67 69
size: 1 2 2
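
The example data above can be read as a light grid that stores, for each tile, an offset and a size into the flat tile light index list. That reading is an assumption, although the numbers are consistent with it (offsets 0, 1, 4, 7 with sizes 1, 3, 3, 1 exactly cover the eight indices). A minimal Python sketch of the lookup, with hypothetical names:

```python
# Data copied from the example; the (offset, size) pairing is an assumed reading.
global_light_list = ["L0", "L1", "L2", "L3", "L4", "L5", "L6", "L7"]
tile_light_indices = [0, 0, 6, 3, 0, 6, 4, 4]
light_grid = [(0, 1), (1, 3), (4, 3), (7, 1)]  # one (offset, size) pair per tile

def lights_for_tile(tile):
    """Return the lights affecting a tile by slicing the flat index list."""
    offset, size = light_grid[tile]
    return [global_light_list[i] for i in tile_light_indices[offset:offset + size]]

for tile in range(len(light_grid)):
    print(tile, lights_for_tile(tile))
```

Storing indices rather than light data keeps the per-tile lists compact, and a light that touches many tiles is still stored only once in the global list.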
  1. Render Scene to G-Buffers
    • Store geometry attributes per pixel
      • G-Buffers
  2. Build Light Grid
  3. Full Screen Quad (or CUDA, or Compute Shaders, or SPUs)
    • For each pixel
      • Fetch G-Buffer Data
      • Find Tile
      • Loop over lights and accumulate shading
      • Write shading

Simple single-pass deferred rendering using tiled shading pseudo-code [Ortiz18]:

// Buffers:
Buffer display
Buffer GBuffer
Buffer tileArray

// Shaders:
Shader manyLightShader
Shader writeShadingAttributes
CompShader lightInTile

// Visibility & materials
for mesh in scene
    if mesh.depth < GBuffer.depth
        GBuffer = writeShadingAttributes(mesh)

// Light culling
for tile in tileArray
    for light in scene
        if lightInTile(tile, light)
            tile += light

// Shading & lighting - single-pass
display = manyLightShader(GBuffer, tileArray)

Light attenuation bounds the influence of each light, so every light can be given a concrete volume of influence whose shape depends on the type of light source (for example, a sphere for a point light or a cone for a spot light). [Ortiz18]

These shapes are then tested for intersection against the tiles. Tiled forward shading is sometimes also referred to as Forward+. [Ortiz18]

However, tiles carry no information about depth discontinuities, so a light can end up in the list of a tile whose visible surfaces are far away from it in depth. [Ortiz18]
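
A minimal Python sketch of such a light-versus-tile test is given below. It assumes each light has already been projected to a screen-space circle `(cx, cy, radius)` in pixels, which is a simplification chosen for this example; real implementations typically test the light volume against a per-tile frustum instead.

```python
def circle_overlaps_tile(cx, cy, radius, tile_x, tile_y, tile_size):
    """Circle versus axis-aligned tile: clamp the centre to the tile rectangle
    and compare the distance to the clamped point with the radius."""
    x0, y0 = tile_x * tile_size, tile_y * tile_size
    x1, y1 = x0 + tile_size, y0 + tile_size
    nearest_x = min(max(cx, x0), x1)
    nearest_y = min(max(cy, y0), y1)
    dx, dy = cx - nearest_x, cy - nearest_y
    return dx * dx + dy * dy <= radius * radius

def build_tile_light_lists(lights, tiles_x, tiles_y, tile_size):
    """Return {(tile_x, tile_y): [light indices]} for every tile on screen."""
    lists = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            lists[(tx, ty)] = [
                index for index, (cx, cy, radius) in enumerate(lights)
                if circle_overlaps_tile(cx, cy, radius, tx, ty, tile_size)
            ]
    return lists

# One light of radius 5 centred at pixel (20, 20) on a 2x2 grid of 16-pixel tiles.
lights = [(20.0, 20.0, 5.0)]
tile_lists = build_tile_light_lists(lights, 2, 2, 16)
print(tile_lists)
```

The test is purely two-dimensional: it never looks at depth, which is exactly the limitation noted above.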

Clustered Shading


Ortiz18

If tiled shading is a 2D approach, clustered shading is a 3D approach.

Simple single-pass deferred rendering using clustered shading pseudo-code:

// Buffers:
Buffer display
Buffer GBuffer
Buffer clusterArray

// Shaders:
Shader manyLightShader
Shader writeShadingAttributes
CompShader lightInCluster

// Visibility & materials
for mesh in scene
    if mesh.depth < GBuffer.depth
        GBuffer = writeShadingAttributes(mesh)

// Light culling
for cluster in clusterArray
    for light in scene
        if lightInCluster(cluster, light)
            cluster += light

// Shading & lighting - single-pass
display = manyLightShader(GBuffer, clusterArray)

Olsson14

Algorithm:


Light Volume Rendering


Thibieroz11


Kaplanyan10


Screen-Aligned Quads [Thibieroz11]

Global Lights

Sun Rendering

Optimization [Shishkovtsov05]:

Shishkovtsov05

Pass 0: 
Render full-screen quad only where 0x03==stencil count (where attributes are stored)    
If ((N dot L) * ambient_occlusion_term > 0)      
    discard fragment
Else
    color = 0, stencil = 0x01

Pass 1: 
Render full-screen quad only where 0x03==stencil count
Perform light accumulation / shading

Local Lights

Convex Light Hulls

Stencil Light Volumes [Hargreaves04]

Light Volume Z Tests [Hargreaves04]

Alpha Blending

HDR

Antialiasing

Shishkovtsov05

struct v2p
{
    float4 tc0: TEXCOORD0; // Center    
    float4 tc1: TEXCOORD1; // Left Top      
    float4 tc2: TEXCOORD2; // Right Bottom    
    float4 tc3: TEXCOORD3; // Right Top    
    float4 tc4: TEXCOORD4; // Left Bottom      
    float4 tc5: TEXCOORD5; // Left / Right    
    float4 tc6: TEXCOORD6; // Top / Bottom  
};      
/////////////////////////////////////////////////////////////////////  
uniform sampler2D s_distort;  
uniform half4 e_barrier;  // x=norm(~.8f), y=depth(~.5f)  
uniform half4 e_weights;  // x=norm, y=depth  
uniform half4 e_kernel;   // x=norm, y=depth    
/////////////////////////////////////////////////////////////////////  

half4 main(v2p I) : COLOR  
{   
    // Normal discontinuity filter   
    half3 nc = tex2D(s_normal, I.tc0);   
    half4 nd;   
    nd.x = dot(nc, (half3)tex2D(s_normal, I.tc1));   
    nd.y = dot(nc, (half3)tex2D(s_normal, I.tc2));   
    nd.z = dot(nc, (half3)tex2D(s_normal, I.tc3));   
    nd.w = dot(nc, (half3)tex2D(s_normal, I.tc4));   
    nd -= e_barrier.x;   
    nd = step(0, nd);   
    half ne = saturate(dot(nd, e_weights.x));     
    
    // Opposite coords     
    float4 tc5r = I.tc5.wzyx;   
    float4 tc6r = I.tc6.wzyx;     
    
    // Depth filter : compute gradiental difference:   
    // (c-sample1)+(c-sample1_opposite)   
    half4 dc = tex2D(s_position, I.tc0);   
    half4 dd;   
    dd.x = (half)tex2D(s_position, I.tc1).z + (half)tex2D(s_position, I.tc2).z;   
    dd.y = (half)tex2D(s_position, I.tc3).z + (half)tex2D(s_position, I.tc4).z;   
    dd.z = (half)tex2D(s_position, I.tc5).z + (half)tex2D(s_position, tc5r).z;   
    dd.w = (half)tex2D(s_position, I.tc6).z + (half)tex2D(s_position, tc6r).z;   
    dd = abs(2 * dc.z - dd)- e_barrier.y;   
    dd = step(dd, 0);   
    half de = saturate(dot(dd, e_weights.y));     
    
    // Weight     
    half w = (1 - de * ne) * e_kernel.x; 
    // 0 - no aa, 1=full aa     
    // Smoothed color   
    // (a-c)*w + c = a*w + c(1-w)   
    float2 offset = I.tc0 * (1-w);   
    half4 s0 = tex2D(s_image, offset + I.tc1 * w);   
    half4 s1 = tex2D(s_image, offset + I.tc2 * w);   
    half4 s2 = tex2D(s_image, offset + I.tc3 * w);   
    half4 s3 = tex2D(s_image, offset + I.tc4 * w);   

    return (s0 + s1 + s2 + s3)/4.h;  
} 

Examples

Example 1: Phong model [Thibieroz03]

G-Buffer structure:

  1. Pixel position
    • World space position of the pixel
    • R16G16B16A16 FLOAT
  2. Pixel Normal vector
    • World space normalized normal vector
    • Choice:
      • Model space: Simplest.
      • Tangent space
    • R10G10B10A2 / R8G8B8A8 INT / FLOAT
  3. Pixel diffuse color
    • R8G8B8A8 FLOAT
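
As a quick worked example of what this layout costs, the Python sketch below adds up the bits per pixel of the three render targets. It assumes the R10G10B10A2 option for the normal and a 1920x1080 target; the resolution is an assumption made for the arithmetic and does not come from [Thibieroz03].

```python
# Bits per pixel follow directly from the format names.
BITS_PER_PIXEL = {
    "R16G16B16A16_FLOAT": 64,  # pixel position
    "R10G10B10A2": 32,         # pixel normal vector
    "R8G8B8A8": 32,            # pixel diffuse color
}

def gbuffer_bytes_per_pixel(formats):
    return sum(BITS_PER_PIXEL[name] for name in formats) // 8

layout = ["R16G16B16A16_FLOAT", "R10G10B10A2", "R8G8B8A8"]
bytes_per_pixel = gbuffer_bytes_per_pixel(layout)
total_bytes = bytes_per_pixel * 1920 * 1080  # assumed resolution

print(bytes_per_pixel)                # 16
print(round(total_bytes / 2**20, 1))  # 31.6 (MiB)
```

Every lighting pass reads these bytes again for each shaded pixel, which is why the later examples pack their attributes more tightly.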

PreLightPass.vs

;-------------------------------------------------------------------
; Constants specified by the app
; c0-c3 = Global transformation matrix (World*View*Projection)
; c4-c7 = World transformation matrix
;
; Vertex components
; v0 = Vertex Position
; v1, v2, v3 = Inverse of tangent space vectors
; v4 = 2D texture coordinates (model coordinates)
;-------------------------------------------------------------------
vs_2_0

dcl_position v0 ; Vertex position
dcl_binormal v1 ; Transposed binormal
dcl_tangent v2 ; Transposed tangent
dcl_normal v3 ; Transposed normal
dcl_texcoord v4 ; Texture coordinates for diffuse and normal map

; Vertex transformation
m4x4 oPos, v0, c0 ; Transform vertices by WVP matrix

; Model texture coordinates
mov oT0.xy, v4.xy ; Simply copy texture coordinates

; World space coordinates
m4x3 oT1.xyz, v0, c4 ; Transform vertices by world matrix (no w
; needed)

; Inverse (transpose) of tangent space vectors
mov oT2.xyz, v1
mov oT3.xyz, v2
mov oT4.xyz, v3 ; Pass in transposed tangent space vectors

HLSL equivalent:

cbuffer ConstantBuffer : register(b0)
{
    float4x4 WVP;   // Global transformation matrix
    float4x4 World; // World transformation matrix
}

struct VSInput
{
    float4 Position : POSITION;     // Vertex position
    float3 Binormal : BINORMAL;     // Inverse of tangent space vectors >> Transposed binormal
    float3 Tangent : TANGENT;       // Inverse of tangent space vectors >> Transposed tangent
    float3 Normal : NORMAL;         // Inverse of tangent space vectors >> Transposed normal
    float2 TexCoord : TEXCOORD0;    // 2D texture coordinates for diffuse and normal map (model coordinates)
};

struct VSOutput
{
    float4 Position : SV_Position;
    float3 WorldPos : WorldPos;
    float3 Binormal : Binormal;
    float3 Tangent : Tangent;
    float3 Normal : Normal;
    float2 TexCoord : TexCoord0;
};

VSOutput main(VSInput vsInput)
{
    VSOutput vsOutput;

    // Vertex transformation
    vsOutput.Position = mul(WVP, vsInput.Position); // Transform vertices by WVP matrix

    // Model texture coordinates
    vsOutput.TexCoord = vsInput.TexCoord;   // Simply copy texture coordinates

    // World space coordinates
    vsOutput.WorldPos = mul(World, vsInput.Position).xyz;   // Transform vertices by world matrix, translation included (output w not needed)

    // Inverse(transpose) of tangent space vectors
    vsOutput.Binormal = vsInput.Binormal;
    vsOutput.Tangent = vsInput.Tangent;
    vsOutput.Normal = vsInput.Normal;   // Pass in transposed tangent space vectors

    return vsOutput;
}

PreLightPass.ps

;-------------------------------------------------------------------
; Constants specified by the app
; c0-c3 = World transformation matrix for model
;-------------------------------------------------------------------
ps_2_0

; Samplers
dcl_2d s0 ; Diffuse map
dcl_2d s1 ; Normal map

; Texture coordinates
dcl t0.xy ; Texture coordinates for diffuse and normal map
dcl t1.xyz ; World-space position
dcl t2.xyz ; Binormal
dcl t3.xyz ; Tangent
dcl t4.xyz ; Normal (Transposed tangent space vectors)

; Constants
def c30, 1.0, 2.0, 0.0, 0.0
def c31, 0.2, 0.5, 1.0, 1.0

; Texture sampling
texld r2, t0, s1 ; r2 = Normal vector from normal map
texld r3, t0, s0 ; r3 = Color from diffuse map

; Store world-space coordinates into MRT#0
mov oC0, t1 ; Store pixel position in MRT#0

; Convert normal to signed vector
mad r2, r2, c30.g, -c30.r ; r2 = 2*(r2 - 0.5)

; Transform normal vector from tangent space to model space
dp3 r4.x, r2, t2
dp3 r4.y, r2, t3
dp3 r4.z, r2, t4 ; r4.xyz = model space normal

; Transform model space normal vector to world space. Note that only
; the rotation part of the world matrix is needed.
; This step is not required for static models if their
; original model space orientation matches their orientation
; in world space. This would save 3 instructions.
m4x3 r1.xyz, r4, c0

; Convert normal vector to fixed point
; This is not required if the destination MRT is float or signed
mad r1, r1, c31.g, c31.g ; r1 = 0.5*r1 + 0.5

; Store world-space normal into MRT#1
mov oC1, r1

; Store diffuse color into MRT#2
mov oC2, r3

HLSL equivalent:

cbuffer ConstantBuffer : register(b0)
{
    float4x4 World;
}

// Samplers
Texture2D<float3> g_DiffuseMap  : register(t0); // Diffuse map
Texture2D<float3> g_NormalMap   : register(t1);  // Normal map
SamplerState g_Sampler          : register(s0);

struct PSInput
{
    float4 Position : SV_Position;
    float3 WorldPos : WorldPos;     // World-space position
    float3 Binormal : Binormal;     // Binormal
    float3 Tangent : Tangent;       // Tangent
    float3 Normal : Normal;         // Normal (Transposed tangent space vectors)
    float2 TexCoord : TexCoord0;    // Texture Coordinates for diffuse and normal map
};

struct Mrt
{
    float3 WorldPos : SV_Target0;       // MRT#0
    float3 WorldNormal : SV_Target1;    // MRT#1
    float3 Albedo : SV_Target2;         // MRT#2
};

Mrt main(PSInput psInput)
{
    Mrt mrt;

    // Texture sampling
    float3 normal = g_NormalMap.Sample(g_Sampler, psInput.TexCoord);    // Normal vector from normal map
    float3 albedo = g_DiffuseMap.Sample(g_Sampler, psInput.TexCoord);   // Color from diffuse map

    // Store world-space coordinates into MRT#0
    mrt.WorldPos = psInput.WorldPos;

    // Convert normal to signed vector if needed
    // normal = 2.0 * (normal - 0.5);

    // Transform normal vector from tangent space to model space
    float3 modelNormal = float3(dot(normal, psInput.Binormal), dot(normal, psInput.Tangent), dot(normal, psInput.Normal));

    // Transform model space normal vector to world space. Note that only the rotation part of the world matrix is needed.
    // This step is not required for static models if their original model space orientation matches their orientation in world space.
    // This would save 3 instructions.
    float3 worldNormal = mul((float3x3)World, modelNormal);

    // Convert normal vector to fixed point
    // This is not required if the destination MRT is float or signed
    // worldNormal = 0.5 * worldNormal + 0.5;

    // Store world-space normal into MRT#1
    mrt.WorldNormal = worldNormal;

    // Store diffuse color into MRT#2
    mrt.Albedo = albedo;

    return mrt;
}
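
The two conversions mentioned in the comments above (signed vector to fixed point when writing the G-Buffer, and back again when reading it) can be sketched in Python as follows. The 8-bit quantization helpers are an assumption added to show the precision loss of an unsigned 8-bit target; they are not part of the original listing.

```python
def encode_normal(n):
    """Map each component from [-1, 1] to [0, 1]: 0.5 * n + 0.5."""
    return tuple(0.5 * c + 0.5 for c in n)

def decode_normal(e):
    """Map each component from [0, 1] back to [-1, 1]: 2 * e - 1."""
    return tuple(2.0 * c - 1.0 for c in e)

def quantize_unorm8(e):
    """Hypothetical helper: store a [0, 1] value in 8 bits."""
    return tuple(round(c * 255.0) for c in e)

def dequantize_unorm8(q):
    return tuple(c / 255.0 for c in q)

normal = (0.6, 0.0, 0.8)
stored = quantize_unorm8(encode_normal(normal))
restored = decode_normal(dequantize_unorm8(stored))
print(stored, restored)
```

Without quantization the round trip is exact; with 8 bits per component the restored normal differs from the original by at most about 1/255 per component.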

LightPass.ps

;-------------------------------------------------------------------
; Constants specified by the app
; c0 : light position in world space
; c8 : camera position in world space
; c22: c22.a = 1/(light max range), c22.rgb = 1.0f
;-------------------------------------------------------------------
ps_2_0

; Samplers
dcl_2d s0 ; MRT#0 = Pixel position in world space
dcl_2d s1 ; MRT#1 = Pixel normal vector
dcl_2d s2 ; MRT#2 = Pixel diffuse color
dcl_2d s3 ; Falloff texture
dcl_cube s4 ; Cube normalization texture map

; Texture coordinates
dcl t0.xy ; Quad screen-space texture coordinates

; Constants
def c20, 0.5, 2.0, -2.0, 1.0
def c21, 8.0, -0.75, 4.0, 0.0

; Retrieve property buffer data from MRT textures
texld r0, t0, s0 ; r0.xyz = Pixel world space position
texld r2, t0, s1 ; r2.xyz = Pixel normal vector
texld r3, t0, s2 ; r3.rgb = Pixel color

; Convert normal to signed vector
; This is not required if the normal vector was stored in a signed
; or float format
mad r2, r2, c20.y, -c20.w ; r2 = 2*r2 - 1

; Calculate pixel-to-light vector
sub r1.xyz, c0, r0 ; r1 = Lpos - Vpos
mov r1.w, c20.w ; Set r1.w to 1.0
nrm r4, r1 ; Normalize vector (r4.w = 1.0/distance)

; Compute diffuse intensity
dp3 r5.w, r4, r2 ; r5.w = (N.L)

; FallOff
rcp r6, r4.w ; r6 = 1/(1/distance) = distance
mul r6, r6, c22.a ; Divide by light max range
texld r6, r6, s3 ; Sample falloff texture

; Compute halfway vector
sub r1.xyz, c8, r0 ; Compute view vector V (pixel to camera)
texld r1, r1, s4 ; Normalized vector with cube map
mad r1, r1, c20.y, -c20.w ; Convert vector to signed format
add r1, r1, r4 ; Add view and light vector
texld r1, r1, s4 ; Normalize half angle vector with cube map
mad r1, r1, c20.y, -c20.w ; Convert to signed format

; Compute specular intensity
dp3_sat r1.w, r1, r2 ; r1.w = sat(H.N)
pow r1.w, r1.w, c21.r ; r1.w = (H.N)^8
; Set specular to 0 if pixel normal is not facing the light
cmp r1.w, r5.w, r1.w, c21.w ; r1.w = ( (N.L)>=0 ) ? (H.N)^8 : 0

; Output final color
mad r0, r3, r5.w, r1.w ; Modulate diffuse color and diffuse
; intensity and add specular
mul r0, r0, r6 ; Modulate with falloff
mov oC0, r0 ; Output final color

HLSL equivalent:

cbuffer ConstantBuffer : register(b0)
{
    float4 LightPos;        // Light position in world space
    float4 CameraPos;       // Camera position in world space
    float LightAttenuation; // 1.0 / (light max range)
}

// Samplers
Texture2D<float3> g_WorldPos            : register(t0);
Texture2D<float3> g_WorldNormal         : register(t1);
Texture2D<float3> g_Albedo              : register(t2);
Texture2D<float3> g_Falloff             : register(t3);
TextureCube<float4> g_Normalization     : register(t4);
SamplerState g_DefaultSampler           : register(s0);
SamplerState g_CubeSampler              : register(s1);

// Texture coordinates
struct PSInput
{
    float4 Position : SV_Position;
    float2 TexCoord : TexCoord0;    // Quad screen-space texture coordinates
};

float4 main(PSInput psInput) : SV_Target
{
    // Retrieve property buffer data from MRT textures
    float3 worldPos = g_WorldPos.Sample(g_DefaultSampler, psInput.TexCoord);        // pixel world space position
    float3 worldNormal = g_WorldNormal.Sample(g_DefaultSampler, psInput.TexCoord);  // pixel normal vector
    float3 albedo = g_Albedo.Sample(g_DefaultSampler, psInput.TexCoord);            // pixel color

    // Convert normal to signed vector
    // This is not required if the normal was stored in a signed or float format
    // worldNormal = 2.0 * worldNormal - 1.0;

    // Calculate pixel-to-light vector
    float3 toLight = LightPos.xyz - worldPos;
    float dist = length(toLight);
    float3 lightDir = toLight / dist;

    // Compute diffuse intensity
    float diffuseIntensity = dot(worldNormal, lightDir);    // diffuse intensity = (N dot L)

    // Fall off
    float fallOffCoord = dist * LightAttenuation;           // Divide by light max range
    float3 fallOff = g_Falloff.Sample(g_DefaultSampler, float2(fallOffCoord, fallOffCoord)); // Sample falloff texture

    // Compute halfway vector
    // normalize() replaces the cube-map normalization lookups (g_Normalization) of the ps_2_0 listing
    float3 viewDir = normalize(CameraPos.xyz - worldPos);   // View vector V (pixel to camera)
    float3 halfDir = normalize(viewDir + lightDir);         // Add view and light vector, then normalize

    // Compute specular intensity
    float specularIntensity = pow(saturate(dot(halfDir, worldNormal)), 8.0);    // (H dot N)^8

    // Set specular to 0 if pixel normal is not facing the light
    specularIntensity = (diffuseIntensity >= 0.0) ? specularIntensity : 0.0;

    // Output final color
    float3 color = albedo * diffuseIntensity + specularIntensity; // Modulate diffuse color and diffuse intensity and add specular
    color *= fallOff;   // Modulate with falloff

    return float4(color, 1.0);  // Output final color (alpha is set to 1.0 here)
}
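
For readers who want to check the arithmetic of the light pass on the CPU, here is a Python sketch of the same steps: pixel-to-light vector, N.L, falloff, halfway vector and (H.N)^8. The linear falloff is an assumption standing in for the falloff texture, and the helper names are hypothetical. As in the listing, N.L is not clamped.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def light_pass(world_pos, world_normal, albedo, light_pos, camera_pos,
               light_max_range, specular_power=8.0):
    to_light = sub(light_pos, world_pos)
    distance = math.sqrt(dot(to_light, to_light))
    light_dir = tuple(c / distance for c in to_light)
    n_dot_l = dot(world_normal, light_dir)                # diffuse intensity
    # Assumption: a linear fade replaces the falloff texture lookup.
    falloff = max(0.0, 1.0 - distance / light_max_range)
    view_dir = normalize(sub(camera_pos, world_pos))      # pixel to camera
    half_dir = normalize(add(view_dir, light_dir))        # halfway vector H
    h_dot_n = min(max(dot(half_dir, world_normal), 0.0), 1.0)
    # Specular is zeroed when the normal faces away from the light.
    specular = h_dot_n ** specular_power if n_dot_l >= 0.0 else 0.0
    return tuple((a * n_dot_l + specular) * falloff for a in albedo)

color = light_pass(
    world_pos=(0.0, 0.0, 0.0), world_normal=(0.0, 0.0, 1.0),
    albedo=(1.0, 0.5, 0.0), light_pos=(0.0, 0.0, 2.0),
    camera_pos=(0.0, 0.0, 5.0), light_max_range=4.0)
print(color)  # (1.0, 0.75, 0.5)
```

In this configuration the light, the camera and the normal are aligned, so N.L and H.N are both 1 and the falloff at half the range is 0.5.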

Example 2: AMD GDC 2011 [Thibieroz11]

  1. Light Pre-pass
    1. Render Normals
      • 1st geometry pass results in normal(and depth) buffer
        • Uses a single color RT
        • No MRT required
    2. Lighting Accumulation
      • Perform all lighting calculation into light buffer
        • Use normal and depth buffer as input textures
        • Render geometries enclosing light area
        • Write (LightColor * N.L * Attenuation) in RGB, specular in A
    3. Combine lighting with materials
      • 2nd geometry pass using light buffer as input
        • Fetch geometry material
        • Combine with light data
          • Advantages
            • One material fetch per pixel regardless of number of lights
          • Disadvantages
            • Two scene geometry passes required
            • Unique lighting model
  2. Light Volume Rendering
    1. Early Z culling Optimizations
      • When camera is inside the light volume
        • Set Z Mode = GREATER
        • Render volume’s back faces
      • Only samples fully inside the volume get shaded
        • Optimal use of early Z culling
        • No need for stencil
        • High efficiency
      • Previous optimization does not work if camera is outside volume
        • Back faces also pass the Z=GREATER test for objects in front of volume
      • Alternatively, when camera is outside the light volume:
        • Set Z Mode = LESSEQUAL
        • Render volume’s front faces
        • But generates wasted processing for objects behind the volume
      • Stencil can be used to mark samples inside the light volume
      • Render volume with stencil-only pass:
        • Clear stencil to 0
        • Set Z Mode = LESSEQUAL
        • If depth test fails:
          • Increment stencil for back faces
          • Decrement stencil for front faces
      • Render some geometry where stencil != 0
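
The stencil-only pass described at the end of the list above can be checked per pixel with a small Python sketch. It assumes a convex light volume that is crossed once by the view ray (one front face and one back face per pixel) and depth values that grow with distance; the function name is hypothetical.

```python
def stencil_after_volume_pass(scene_depth, front_depth, back_depth):
    """Stencil value of one pixel after rendering the light volume with
    Z mode LESSEQUAL and depth-fail stencil operations."""
    stencil = 0  # stencil cleared to 0
    if not (back_depth <= scene_depth):   # back face fails the depth test
        stencil += 1
    if not (front_depth <= scene_depth):  # front face fails the depth test
        stencil -= 1
    return stencil

# Light volume spans depths 3..7 along the view ray.
print(stencil_after_volume_pass(5.0, 3.0, 7.0))  # 1: surface inside the volume
print(stencil_after_volume_pass(2.0, 3.0, 7.0))  # 0: surface in front of the volume
print(stencil_after_volume_pass(9.0, 3.0, 7.0))  # 0: surface behind the volume
```

Only the pixel whose surface lies between the front and back faces ends up with a non-zero stencil value, so the following lighting pass can be restricted to `stencil != 0`.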

Example 3: AnKi 3D Engine [Charitos19]

Passes:

| MRTs | R | G | B | A |
|---|---|---|---|---|
| RT 0 | Albedo.R R8U | Albedo.G G8U | Albedo.B B8U | Subsurface term A8U |
| RT 1 | Roughness R8U | Metallic G8U | Fresnel term B8U | Emission scaling A8U |
| RT 2 | WS Normal.X R10U_PACK32 | WS Normal.Y G10U_PACK32 | Emission B10U_PACK32 | Sign of the Normal's Z A2U_PACK32 |
| RT 3 | Velocity R16S | Velocity G16S | | |

Example 4: The Surge [Hammer18]

| MRTs | R | G | B | A |
|---|---|---|---|---|
| RT 0 | Albedo.R R8 | Albedo.G G8 | Albedo.B B8 | Material ID |
| RT 1 | VS Normal.X R10 | VS Normal.Y G10 | VS Normal.Z B10 | A2 |
| RT 2 | Roughness R8 | Metalness G8 | Occlusion B8 | (shared) A8 |
| RT 3 | Motion Vectors X R16 | Motion Vectors Y G16 | | |

PBR:

Clustered Deferred Rendering:

Example 5: Rainbow Six Siege [ElMansouri16]

| MRTs | R | G | B | A |
|---|---|---|---|---|
| RT 0 | Emissive.R R11 | Emissive.G G11 | Emissive.B B10 | |
| RT 1 | Normal.X R10 | Normal.Y G10 | Normal.Z B10 | A2 |
| RT 2 | Albedo.R R8 | Albedo.G G8 | Albedo.B B8 | Material Masking A8 |
| RT 3 | Material Properties R8 | Material Properties G8 | Material Properties B8 | Material Properties A8 |

Example 6: Uncharted 4 [ElGarawany16]

GBuffers (channels R, G, B, A)
GBuffer 0: R; G; B; Spec; Normal.X; Normal.Y; iblUseParent, Normal Extra; Roughness
GBuffer 1: Ambient Translucency; Sun Shadow High; Spec Occlusion; Heightmap Shadowing; Sun Shadow Low; Metallic; Dominant Direction X; Dominant Direction Y; AO; Extra Material Mask; Sheen; Thin Wall Translucency
GBuffer 2 (optional): Used by more complicated materials

Example 7: CryENGINE 3 [Kaplanyan10]

Example 8: Unity [Golubev18]

Forward+ [HaradaMcKeeYang13]

Light Culling

Implementation and Optimization

Pseudo-code:

float4 frustum[4];
{   // construct frustum
    float4 v[4];
    v[0] = projToView(8 * GET_GROUP_IDX, 8 * GET_GROUP_IDY, 1.f);
    v[1] = projToView(8 * (GET_GROUP_IDX + 1), 8 * GET_GROUP_IDY, 1.f);
    v[2] = projToView(8 * (GET_GROUP_IDX + 1), 8 * (GET_GROUP_IDY + 1), 1.f);
    v[3] = projToView(8 * GET_GROUP_IDX, 8 * (GET_GROUP_IDY + 1), 1.f);
    float4 o = make_float4(0.f, 0.f, 0.f, 0.f);
    for (int i = 0; i < 4; ++i)
    {
        frustum[i] = createEquation(o, v[i], v[(i + 1) & 3]);
    }
}
float depth = depthIn.Load(uint3(GET_GLOBAL_IDX, GET_GLOBAL_IDY, 0));

float4 viewPos = projToView(GET_GLOBAL_IDX, GET_GLOBAL_IDY, depth);

int lIdx = GET_LOCAL_IDX + GET_LOCAL_IDY * 8;
{   // Calculate bound
    if (lIdx == 0)  // Initialize
    {
        ldsZMax = 0;
        ldsZMin = 0xffffffff;
    }
    GroupMemoryBarrierWithGroupSync();
    u32 z = asuint(viewPos.z);
    if (depth != 1.f)
    {
        AtomMax(ldsZMax, z);
        AtomMin(ldsZMin, z);
    }
    GroupMemoryBarrierWithGroupSync();
    maxZ = asfloat(ldsZMax);
    minZ = asfloat(ldsZMin);
}

MiniEngine version:

// Read all depth values for this tile and compute the tile min and max values
for (uint dx = GTid.x; dx < WORK_GROUP_SIZE_X; dx += 8)
{
    for (uint dy = GTid.y; dy < WORK_GROUP_SIZE_Y; dy += 8)
    {
        uint2 DTid = Gid * uint2(WORK_GROUP_SIZE_X, WORK_GROUP_SIZE_Y) + uint2(dx, dy);

        // If pixel coordinates are in bounds...
        if (DTid.x < ViewportWidth && DTid.y < ViewportHeight)
        {
            // Load and compare depth
            uint depthUInt = asuint(depthTex[DTid.xy]);
            InterlockedMin(minDepthUInt, depthUInt);
            InterlockedMax(maxDepthUInt, depthUInt);
        }
    }
}
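
Both versions above reinterpret the float depth as an unsigned integer so that integer atomics can compute the tile minimum and maximum. The Python sketch below illustrates why that works for non-negative floats; the `asuint` and `asfloat` helpers are re-implemented here with `struct` purely for illustration.

```python
import struct

def asuint(value):
    """Bit pattern of a 32-bit float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def asfloat(bits):
    """32-bit float whose bit pattern is the given unsigned integer."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

depths = [0.25, 0.5, 0.75, 0.125]

# For non-negative IEEE 754 floats, integer order equals float order,
# so min/max on the bit patterns yields the min/max depth.
z_min = asfloat(min(asuint(d) for d in depths))
z_max = asfloat(max(asuint(d) for d in depths))
print(z_min, z_max)  # 0.125 0.75

# The trick breaks for negative values because the sign bit is the top bit.
print(asuint(-1.0) > asuint(1.0))  # True
```

This is why the comparison is applied to values that are known to be non-negative, such as a depth-buffer value in [0, 1].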

GroupMemoryBarrierWithGroupSync();
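// Cull lights against the tile frustum; each of the 64 threads tests every 64th light
for (int i = 0; i < nBodies; i += 64)
{
    int il = lIdx + i;
    if (il < nBodies)
    {
        if (overlaps(frustum, gLightGeometry[il]))  // index with il (this thread's light), not the loop base i
        {
            appendLightToList(il);
        }
    }
}

// Per-tile light index list in group shared memory
groupshared u32 ldsLightIdx[LIGHT_CAPACITY];
groupshared u32 ldsLightIdxCounter;
void appendLightToList(int i)
{
    u32 dstIdx = 0;
    InterlockedAdd(ldsLightIdxCounter, 1, dstIdx);  // dstIdx receives the counter value before the add
    if (dstIdx < LIGHT_CAPACITY)
    {
        ldsLightIdx[dstIdx] = i;
    }
}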

MiniEngine version:

groupshared uint tileLightCountSphere;
groupshared uint tileLightCountCone;
groupshared uint tileLightCountConeShadowed;

groupshared uint tileLightIndicesSphere[MAX_LIGHTS];
groupshared uint tileLightIndicesCone[MAX_LIGHTS];
groupshared uint tileLightIndicesConeShadowed[MAX_LIGHTS];

// find set of lights that overlap this tile
for (uint lightIndex = GI; lightIndex < MAX_LIGHTS; lightIndex += 64)
{
    LightData lightData = lightBuffer[lightIndex];
    float3 lightWorldPos = lightData.pos;
    float lightCullRadius = sqrt(lightData.radiusSq);

    bool overlapping = true;
    for (int p = 0; p < 6; p++)
    {
        float d = dot(lightWorldPos, frustumPlanes[p].xyz) + frustumPlanes[p].w;
        if (d < -lightCullRadius)
        {
            overlapping = false;
        }
    }
    
    if (!overlapping)
        continue;

    uint slot;

    switch (lightData.type)
    {
    case 0: // sphere
        InterlockedAdd(tileLightCountSphere, 1, slot);
        tileLightIndicesSphere[slot] = lightIndex;
        break;

    case 1: // cone
        InterlockedAdd(tileLightCountCone, 1, slot);
        tileLightIndicesCone[slot] = lightIndex;
        break;

    case 2: // cone w/ shadow map
        InterlockedAdd(tileLightCountConeShadowed, 1, slot);
        tileLightIndicesConeShadowed[slot] = lightIndex;
        break;
    }

    // update bitmask
    InterlockedOr(tileLightBitMask[lightIndex / 32], 1 << (lightIndex % 32));
}
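
The plane loop in the MiniEngine snippet above is a conservative sphere-versus-frustum test. A minimal Python sketch of the same test follows, assuming planes are stored as `(nx, ny, nz, w)` with normals pointing into the frustum; the box-shaped "frustum" is only a convenient test volume chosen for this example.

```python
def sphere_overlaps_frustum(center, radius, planes):
    """Reject the sphere only if it lies entirely behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, w in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + w
        if signed_distance < -radius:
            return False
    return True

# Six inward-facing planes of the box [-1, 1]^3.
box_planes = [
    (1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0),
    (0.0, 1.0, 0.0, 1.0), (0.0, -1.0, 0.0, 1.0),
    (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, -1.0, 1.0),
]

print(sphere_overlaps_frustum((1.5, 0.0, 0.0), 1.0, box_planes))  # True
print(sphere_overlaps_frustum((3.0, 0.0, 0.0), 1.0, box_planes))  # False
```

The test can report an overlap for a sphere that sits just outside a corner of the volume. Such false positives only cost some extra shading work; a light that really touches the tile is never rejected.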

GroupMemoryBarrierWithGroupSync();
{   // Write back
    u32 startOffset = 0;
    if (lIdx == 0)
    {   // Reserve memory
        if (ldsLightIdxCounter != 0)
        {
            InterlockedAdd(gLightIdxCounter, ldsLightIdxCounter, startOffset);
        }

        ptLowerBound[tileIdx] = startOffset;
        ldsLightIdxStart = startOffset;
    }
    GroupMemoryBarrierWithGroupSync();
    startOffset = ldsLightIdxStart;

    for (int i = lIdx; i < ldsLightIdxCounter; i += 64)
    {
        gLightIdx[startOffset + i] = ldsLightIdx[i];
    }
}

MiniEngine version:

if (GI == 0)
{
    uint lightCount = 
        ((tileLightCountSphere & 0xff) << 0) |
        ((tileLightCountCone & 0xff) << 8) |
        ((tileLightCountConeShadowed & 0xff) << 16);
    lightGrid.Store(tileOffset + 0, lightCount);

    uint storeOffset = tileOffset + 4;
    uint n;
    for (n = 0; n < tileLightCountSphere; n++)
    {
        lightGrid.Store(storeOffset, tileLightIndicesSphere[n]);
        storeOffset += 4;
    }
    for (n = 0; n < tileLightCountCone; n++)
    {
        lightGrid.Store(storeOffset, tileLightIndicesCone[n]);
        storeOffset += 4;
    }
    for (n = 0; n < tileLightCountConeShadowed; n++)
    {
        lightGrid.Store(storeOffset, tileLightIndicesConeShadowed[n]);
        storeOffset += 4;
    }

    lightGridBitMask.Store4(tileIndex * 16, tileLightBitMask);
}
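
The first word written per tile packs the three light counts into 8-bit fields. A short Python sketch of that packing and the matching unpacking is shown below; the function names are hypothetical.

```python
def pack_light_counts(sphere, cone, cone_shadowed):
    """Pack three per-tile light counts into one 32-bit word, 8 bits each."""
    return ((sphere & 0xFF) << 0) | ((cone & 0xFF) << 8) | ((cone_shadowed & 0xFF) << 16)

def unpack_light_counts(packed):
    """Inverse of pack_light_counts."""
    return packed & 0xFF, (packed >> 8) & 0xFF, (packed >> 16) & 0xFF

packed = pack_light_counts(3, 2, 1)
print(hex(packed))                  # 0x10203
print(unpack_light_counts(packed))  # (3, 2, 1)
```

Because each field is masked with `0xff`, a count above 255 wraps around silently, so the per-tile light capacity has to stay within that limit for the packed count to be meaningful.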

Shading

Tiled Shading

Kaplanyan10

Light Pre-Pass Renderer

LightPrePassRenderer [EngelShaderX709]

Comparison and Conclusion

| Forward / Z Pre-Pass Renderer | Light Pre-Pass Renderer |
|---|---|
| Lots of geometry throughput | |
| Need to split up geometry following light distribution | |
| Dependent texture look-up and lights are not fully dynamic | |

[EngelSiggraph09]

| Deferred Renderer | Light Pre-Pass Renderer |
|---|---|
| Lights are independent from geometry | |
| Geometry pass stores all material and light properties | Geometry pass fills up normal and depth buffer,<br>Lighting pass stores light properties in light buffer |
| Only one geometry pass for the main view | (Version A) Second geometry pass fetches light buffer and applies different material terms per surface by reconstructing the lighting equation<br>(Version B) Ambient + Resolve (MSAA) pass fetches light buffer, uses its content as diffuse / specular content and adds the ambient term while resolving into the main buffer |
| Lights are blit and therefore only limited by memory bandwidth | |
| Memory bandwidth | |
| Recalculate full lighting equation for every light | |
| Limited material representation in G-Buffer | |
| MSAA difficult compared to Forward renderer | |

[EngelSiggraph09]

Clustered Shading

Clustered Deferred Shading Algorithm [OlssonBilleterAssarsson12]

  1. Render scene to G-Buffers
  2. Cluster assignment
  3. Find unique clusters
  4. Assign lights to clusters
  5. Shade samples

Cluster Assignment

Finding Unique Clusters


@startuml
start
split
group Render Opaque Objects
    split
    :Normals;
    split again
    :Depth Buffer;
    end split
    :Switch Off Depth Write;
    :Light Buffer; 
floating note left: Sort Back-To-Front
    :Forward Rendering;
floating note left: Sort Front-To-Back
end group
split again
group Transparent Objects
    :Switch Off Depth Write;
    :Forward Rendering;
    floating note right: Sort Back-To-Front
end group
end split
stop
@enduml
Cluster key equations:

$$k = \left \lfloor \frac{\log{\left(-z_{vs}/near\right)}}{\log{\left(1 + \frac{2 \tan{\theta}}{S_y} \right )}} \right \rfloor$$

$$\left(i, j, k \right ) = \left(\left \lfloor x_{ss}/t_x \right \rfloor, \left \lfloor y_{ss}/t_y \right \rfloor, \left \lfloor \frac{\log{\left(-z_{vs}/near\right)}}{\log{\left(1 + \frac{2 \tan{\theta}}{S_y} \right )}} \right \rfloor \right )$$
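
The cluster key above can be evaluated directly. The Python sketch below does so under the following assumed reading of the symbols: `x_ss`, `y_ss` are screen-space pixel coordinates, `t_x`, `t_y` the tile size in pixels, `z_vs` the (negative) view-space depth, `near` the near plane distance, `theta` half the vertical field of view and `S_y` the number of subdivisions along Y. The sample values are made up for illustration.

```python
import math

def cluster_key(x_ss, y_ss, z_vs, near, theta, s_y, t_x, t_y):
    """Compute (i, j, k): a 2D tile index plus an exponential depth slice."""
    i = math.floor(x_ss / t_x)
    j = math.floor(y_ss / t_y)
    k = math.floor(math.log(-z_vs / near) / math.log(1.0 + 2.0 * math.tan(theta) / s_y))
    return i, j, k

# Hypothetical camera: near = 1, and tan(theta) = 0.5 with S_y = 1, so each
# depth slice is twice as deep as the previous one.
camera = dict(near=1.0, theta=math.atan(0.5), s_y=1, t_x=64, t_y=64)

samples = [(100.0, 30.0, -10.0), (110.0, 40.0, -12.0), (100.0, 30.0, -3.0)]
keys = [cluster_key(x, y, z, **camera) for x, y, z in samples]
print(keys)

# "Finding unique clusters" then reduces to removing duplicate keys.
unique_clusters = sorted(set(keys))
print(unique_clusters)
```

The first two samples fall into the same tile and the same depth slice, so they share one cluster, while the third sample lies in the same tile but in a nearer slice and forms a second cluster.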