Rendering Pipeline Comparison (2022.07.27)
Environment
Type | Name |
---|---|
OS | Windows 10 Pro 64 |
Processor | Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (16 CPUs), ~2.9GHz |
Memory | 32768MB RAM |
Device | NVIDIA GeForce RTX 3080 |
VRAM | 10077MB |
Display | 1920 × 1200 (32 bit) (59 Hz) |
Build Configuration | Debug / Windows |
Comparison Methods
Bandwidth Comparison [Thibieroz04]
$$
\textrm{Bandwidth}_{60\,\textrm{fps}} = \left( \textrm{W} \times \textrm{H} \times \left[ \textrm{MRT}_{\textrm{BPP}} \times n_{\textrm{MRT}} \times n + \textrm{Z}_{\textrm{BPP}} \times \textrm{Overdraw} + \textrm{T}_{\textrm{BPP}} \times \textrm{T}_{\textrm{B}} + n \times \left( 2 \times \textrm{BB}_{\textrm{BPP}} + \textrm{T}_{\textrm{S}} \times \textrm{T}_{\textrm{BPP}} \right) \right] + \textrm{C}_{\textrm{Geometry}} \right) \times 60 \ \textrm{Bytes} / \textrm{s}
$$
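The expression above can be evaluated directly. The following is a minimal sketch: the parameter names simply mirror the symbols in the formula, the function name `bandwidth_60fps` is hypothetical, and the sample values are illustrative assumptions rather than measurements from this comparison.

```python
def bandwidth_60fps(w, h, mrt_bpp, n_mrt, n, z_bpp, overdraw,
                    t_bpp, t_b, bb_bpp, t_s, c_geometry):
    """Evaluate the bandwidth formula term by term; the result is in bytes per second."""
    per_pixel = (
        mrt_bpp * n_mrt * n                 # MRT_BPP x n_MRT x n
        + z_bpp * overdraw                  # Z_BPP x Overdraw
        + t_bpp * t_b                       # T_BPP x T_B
        + n * (2 * bb_bpp + t_s * t_bpp)    # n x (2 x BB_BPP + T_S x T_BPP)
    )
    return (w * h * per_pixel + c_geometry) * 60


# Illustrative values only (assumed, not taken from the measurements in this document).
example = bandwidth_60fps(
    w=1920, h=1200, mrt_bpp=4, n_mrt=4, n=1, z_bpp=4, overdraw=3,
    t_bpp=4, t_b=2, bb_bpp=4, t_s=1, c_geometry=0,
)
print(f"{example / 2**30:.2f} GiB/s")
```

Keeping each term on its own line makes it easy to see which term dominates when a single parameter, such as `n_mrt` or `overdraw`, is varied.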
Storage Comparison
- Normal [Koonce07]
- R8G8B8A8 vs R10G10B10A2 vs R16G16F vs R16G16B16A16F
- Deferring Cost
- Decoding Cost
- Sampling and Storage
- Quality
- Free Components
Duration
Frame Duration
Light Phase Duration
Render Color Duration
Tile / Cluster Assignment Duration
Bandwidth
Frame
Pipeline | DRAM Read Utilization (%) | DRAM Write Utilization (%) | Total DRAM Read/Write Utilization (%) | DRAM Activity (% of memory cycles in which a DRAM read or write request was active) | L1 Cache Read/Write Utilization | L2 Cache Read/Write Utilization |
---|---|---|---|---|---|---|
Forward | 16.333333 | 1 | 17.333333 | 17.666667 | 53.666667 | 13.666667 |
Forward+ | 35.333333 | 2 | 37.333333 | 37.333333 | 24.333333 | 26.666667 |
Forward+ 2.5D Culling | 35.666667 | 2 | 37.666667 | 37.666667 | 24.666667 | 26.666667 |
Forward+ 2.5D, AABB-based Culling | 38.666667 | 2 | 40.666667 | 41 | 18.666667 | 29 |
Forward Clustered | 34 | 2 | 36 | 36.333333 | 23.333333 | 26 |
Deferred | 20 | 1 | 21 | 21 | 53 | 17 |
Deferred Tiled | 37 | 2 | 39 | 39 | 21 | 28 |
Deferred Tiled 2.5D Culling | 36.333333 | 2 | 38.333333 | 39.333333 | 20.333333 | 27.333333 |
Deferred Tiled 2.5D, AABB-based Culling | 39 | 2 | 41 | 41.666667 | 15.666667 | 29.666667 |
Deferred Tiled (DICE) | 36.666667 | 2 | 38.666667 | 39 | 21 | 27.666667 |
Deferred Tiled (DICE) 2.5D Culling | 36.333333 | 2 | 38.333333 | 39.333333 | 21.333333 | 27.333333 |
Deferred Tiled (DICE) 2.5D, AABB-based Culling | 39.666667 | 2 | 41.666667 | 42 | 16 | 29.666667 |
Deferred Tiled (Intel) | 36.666667 | 2 | 38.666667 | 39.333333 | 21.333333 | 27.666667 |
Deferred Clustered | 36 | 2 | 38 | 38.666667 | 20.666667 | 28 |
Geometry Phase
Pipeline | DRAM Read Utilization (%) | DRAM Write Utilization (%) | Total DRAM Read/Write Utilization (%) | DRAM Activity (% of memory cycles in which a DRAM read or write request was active) | L1 Cache Read/Write Utilization | L2 Cache Read/Write Utilization |
---|---|---|---|---|---|---|
Deferred | 19.666667 | 21.333333 | 41 | 41.333333 | 59.666667 | 42.666667 |
Deferred Tiled | 19.666667 | 21.666667 | 41.333333 | 42 | 60.333333 | 43 |
Deferred Tiled 2.5D Culling | 19 | 21.666667 | 40.666667 | 41 | 60.333333 | 42.666667 |
Deferred Tiled 2.5D, AABB-based Culling | 18.333333 | 21 | 39.333333 | 39.333333 | 58.666667 | 41.333333 |
Deferred Tiled (DICE) | 18.666667 | 21.666667 | 40.333333 | 41 | 60.333333 | 42.666667 |
Deferred Tiled (DICE) 2.5D Culling | 19 | 21.666667 | 40.666667 | 41.333333 | 60.333333 | 42.666667 |
Deferred Tiled (DICE) 2.5D, AABB-based Culling | 18.666667 | 21.666667 | 40.333333 | 41 | 60.666667 | 43 |
Deferred Tiled (Intel) | 18 | 21 | 39 | 39.666667 | 58.666667 | 41.666667 |
Deferred Clustered | 18 | 21 | 39 | 39.666667 | 58.666667 | 41.666667 |
Light Phase
Pipeline | DRAM Read Utilization (%) | DRAM Write Utilization (%) | Total DRAM Read/Write Utilization (%) | DRAM Activity (% of memory cycles in which a DRAM read or write request was active) | L1 Cache Read/Write Utilization | L2 Cache Read/Write Utilization |
---|---|---|---|---|---|---|
Deferred | 0 | 0 | 0 | 0 | 85 | 3.333333 |
Deferred Tiled | 3 | 0 | 3 | 4 | 94.666667 | 4.666667 |
Deferred Tiled 2.5D Culling | 3.333333 | 0 | 3.333333 | 4 | 94.333333 | 4.666667 |
Deferred Tiled 2.5D, AABB-based Culling | 4 | 1 | 5 | 5 | 94.666667 | 5.333333 |
Deferred Tiled (DICE) | 2 | 0 | 2 | 3 | 95.666667 | 4 |
Deferred Tiled (DICE) 2.5D Culling | 2.333333 | 0 | 2.333333 | 3.333333 | 94.333333 | 4 |
Deferred Tiled (DICE) 2.5D, AABB-based Culling | 4 | 1 | 5 | 5 | 93.333333 | 5 |
Deferred Tiled (Intel) | 2 | 0 | 2 | 3 | 97 | 4 |
Deferred Clustered | 3 | 0 | 3 | 4 | 97 | 5 |
Render Color
Pipeline | DRAM Read Utilization (%) | DRAM Write Utilization (%) | Total DRAM Read/Write Utilization (%) | DRAM Activity (% of memory cycles in which a DRAM read or write request was active) | L1 Cache Read/Write Utilization | L2 Cache Read/Write Utilization |
---|---|---|---|---|---|---|
Forward | 0 | 0 | 0 | 0.666667 | 82.666667 | 3.333333 |
Forward+ | 2 | 1 | 3 | 3.666667 | 89.333333 | 5 |
Forward+ 2.5D Culling | 2.333333 | 1 | 3.333333 | 4 | 92.666667 | 5.333333 |
Forward+ 2.5D, AABB-based Culling | 3 | 2 | 5 | 5 | 91 | 7 |
Forward Clustered | 2.666667 | 1 | 3.666667 | 4.666667 | 92.666667 | 6.333333 |
Tile / Cluster Assignment
Pipeline | DRAM Read Utilization (%) | DRAM Write Utilization (%) | Total DRAM Read/Write Utilization (%) | DRAM Activity (% of memory cycles in which a DRAM read or write request was active) | L1 Cache Read/Write Utilization | L2 Cache Read/Write Utilization |
---|---|---|---|---|---|---|
Forward+ | 1.333333 | 1 | 2.333333 | 3.333333 | 23.333333 | 12 |
Forward+ 2.5D Culling | 1.333333 | 1 | 2.333333 | 2.666667 | 26 | 11 |
Forward+ 2.5D, AABB-based Culling | 3.666667 | 2 | 5.666667 | 5.666667 | 41.666667 | 14 |
Forward Clustered | 0 | 4 | 4 | 5 | 43 | 18 |
Deferred Tiled | 0 | 1 | 1 | 2 | 24 | 12 |
Deferred Tiled 2.5D Culling | 0 | 1 | 1 | 2 | 26 | 11 |
Deferred Tiled 2.5D, AABB-based Culling | 1 | 2 | 3 | 3 | 43 | 14 |
Deferred Tiled (DICE) | 2 | 0 | 2 | 3 | 95.666667 | 4 |
Deferred Tiled (DICE) 2.5D Culling | 2.333333 | 0 | 2.333333 | 3.333333 | 94.333333 | 4 |
Deferred Tiled (DICE) 2.5D, AABB-based Culling | 4 | 1 | 5 | 5 | 93.333333 | 5 |
Deferred Tiled (Intel) | 2 | 0 | 2 | 3 | 97 | 4 |
Deferred Clustered | 0.333333 | 4 | 4.333333 | 5 | 42.333333 | 18 |
Shadow Maps
Lights Scalability
Frame Duration
Light Phase Duration
Render Color Duration
Tile / Cluster Assignment Duration
G-Buffer: Fat Buffer vs Thin Buffer [Kaplanyan10]
Color Space Encoding [Kaplanyan10]
Normal Encoding [Kaplanyan10]
Frame Duration
Render Color Duration
Geometry Phase Duration
Lighting Phase Duration
False Positive Rate
Forward+
Name | | Duration (ms) | % Utilization of DRAM Reads | % Utilization of DRAM Writes | Total DRAM RW Utilization | DRAM Activity | L1 Cache RW Utilization | L2 Cache RW Utilization |
---|---|---|---|---|---|---|---|---|
Scene Render | Particle Update | 0.091776 0.095168 0.097408 =0.094784 | 0 0 1 =0.333333 | 0 | 0 0 1 =0.333333 | 0 0 2 =0.666667 | 0 | 0 0 1 =0.333333 |
Scene Render | Z PrePass | 0.083200 0.081600 0.082240 =0.082347 | 25 24 24 =24.333333 | 9 6 6 =7 | 34 30 30 =31.333333 | 34 31 31 =32 | 3 | 34 33 32 =33 |
Scene Render | Generate SSAO | 0.156384 0.155328 0.156512 =0.156075 | 8 9 10 =9 | 7 7 8 =7.333333 | 15 16 18 =16.333333 | 15 17 19 =17 | 47 47 48 =47.333333 | 32 34 33 =33 |
Scene Render | Fill Light Grid | 0.060064 0.059296 0.061600 =0.060320 | 1 1 5 =2.333333 | 3 3 2 =2.666667 | 4 4 7 =5 | 4 4 8 =5.333333 | 43 43 41 =42.333333 | 22 22 21 =21.666667 |
Scene Render | Main Render | 0.031232 0.030720 0.030816 =0.030923 | 0 | 0 | 0 | 0 | 0 | 13 |
Scene Render | Render Shadow Map | 0.107104 0.090272 0.089888 =0.095755 | 19 21 21 =20.333333 | 8 9 9 =8.666667 | 27 30 30 =29 | 28 30 30 =29.333333 | 3 | 26 28 28 =27.333333 |
Scene Render | Render Color | 1.317600 1.318592 1.318752 =1.318315 | 3 | 2 | 5 | 5 | 91 92 83 =88.666667 | 7 7 6 =6.666667 |
Scene Render | Generate Camera Velocity | 0.059232 0.058464 0.059392 =0.059029 | 8 7 10 =8.333333 | 5 | 13 12 15 =13.333333 | 13 13 15 =13.666667 | 32 33 32 =32.333333 | 21 20 20 =20.333333 |
Scene Render | Temporal Resolve | 0.135456 0.134752 0.137024 =0.135744 | 42 39 45 =42 | 17 19 17 =17.666667 | 59 58 62 =59.666667 | 60 58 62 =60 | 68 68 67 =67.666667 | 69 67 74 =70 |
Scene Render | Particle Render | 0.085088 0.083296 0.085344 =0.084576 | 3 3 5 =3.666667 | 1 | 4 4 6 =4.666667 | 4 5 7 =5.333333 | 2 | 3 3 4 =3.333333 |
Scene Render | Motion Blur | 0.052416 0.055840 0.055616 =0.054624 | 44 50 49 =47.666667 | 15 14 14 =14.333333 | 59 64 63 =62 | 59 64 64 =62.333333 | 29 28 28 =28.333333 | 46 47 47 =46.666667 |
Total | | 2.186400 2.170144 2.181408 | 9 8 9 =8.666667 | 4 | 13 12 13 =12.666667 | 13 13 14 =13.333333 | 66 65 65 =65.333333 | 16 15 16 =15.666667 |
Forward+ 2.5D Culling with AABB-based Culling
Name | | Duration (ms) | % Utilization of DRAM Reads | % Utilization of DRAM Writes | Total DRAM RW Utilization | DRAM Activity | L1 Cache RW Utilization | L2 Cache RW Utilization |
---|---|---|---|---|---|---|---|---|
Scene Render | Particle Update | 0.091584 0.092256 0.091296 =0.091712 | 0 3 0 =1 | 0 | 0 3 0 =1 | 0 4 0 =1.333333 | 0 | 0 2 1 =1 |
Scene Render | Z PrePass | 0.083328 0.081248 0.081120 =0.081898 | 25 24 24 =24.333333 | 9 10 10 =9.666667 | 34 | 34 35 34 =34.333333 | 3 | 34 33 32 =33 |
Scene Render | Generate SSAO | 0.161536 0.159520 0.154848 =0.158634 | 8 | 7 | 15 | 15 15 16 =15.333333 | 48 | 33 |
Scene Render | Fill Light Grid | 0.059744 0.059776 0.059584 =0.059701 | 1 | 3 | 4 | 4 | 43 | 22 |
Scene Render | Main Render | 0.030752 0.030720 0.030720 =0.030731 | 0 | 0 | 0 | 0 | 0 | 13 |
Scene Render | Render Shadow Map | 0.106656 0.089920 0.089760 =0.095445 | 19 21 21 =20.333333 | 7 9 9 =8.333333 | 26 30 30 =28.666667 | 27 30 30 =29 | 3 | 26 28 28 =27.333333 |
Scene Render | Render Color | 1.308896 1.311968 1.316896 =1.312587 | 3 3 4 =3.333333 | 2 | 5 5 6 =5.333333 | 5 5 6 =5.333333 | 92 92 84 =89.333333 | 7 |
Scene Render | Generate Camera Velocity | 0.058944 0.058464 0.058624 =0.058677 | 6 8 8 =7.333333 | 5 5 4 =4.666667 | 11 13 12 =12 | 12 13 13 =12.666667 | 32 32 33 =32.666667 | 19 21 22 =20.666667 |
Scene Render | Temporal Resolve | 0.135200 0.134208 0.135776 =0.135061 | 44 38 43 =41.666667 | 17 | 61 55 60 =58.666667 | 61 56 60 =59 | 67 68 67 =67.333333 | 69 66 72 =69 |
Scene Render | Particle Render | 0.083584 0.083776 0.083936 =0.083765 | 3 5 3 =3.666667 | 1 | 4 6 4 =4.666667 | 5 7 4 =5.333333 | 2 | 3 |
Scene Render | Motion Blur | 0.052544 0.052224 0.052768 =0.052512 | 45 46 45 =45.333333 | 15 14 15 =14.666667 | 60 | 61 60 60 =60.333333 | 29 30 29 =29.333333 | 46 47 46 =46.333333 |
Total | | 2.186400 2.161280 2.162144 =2.169941 | 8 | 4 | 12 | 12 | 62 65 62 =63 | 15 |
Forward Clustered
Name | | Duration (ms) | % Utilization of DRAM Reads | % Utilization of DRAM Writes | Total DRAM RW Utilization | DRAM Activity | L1 Cache RW Utilization | L2 Cache RW Utilization |
---|---|---|---|---|---|---|---|---|
Scene Render | Particle Update | 0.094080 0.098720 0.090016 =0.094272 | 0 | 0 | 0 | 0 | 0 | 0 |
Scene Render | Z PrePass | 0.083136 0.080256 0.081248 =0.081546 | 25 24 24 =24.333333 | 6 6 10 =7.333333 | 31 30 34 =31.666667 | 32 31 35 =32.666667 | 3 | 34 33 33 =33.333333 |
Scene Render | Generate SSAO | 0.158592 0.155296 0.167648 =0.160512 | 7 8 8 =7.666667 | 7 8 7 =7.333333 | 14 16 15 =15 | 14 17 15 =15.333333 | 47 | 31 31 32 =31.333333 |
Scene Render | Fill Light Grid | 0.454208 0.450720 0.454272 =0.453067 | 0 | 1 | 1 | 1 | 38 39 38 =38.333333 | 21 |
Scene Render | Main Render | 0.031392 0.030752 0.031168 =0.031104 | 3 0 0 =1 | 3 0 0 =1 | 6 0 0 =2 | 7 0 0 =2.333333 | 0 | 17 15 15 =15.666667 |
Scene Render | Render Shadow Map | 0.106048 0.089184 0.089056 =0.094763 | 19 21 21 =20.333333 | 8 9 9 =8.666667 | 27 30 30 =29 | 28 30 31 =29.666667 | 3 | 26 28 28 =27.333333 |
Scene Render | Render Color | 2.018272 2.024256 2.027040 =2.023189 | 2 | 1 | 3 | 3 | 89 88 91 =89.333333 | 5 |
Scene Render | Generate Camera Velocity | 0.059264 0.058496 0.059264 =0.059008 | 6 10 12 =9.333333 | 5 4 4 =4.333333 | 11 14 16 =13.666667 | 12 15 16 =14.333333 | 32 | 18 21 23 =20.666667 |
Scene Render | Temporal Resolve | 0.136320 0.134176 0.136032 =0.135509 | 43 37 43 =41 | 16 17 16 =16.333333 | 59 54 59 =57.333333 | 60 55 60 =58.333333 | 67 68 67 =67.333333 | 69 65 74 =69.333333 |
Scene Render | Particle Render | 0.083680 0.084256 0.082656 =0.083531 | 3 5 3 =3.666667 | 1 2 1 =1.333333 | 4 7 4 =5 | 5 7 5 =6.333333 | 2 | 3 |
Scene Render | Motion Blur | 0.063776 0.054368 0.054528 =0.057557 | 45 55 45 =48.333333 | 20 17 20 =19 | 65 72 65 =67.333333 | 66 72 66 =68 | 29 26 29 =28 | 40 42 41 =41 |
Total | | 3.295520 3.267296 3.279776 =3.280864 | 5 5 6 =5.333333 | 2 3 3 =2.666667 | 7 8 9 =8 | 8 9 9 =8.666667 | 70 67 68 =68.333333 | 13 |
Deferred Tiled
Name | | Duration (ms) | % Utilization of DRAM Reads | % Utilization of DRAM Writes | Total DRAM RW Utilization | DRAM Activity | L1 Cache RW Utilization | L2 Cache RW Utilization |
---|---|---|---|---|---|---|---|---|
Scene Render | Particle Update | 0.094080 0.098720 0.090016 =0.094272 | 0 | 0 | 0 | 0 | 0 | 0 |
Scene Render | Z PrePass | 0.083136 0.080256 0.081248 =0.081546 | 25 24 24 =24.333333 | 6 6 10 =7.333333 | 31 30 34 =31.666667 | 32 31 35 =32.666667 | 3 | 34 33 33 =33.333333 |
Scene Render | Generate SSAO | 0.158592 0.155296 0.167648 =0.160512 | 7 8 8 =7.666667 | 7 8 7 =7.333333 | 14 16 15 =15 | 14 17 15 =15.333333 | 47 | 31 31 32 =31.333333 |
Scene Render | Fill Light Grid | 0.454208 0.450720 0.454272 =0.453067 | 0 | 1 | 1 | 1 | 38 39 38 =38.333333 | 21 |
Scene Render | Main Render | 0.031392 0.030752 0.031168 =0.031104 | 3 0 0 =1 | 3 0 0 =1 | 6 0 0 =2 | 7 0 0 =2.333333 | 0 | 17 15 15 =15.666667 |
Scene Render | Render Shadow Map | 0.106048 0.089184 0.089056 =0.094763 | 19 21 21 =20.333333 | 8 9 9 =8.666667 | 27 30 30 =29 | 28 30 31 =29.666667 | 3 | 26 28 28 =27.333333 |
Scene Render | Render Color | 2.018272 2.024256 2.027040 =2.023189 | 2 | 1 | 3 | 3 | 89 88 91 =89.333333 | 5 |
Scene Render | Generate Camera Velocity | 0.059264 0.058496 0.059264 =0.059008 | 6 10 12 =9.333333 | 5 4 4 =4.333333 | 11 14 16 =13.666667 | 12 15 16 =14.333333 | 32 | 18 21 23 =20.666667 |
Scene Render | Temporal Resolve | 0.136320 0.134176 0.136032 =0.135509 | 43 37 43 =41 | 16 17 16 =16.333333 | 59 54 59 =57.333333 | 60 55 60 =58.333333 | 67 68 67 =67.333333 | 69 65 74 =69.333333 |
Scene Render | Particle Render | 0.083680 0.084256 0.082656 =0.083531 | 3 5 3 =3.666667 | 1 2 1 =1.333333 | 4 7 4 =5 | 5 7 5 =6.333333 | 2 | 3 |
Scene Render | Motion Blur | 0.063776 0.054368 0.054528 =0.057557 | 45 55 45 =48.333333 | 20 17 20 =19 | 65 72 65 =67.333333 | 66 72 66 =68 | 29 26 29 =28 | 40 42 41 =41 |
Total | | 2.051136 | 11 | 5 | 16 | 16 | 56 | 18 |
Optimization
- Locate the bottleneck of the pipeline
- Optimize that stage
Pipelines to optimize:
- Forward+
- Forward Clustered
- Deferred Tiled
- Deferred Clustered
- Deferred Thin G-Buffer Tiled
- Deferred Thin G-Buffer Clustered
Locating the Bottleneck
- Finding bottlenecks
- Set up several tests where each test decreases the amount of work a particular stage performs
- If one of these tests causes the frames per second to increase, the bottleneck stage has been found.
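The procedure above can be sketched as a small search loop. This is a minimal illustration under stated assumptions: `measure_fps` is a hypothetical callback that runs the renderer with one stage's workload reduced (or `None` for the unmodified baseline), and `make_fake_measure` is a toy stand-in that models the frame time as a plain sum of stage costs, which a real GPU pipeline does not strictly follow.

```python
from typing import Callable, Dict, Optional, Sequence


def find_bottleneck(measure_fps: Callable[[Optional[str]], float],
                    stages: Sequence[str],
                    threshold: float = 0.05) -> Optional[str]:
    """Return the stage whose reduced workload raises FPS the most, or None.

    A stage only counts if its relative FPS gain exceeds `threshold`.
    """
    baseline = measure_fps(None)
    best_stage, best_gain = None, threshold
    for stage in stages:
        gain = (measure_fps(stage) - baseline) / baseline
        if gain > best_gain:
            best_stage, best_gain = stage, gain
    return best_stage


def make_fake_measure(stage_ms: Dict[str, float]) -> Callable[[Optional[str]], float]:
    """Toy model (assumption): frame time is the sum of stage costs,
    and a 'reduced' stage costs half as much."""
    def measure(reduced_stage: Optional[str]) -> float:
        total = sum(ms * (0.5 if name == reduced_stage else 1.0)
                    for name, ms in stage_ms.items())
        return 1000.0 / total
    return measure


# Hypothetical per-stage costs in milliseconds.
fake = make_fake_measure({"geometry": 0.2, "lighting": 1.3, "post": 0.3})
print(find_bottleneck(fake, ["geometry", "lighting", "post"]))
```

The threshold guards against treating measurement noise as a real improvement; a suitable value depends on how stable the frame rate is between runs.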
Multiplatform GPU High-Level Optimization Guidelines [SousaKasyanSchulz12]
- Generalize and always optimize for the worst case scenario
- Discover the biggest bottlenecks and address them by tackling the biggest time consumer. This means avoiding partial optimizations.
- Ex) Crysis 1: If the camera was static, then motion blur was disabled. If the camera was moving fast, then motion blur was enabled. This kind of bad optimization strategy resulted in big performance peaks and an inconsistent frame rate.
- Once done, repeat ad nauseam!
- Don’t repeat work or do unnecessary work. For example:
- Don’t down-sample full-screen color targets or depth targets multiple times for different postprocessing functions
- Minimize the number of memory transfers, render target clears, and any redundant full-screen passes
- Such repeated or redundant work adds up very quickly
- Ex) A full-screen pass at 720p costs ca. 0.25 ms on the Xbox 360 and ca. 0.4 ms on PS3. It is very easy to spend many milliseconds in a wasteful manner.
- Batch as much as possible in a single pass
- Take advantage of interframe coherency. Amortize costs across frames:
- This can provide a significant gain if done carefully, taking performance peaks and multi-GPU systems into account
- Distribute costs evenly
- Ex) If the HUD updates every nth frame, then every n + 1-th frame update some similar-costing render technique
- For screen-space ambient occlusion (SSAO) and the like, the cost can be distributed across frames
- In the end, the key words for most cases are: “share, share, share.” Share as many computations and as much bandwidth as is reasonably possible in a single pass.
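The idea of distributing costs evenly across frames can be sketched as a simple modulo schedule. This is a minimal illustration, not the CryENGINE implementation; the task names, the period of 4, and the helper `tasks_for_frame` are all hypothetical.

```python
def tasks_for_frame(frame_index, periodic_tasks):
    """Return the names of the tasks due on this frame.

    periodic_tasks maps a task name to (period, offset): the task runs on
    every frame where frame_index % period == offset % period.
    """
    return [name
            for name, (period, offset) in periodic_tasks.items()
            if frame_index % period == offset % period]


# The HUD updates every 4th frame; a similarly priced task is staggered
# by one frame so the two never land on the same frame.
schedule = {
    "hud_update": (4, 0),
    "ssao_update": (4, 1),
}

for frame in range(8):
    print(frame, tasks_for_frame(frame, schedule))
```

Giving tasks of similar cost different offsets within the same period keeps the per-frame total flat instead of producing a spike on every nth frame.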
Multiplatform Optimization: Best Practice [SousaKasyanSchulz12]
- Timers that showed where the GPU costs were located
- Shadows, lighting, post processes
- Visualization tools
- Lighting, scene overdraw visualization
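Per-pass GPU timers make the largest time consumer easy to spot. The following is a minimal sketch that ranks the averaged Forward+ pass durations listed in the Forward+ table earlier in this document; the function name `rank_passes` is hypothetical, and each share is computed against the sum of the listed passes, which is an assumption about how the passes nest.

```python
# Averaged Forward+ pass durations in milliseconds, copied from the Forward+ table.
forward_plus_ms = {
    "Particle Update": 0.094784,
    "Z PrePass": 0.082347,
    "Generate SSAO": 0.156075,
    "Fill Light Grid": 0.060320,
    "Main Render": 0.030923,
    "Render Shadow Map": 0.095755,
    "Render Color": 1.318315,
    "Generate Camera Velocity": 0.059029,
    "Temporal Resolve": 0.135744,
    "Particle Render": 0.084576,
    "Motion Blur": 0.054624,
}


def rank_passes(durations_ms):
    """Sort passes by duration (descending) and attach each pass's share of the sum."""
    total = sum(durations_ms.values())
    ranked = sorted(durations_ms.items(), key=lambda item: item[1], reverse=True)
    return [(name, ms, ms / total) for name, ms in ranked]


for name, ms, share in rank_passes(forward_plus_ms):
    print(f"{name:<26} {ms:.6f} ms  {share:6.1%}")
```

With these numbers, Render Color ranks first by a wide margin, so it is the natural first target for optimization in the Forward+ pipeline.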
References
Deferred Shading with Multiple Render Targets. Nicolas Thibieroz, PowerVR Technologies / AMD. ShaderX2.
Deferred Shading in Tabula Rasa. Rusty Koonce, NCSoft Corporation / Facebook. GPU Gems 3.
CryENGINE 3: Reaching the Speed of Light. Anton Kaplanyan, Crytek / Intel Corporation. SIGGRAPH 2010: Advances in Real-Time Rendering in Games Course.
Deferred Rendering for Current and Future Rendering Pipelines. Andrew Lauritzen, Intel Corporation. SIGGRAPH 2010: Beyond Programmable Shader Course.
Rendering Tech of Space Marine. Pope Kim, Relic Entertainment / POCU. Daniel Barrero, Relic Entertainment. KGC 2011.
Tiled Shading. Ola Olsson, Chalmers University of Technology / Epic Games. Ulf Assarsson, Chalmers University of Technology. Journal of Graphics, GPU, and Game Tools.
CryENGINE 3: Three Years of Work in Review. Tiago Sousa, Crytek / id Software. Nickolay Kasyan, Crytek / AMD. Nicolas Schulz, Crytek. GPU Pro 3.
import matplotlib.pyplot as plt

# Light counts per test configuration, formatted as point / cone / shadowed cone.
x = ["16/8/8", "32/24/8", "64/56/8", "128/56/8"]

# Scene render duration (ms) of each pipeline for the light counts in x.
forward_plus_y = [1.293643, 1.402208, 1.691659, 1.991093]
forward_plus_culling_y = [1.253589, 1.318080, 1.516608, 1.718923]
forward_clustered_y = [2.174741, 2.318197, 2.950581, 3.268107]
deferred_tiled_y = [1.370539, 1.453547, 1.709749, 1.972149]
deferred_tiled_culling_y = [1.321376, 1.372097, 1.554155, 1.718635]
deferred_tiled_dice_y = [1.579851, 1.630304, 1.849120, 2.059509]
deferred_tiled_dice_culling_y = [1.367307, 1.450763, 1.607851, 1.739381]
deferred_clustered_y = [1.369045, 2.354485, 2.935424, 3.519659]

# One line per pipeline.
plt.plot(x, forward_plus_y, label="Forward+")
plt.plot(x, forward_plus_culling_y, label="Forward+ 2.5D, AABB Culling")
plt.plot(x, forward_clustered_y, label="Forward Clustered")
plt.plot(x, deferred_tiled_y, label="Deferred Tiled")
plt.plot(x, deferred_tiled_culling_y, label="Deferred Tiled 2.5D, AABB Culling")
plt.plot(x, deferred_tiled_dice_y, label="Deferred Tiled DICE")
plt.plot(x, deferred_tiled_dice_culling_y, label="Deferred Tiled DICE 2.5D, AABB Culling")
plt.plot(x, deferred_clustered_y, label="Deferred Clustered")

plt.xlabel("Number of Lights (Point/Cone/Cone Shadowed)")
plt.ylabel("Scene Render Duration (ms)")
plt.legend()
plt.show()