Game Engine RHI System Analysis Series 1: Instance (2024.11.25)
Overview
In order to use a graphics API, one has to initialize something called an instance. In Direct3D, the Microsoft DirectX Graphics Infrastructure (DXGI) is the instance, and in Vulkan, VkInstance is the instance.
The primary goal of DXGI is to manage low-level tasks that can be independent of the DirectX graphics runtime. DXGI provides a common framework for future graphics components. DXGI’s purpose is to communicate with the kernel mode driver and the system hardware, as shown in the following diagram.
There is no global state in Vulkan and all per-application state is stored in a VkInstance object. Creating a VkInstance object initializes the Vulkan library and allows the application to pass information about itself to the implementation.
Anki
Anki does not keep the IDXGIFactory in memory. It is created via the CreateDXGIFactory API call when needed.
When initializing its graphics manager, Anki creates an IDXGIFactory6 by creating an IDXGIFactory2 with the CreateDXGIFactory2(UINT, REFIID, void**) API call, then querying IDXGIFactory6 from the created IDXGIFactory2 instance.
If GPU validation is required, the DXGI_CREATE_FACTORY_DEBUG flag is set when creating the instance.
Anki then queries the physical devices (IDXGIAdapters).
When creating a swap chain, an IDXGIFactory2 is created by the CreateDXGIFactory2(UINT, REFIID, void**) API call. The created instance is used to create a swap chain with IDXGIFactory2::CreateSwapChainForHwnd(IUnknown*, HWND, const DXGI_SWAP_CHAIN_DESC1*, const DXGI_SWAP_CHAIN_FULLSCREEN_DESC*, IDXGIOutput*, IDXGISwapChain1**). Anki does not support fullscreen transitions, so the instance is also used to call IDXGIFactory::MakeWindowAssociation(HWND, UINT).
All of these initializations happen when the graphics manager is initialized.
Just like in D3D12, VkInstance is initialized in the graphics manager. Unlike IDXGIFactory, VkInstance is kept by the manager for future use.
Instance creation in Vulkan is more complex than in DirectX 12. In order to initialize an instance in Vulkan, one has to provide application information (VkApplicationInfo), an array of instance layers to enable, validation features (VkValidationFeaturesEXT), and an array of extensions to enable.
Anki uses the VK_LAYER_KHRONOS_validation instance layer if GPU validation is enabled. The user can provide instance layers via a command-line argument.
Anki supports VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT if enabled. If GPU validation is enabled, VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT is added to the enabled validation features.
If the user is using a headless surface, the VK_EXT_headless_surface instance extension is used. If the user is on Linux, the VK_KHR_wayland_surface extension is used, VK_KHR_win32_surface for Windows, and VK_KHR_android_surface for Android. To support swap chains, the VK_KHR_surface extension is used. If GPU validation is enabled, the VK_EXT_debug_utils extension is used.
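The layer, validation feature, and extension choices described above can be sketched as one small function. This is only an illustration and not Anki's code: InstanceOptions, InstanceRequest, and buildInstanceRequest are hypothetical names, the lists are kept as strings so the sketch compiles without the Vulkan headers, and it assumes that a headless surface replaces the platform surface extension.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical option set; the names are illustrative, not Anki's.
enum class Platform { Windows, Linux, Android };

struct InstanceOptions {
    Platform platform = Platform::Windows;
    bool headless = false;       // render without a window surface
    bool gpuValidation = false;  // validation layer plus GPU-assisted checks
    bool debugPrintf = false;    // shader debug printf
};

// What would be handed to instance creation, kept as plain strings here.
struct InstanceRequest {
    std::vector<std::string> layers;
    std::vector<std::string> validationFeatures;
    std::vector<std::string> extensions;
};

InstanceRequest buildInstanceRequest(const InstanceOptions& opt) {
    InstanceRequest req;
    if (opt.gpuValidation) {
        req.layers.push_back("VK_LAYER_KHRONOS_validation");
        req.validationFeatures.push_back("VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT");
        req.extensions.push_back("VK_EXT_debug_utils");
    }
    if (opt.debugPrintf) {
        req.validationFeatures.push_back("VK_VALIDATION_FEATURE_ENABLE_DEBUG_PRINTF_EXT");
    }
    if (opt.headless) {
        // Assumption: headless mode replaces the platform surface extension.
        req.extensions.push_back("VK_EXT_headless_surface");
    } else {
        switch (opt.platform) {
            case Platform::Windows: req.extensions.push_back("VK_KHR_win32_surface"); break;
            case Platform::Linux:   req.extensions.push_back("VK_KHR_wayland_surface"); break;
            case Platform::Android: req.extensions.push_back("VK_KHR_android_surface"); break;
        }
    }
    // Needed for swap chains regardless of the surface type.
    req.extensions.push_back("VK_KHR_surface");
    assert(!req.extensions.empty());
    return req;
}
```

In a real renderer the three lists would end up in VkInstanceCreateInfo and the chained VkValidationFeaturesEXT; the sketch only shows how the flags map to names.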
After creating the instance, Anki uses Volk to load all required Vulkan entrypoints using volkLoadInstance(VkInstance), sets debug callbacks with vkCreateDebugUtilsMessengerEXT(VkInstance, const VkDebugUtilsMessengerCreateInfoEXT*, const VkAllocationCallbacks*, VkDebugUtilsMessengerEXT*), and retrieves the physical device with vkEnumeratePhysicalDevices(VkInstance, uint32_t*, VkPhysicalDevice*).
When instance initialization is over, the instance is used to create the surface. Anki supports surface creation via SDL, for Android, and for the headless case.
The instance is later used when DLSS needs to be initialized. This happens when the renderer initializes its renderer objects. One of the renderer objects, TemporalUpscaler, uses a GrUpscaler that can use DLSS.
Both the graphics manager GrManager and the renderer Renderer are initialized when the application App is initialized.
BGFX
- D3D12
- Open WinPixEventRuntime.dll
- Get symbols to PIXEventsThreadInfo and set PIXEventsReplaceBlock to zero
- Load RenderDoc
  - Skip RenderDoc if IntelGPA is running (check shimloader32.dll or shimloader64.dll)
  - Check if module renderdoc.dll is already injected, open the dll if not injected
  - Get symbols to RENDERDOC_GetAPI and initialize RenderDoc
- Initialize uniforms and resolution to zero
- Open kernel32.dll
  - Get symbol to CreateEventExA
- Open nvapi.dll/nvapi64.dll
  - Get symbol to nvapi_QueryInterface
  - Initialize NVAPI
  - Enumerate physical GPUs, and set the first GPU as the main NVIDIA GPU
- Open d3d12.dll (libd3d12.so in Linux)
  - Find crucial symbols in D3D12 (D3D12CreateDevice, D3D12GetDebugInterface, D3D12SerializeRootSignature, etc.)
- Open dxgi.dll (dxgi.so in Linux)
  - Open dxgidebug.dll
    - Get symbols to DXGIGetDebugInterface, DXGIGetDebugInterface1
  - Get symbols to CreateDXGIFactory1/CreateDXGIFactory
  - Create a factory
    - Enumerate adapters (GPUs)
      - Enumerate outputs (monitors)
      - Check features (HDR10, transparent back buffer, etc.)
      - Set the first adapter as the main adapter
      - Set the first output as the main output
- Initialize D3D12 Debug Layer
- Create the D3D12 Device with the highest feature level
- Query the most recent DXGI Device interface
- Query the most recent D3D12 Device interface
- Shutdown NVAPI if current chosen vendor is not NVIDIA
- For each node in the D3D12 Device, check its architecture, and keep the first node’s architecture
- Check D3D12 feature options
- Get heap properties of Custom, Default, Upload, Readback from the D3D12 Device
- Create a Direct command queue and its fence
  - Create command allocators for direct command lists
  - Create a direct command list and close it
- Initialize swap chain
- Check MSAA support
- Check tearing support
- Create swap chain
- Query the most recent swap chain interface
- Check the color space support
- Set the color space according to the swap chain’s format
- Check the display specs of the output containing the swap chain
- Set the HDR10 metadata
- Initialize D3D12 Info Queues
- Create RTV, DSV heap
- Create scratch buffers (size = max draw calls * 1024, descriptors = max textures + max shaders + max draw calls)
- Create CBV/SRV/UAV heap
- Create committed resource with upload heap properties (custom heap type)
- Map to CPU data
- Create sampler allocator
- Create sampler heap
- Create descriptor ranges with each range having a single descriptor type
- sampler (N)
- SRV (N)
- CBV (1)
- UAV (N)
- Create root parameters
- Descriptor table: sampler range
- Descriptor table: SRV range
- Root CBV: register space=0, shader register=0
- Descriptor table: UAV range
- Create a root signature
- Check direct access support (UMA)
- Check resource supports
- Check limits
- For every format,
- Check if format is supported for various types (texture2d, texture3d, etc.)
- If the format could be read as texture image, check for UAV RW support
- Check if the format’s corresponding SRGB format is supported for various SRGB types (texture2d, texture3d, etc.)
- Create RTVs for each swap chain buffer
- Allocate a command list from command allocator by resetting the command list
- Create a depth-stencil buffer from a heap of default heap properties
- Create a DSV
- Set a resource barrier of depth-stencil buffer from common state to depth write state
- Set indirect arguments
- VBV 0
- VBV 1
- VBV 2
- VBV 3
- VBV 4
- CBV 2
- DRAW 0
- Create a draw command signature
- Set indirect arguments
- VBV 0
- VBV 1
- VBV 2
- VBV 3
- VBV 4
- IBV 0
- CBV 2
- DRAW INDEXED 0
- Create a draw indexed command signature
- Create commands for each draw type per batch
- Create indirects (size=max draw per batch * command size)
- Initialize GPU timer
- Create query heap
- Create read back resource
- Get timestamp frequency
- Map read back buffer to query result
- Reset results and control
- Initialize occlusion query
- Create query heap
- Create read back resource
- Map read back buffer to result
- Create command signatures for Dispatch, Draw, and Draw Indexed
- If NVAPI is initialized,
  - Kick the command queue
  - Close the command list
  - Execute command lists
  - Create fence event
  - Let command queue signal fence value
  - Set fence event on completion
  - Commit a control (+1 write +1 current)
  - Finish the command queue
    - If there exists an available control,
      - Consume the command queue
      - Wait for the fence event
      - Close the fence event
      - Set the completed fence value
      - Check if GPU has passed the fence via command queue wait
      - Release read resources
      - Consume a control (+1 read)
  - Initialize Aftermath
    - Open GFSDK_Aftermath_Lib.x86.dll/GFSDK_Aftermath_Lib.x64.dll
    - Get necessary symbols
    - Initialize Aftermath
- Vulkan
- Just like D3D12, load RenderDoc
- Open vulkan-1.dll / libvulkan.so (Android) / libMoltenVK.dylib (OSX) / libvulkan.so.1
- Import crucial Vulkan functions
- Set layers/extensions needed
  - VK_LAYER_LUNARG_standard_validation (disabled if VK_LAYER_KHRONOS_validation is supported)
  - VK_LAYER_KHRONOS_validation
  - VK_EXT_debug_report
  - VK_EXT_shader_viewport_index_layer
  - VK_EXT_conservative_rasterization
  - VK_KHR_draw_indirect_count
  - VK_EXT_custom_border_color
  - VK_EXT_debug_utils
  - VK_EXT_line_rasterization
  - VK_EXT_memory_budget
  - VK_KHR_get_physical_device_properties2
  - VK_KHR_win32_surface / VK_KHR_android_surface / VK_KHR_wayland_surface / VK_KHR_xlib_surface / VK_KHR_xcb_surface / VK_MVK_MACOS_SURFACE_EXTENSION_NAME / VK_NN_VI_SURFACE_EXTENSION_NAME
- Create Vulkan instance
- Import Vulkan instance API calls
- Initialize Debug Layer if extension VK_EXT_debug_report is supported
- Enumerate physical devices
- Check available extensions
- Set the first one to be the main physical device
- Get and check physical device features/limits (custom border color, line rasterization, etc.)
- Update MSAA support
- For every texture format,
- Check their MSAA support per image type/usage bit, etc.
- Get physical device’s memory properties
- Get queue family properties, and get the global queue family that supports graphics and compute
- Create a Vulkan device with a global queue from the global queue family
- Import Vulkan logical device API calls
- Get the global queue
- For number of required frame buffers,
- Create a command pool
- Allocate a command buffer from the pool
- Create a fence
- Allocate a command buffer
- Wait for fences
- Reset command pool
- Begin command buffer
- Create back buffers
  - Create swap chain
  - Create a surface and check if the physical device supports the surface
  - Create swap chain with appropriate format, present mode, etc.
  - Create attachments
  - For each swap chain image, create an image view
  - Create present/render semaphores per back buffer
  - Create depth-stencil attachment
  - Create image
  - Allocate device memory according to the image memory requirements
  - Bind image to the device memory
  - Set image memory barrier from undefined to depth-stencil attachment optimal
  - Create depth-stencil image view
  - Create frame buffers
  - Create a render pass that uses back buffers
  - Create Vulkan frame buffers using created render passes
- Create descriptor pools
- Sampled image: max descriptor sets * max texture samplers
- Sampler: max descriptor sets * max texture samplers
- Uniform buffer dynamic: max descriptor sets * 2
- Storage buffer: max descriptor sets * max texture samplers
- Storage image: max descriptor sets * max texture samplers
- Create pipeline cache
- For each frame buffer, create scratch buffers
- Create uniform buffer
- Allocate device memory
- Bind buffer to device memory
- Map device memory to CPU data
- For each frame buffer, create scratch staging buffers
- Create staging buffer (transfer dst/src bit set)
- Initialize GPU timer
- Create query pool
- Record query pool reset command to command buffer
- Create read back host buffer
- Map read back memory to query result
- Reset results and control
- Initialize occlusion query
- Create query pool
- Record query pool reset command to command buffer
- Create read back host buffer
- Map read back memory to query result
- Reset control
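The step in the Vulkan list above that picks a global queue family supporting both graphics and compute can be sketched as follows. This is a minimal illustration, not BGFX's implementation: QueueFamily and findGlobalQueueFamily are hypothetical names, and the flag constants are local stand-ins that mirror the values of VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT, and VK_QUEUE_TRANSFER_BIT so that the sketch compiles without the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins for VkQueueFlagBits (same numeric values as the Vulkan enum).
constexpr std::uint32_t kQueueGraphicsBit = 0x1;
constexpr std::uint32_t kQueueComputeBit  = 0x2;
constexpr std::uint32_t kQueueTransferBit = 0x4;

// Hypothetical stand-in for the fields of VkQueueFamilyProperties used here.
struct QueueFamily {
    std::uint32_t flags;
    std::uint32_t queueCount;
};

// Returns the index of the first family that supports graphics AND compute,
// or -1 when no such family exists.
int findGlobalQueueFamily(const std::vector<QueueFamily>& families) {
    const std::uint32_t required = kQueueGraphicsBit | kQueueComputeBit;
    for (std::size_t i = 0; i < families.size(); ++i) {
        const QueueFamily& family = families[i];
        if (family.queueCount > 0 && (family.flags & required) == required) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The returned index is what would go into the queue create info when the logical device is created with a single global queue.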
BGFX has a Dxgi struct where it manages the DXGI instances such as the IDXGIFactory. When running the application, the engine initializes the instance when available during the main loop (Context::renderFrame). Context has a renderer context, which has the instance.
Enabled layers:
VK_LAYER_KHRONOS_validation
Enabled extensions:
VK_KHR_surface
VK_EXT_debug_report
VK_EXT_debug_utils
VK_KHR_get_physical_device_properties2
VK_KHR_win32_surface
@startuml Anki DXGI
class MakeSingletonPtr
class GrManager {
+init(GrManagerInitInfo&): Error
}
MakeSingletonPtr <|-- GrManager
class GrManagerImpl {
-m_crntSwapchain: MicroSwapchainPtr
+initInternal(const GrManagerInitInfo&): Error
}
GrManager <|-- GrManagerImpl
class MicroSwapchain {
+MicroSwapchain()
-initInternal(): Error
}
GrManagerImpl *-- MicroSwapchain
class App {
+init(): Error
-initInternal(): Error
}
@enduml
@startuml Anki VkInstance
class MakeSingletonPtr
class GrManager {
+init(GrManagerInitInfo&): Error
+newGrUpscaler(const GrUpscalerInitInfo&): GrUpscalerPtr
}
MakeSingletonPtr <|-- GrManager
class GrManagerImpl {
-m_instance: VkInstance
+getInstance(): VkInstance
+initInternal(const GrManagerInitInfo&): Error
+initInstance(): Error
+initSurface(): Error
}
GrManager <|-- GrManagerImpl
class GrObject
class GrUpscaler
GrObject <|-- GrUpscaler
class GrUpscalerImpl {
+initInternal(const GrUpscalerInitInfo&): Error
-{static} newInstance(const GrUpscalerInitInfo&): GrUpscaler*
-initDlss(const GrUpscalerInitInfo&): Error
}
GrUpscaler <|-- GrUpscalerImpl
class RendererObject
class TemporalUpscaler {
-m_grUpscaler: GrUpscalerPtr
+init(): Error
}
RendererObject <|-- TemporalUpscaler
TemporalUpscaler *-- GrUpscaler
class Renderer {
-m_temporalUpscaler: TemporalUpscaler
+init(const RendererInitInfo&): Error
-initInternal(const RendererInitInfo&): Error
}
class MakeSingleton
MakeSingleton <|-- Renderer
Renderer *-- TemporalUpscaler
class App {
+init(): Error
-initInternal(): Error
}
@enduml
@startuml BGFX RHI Instance
struct RendererContextI
struct Dxgi {
+m_factory: FactoryI*
+init(_caps: Caps&): bool
}
struct RendererContextD3D12 {
+m_dxgi: Dxgi
+init(_init: const Init&)
}
RendererContextI <|-- RendererContextD3D12
RendererContextD3D12 *-- Dxgi
struct RendererContextVK {
+m_instance: VkInstance
+init(_init: const Init&)
}
RendererContextI <|-- RendererContextVK
struct Context {
+m_renderCtx: RendererContextI*
+renderFrame(int32_t): RenderFrame::Enum
+rendererExecCommands(CommandBuffer&)
}
Context *-- RendererContextI
@enduml
Diligent Engine
Diligent Engine does not keep track of the DXGI adapter or factory. The adapter can be retrieved from the D3D12 device by its LUID, and the factory can be created at any time.
- Load d3d12.dll
- Find Adapters
- Create a DXGI Factory (CreateDXGIFactory1)
- Enumerate adapters and check if adapter can create a D3D12 device using minimum feature level (IDXGIFactory::EnumAdapters)
- For each enumerated adapter,
- Create the D3D12 Device that supports the highest feature level
- Check supported features
- If tiled resource tier is greater than or equal to 1, load NVAPI (if NVAPI is enabled)
- Check outputs
- For each enumerated adapter,
- Get the best adapter (discrete > integrated, more memory)
- Get display modes (IDXGIOutput::GetDisplayModeList)
- Create debug layer
- Create DXGI factory and get the predetermined adapter, and create a D3D12 device that supports the highest feature level
- Create the info queue
- Create a direct command queue as the default immediate context, and its fence
- Create a Diligent Engine D3D12 render device
- Create query managers
- Create shader compilation thread pool
- For each immediate context,
- Create a Diligent Engine D3D12 immediate context
- Create a swap chain
- Create a FrameLatencyWaitableObject from the swap chain (IDXGISwapChain2::GetFrameLatencyWaitableObject)
- Create a texture for each back buffer, and create their RTVs
- Create a depth buffer texture, and its DSV
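The "best adapter" rule from the list above (discrete over integrated, then more memory) can be sketched as a simple scan. This is a hedged illustration rather than Diligent Engine's code: AdapterType, AdapterInfo, and pickBestAdapter are hypothetical names, and ranking software and unknown adapters below integrated ones is an assumption that the source does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ordered so that a larger value means a more preferred adapter type.
// Placing Unknown and Software below Integrated is an assumption.
enum class AdapterType { Unknown = 0, Software = 1, Integrated = 2, Discrete = 3 };

// Hypothetical summary of what was gathered for each enumerated adapter.
struct AdapterInfo {
    AdapterType type;
    std::uint64_t deviceLocalMemory;  // bytes
};

// Returns the index of the preferred adapter, or -1 for an empty list.
// On a full tie the earlier adapter wins.
int pickBestAdapter(const std::vector<AdapterInfo>& adapters) {
    int best = -1;
    for (std::size_t i = 0; i < adapters.size(); ++i) {
        if (best < 0) {
            best = static_cast<int>(i);
            continue;
        }
        const AdapterInfo& candidate = adapters[i];
        const AdapterInfo& current = adapters[static_cast<std::size_t>(best)];
        const bool betterType =
            static_cast<int>(candidate.type) > static_cast<int>(current.type);
        const bool sameTypeMoreMemory =
            candidate.type == current.type &&
            candidate.deviceLocalMemory > current.deviceLocalMemory;
        if (betterType || sameTypeMoreMemory) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Type is compared before memory, so an integrated adapter with more memory never beats a discrete one.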
Vulkan:
- Find Adapters
  - Create Vulkan instance
    - Initialize Volk
      - Load vulkan-1.dll
      - Use volk to load Vulkan functions
    - Add instance extensions
      - VK_KHR_surface
      - VK_KHR_win32_surface / VK_KHR_android_surface / VK_KHR_wayland_surface / VK_KHR_xlib_surface / VK_KHR_xcb_surface / VK_EXT_metal_surface
      - VK_KHR_get_physical_device_properties2
    - Create Vulkan instance
    - Load instance-related functions using Volk
    - Set up debug layer
    - Enumerate physical devices
    - For each device,
    - Get properties, features, memory properties, queue family properties
    - Check supported extensions and add features to query accordingly
      - VK_KHR_shader_float16_int8, VK_KHR_storage_buffer_storage_class
      - VK_KHR_16bit_storage
      - VK_KHR_8bit_storage
      - VK_EXT_mesh_shader
      - VK_KHR_acceleration_structure
      - VK_KHR_ray_tracing_pipeline
      - VK_KHR_ray_query
      - VK_KHR_buffer_device_address
      - VK_EXT_descriptor_indexing
      - VK_KHR_spirv_1_4
      - VK_KHR_portability_subset
      - VK_EXT_vertex_attribute_divisor
      - VK_KHR_timeline_semaphore
      - VK_KHR_multiview
      - VK_KHR_create_renderpass2
      - VK_KHR_fragment_shading_rate
      - VK_EXT_fragment_density_map
      - VK_EXT_host_query_reset
      - VK_KHR_draw_indirect_count
      - VK_KHR_maintenance3
      - VK_EXT_multi_draw
    - Check physical device info
    - For each enumerated adapter,
    - Get the best adapter (discrete > integrated, more memory)
- Create device and contexts
  - Use device extensions
    - VK_KHR_swapchain
    - VK_KHR_maintenance1
    - VK_EXT_mesh_shader
    - VK_KHR_shader_float16_int8, VK_KHR_storage_buffer_storage_class
    - VK_KHR_16bit_storage
    - VK_KHR_8bit_storage
    - VK_KHR_acceleration_structure
    - VK_KHR_ray_tracing_pipeline
    - VK_KHR_ray_query
    - VK_KHR_buffer_device_address
    - VK_EXT_descriptor_indexing
    - VK_KHR_spirv_1_4
    - VK_KHR_portability_subset
    - VK_EXT_vertex_attribute_divisor
    - VK_KHR_timeline_semaphore
    - VK_KHR_multiview
    - VK_KHR_create_renderpass2
    - VK_KHR_fragment_shading_rate
    - VK_EXT_fragment_density_map
    - VK_EXT_host_query_reset
    - VK_KHR_draw_indirect_count
    - VK_KHR_maintenance3
    - VK_KHR_maintenance2
    - VK_EXT_multi_draw
  - Enable device features
  - Enumerate a queue that supports both graphics and compute queue flags
  - Create a logical device
  - Use volk to load related functions
  - Enable features
  - Create a graphics context queue (from the Vulkan queue we enumerated)
  - Create a Vulkan render device
  - Create transient command pool managers and query manager per command queue
  - Create shader compilation thread pool
  - Create a Vulkan device context
  - Allocate a command buffer and begin
  - Reset stale queries
  - Create a dummy vertex buffer
  - Create an AS compacted size query pool
  - Set created device context as the immediate context of the render device
- Create swap chain
- Create a surface
- Check if the current queue can present to the given surface
- Check which supported formats support the current color format
- Check surface capabilities and present modes
- Set present modes based on VSync support
  - If VSync enabled
    - FIFO relaxed
    - FIFO
  - Else (VSync disabled)
    - Mailbox
    - Immediate
    - FIFO
- Set the number of back buffers based on the hardware limits and the desired back buffer count
- Create Vulkan swap chain
- For each back buffer,
  - Create some semaphores for
    - Image acquisition
    - Draw completion
  - Create an image acquisition fence
- For each back buffer
- Create a texture
- Create RTV
- Create a depth buffer texture
- Create default DSV
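The present mode preference from the swap chain steps above can be sketched as a lookup over the modes the surface reports. This is a minimal sketch under stated assumptions: PresentMode is a local stand-in for VkPresentModeKHR and choosePresentMode is a hypothetical helper, not Diligent Engine's function. The final fallback relies on FIFO being the one present mode that the Vulkan specification requires every implementation to support.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Local stand-in for VkPresentModeKHR so the sketch compiles without Vulkan headers.
enum class PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed };

// Walks the preference list for the requested VSync setting and returns the
// first mode that the surface reports as supported.
PresentMode choosePresentMode(bool vsyncEnabled, const std::vector<PresentMode>& supported) {
    const std::vector<PresentMode> preferred = vsyncEnabled
        ? std::vector<PresentMode>{PresentMode::FifoRelaxed, PresentMode::Fifo}
        : std::vector<PresentMode>{PresentMode::Mailbox, PresentMode::Immediate, PresentMode::Fifo};
    for (PresentMode mode : preferred) {
        if (std::find(supported.begin(), supported.end(), mode) != supported.end()) {
            return mode;
        }
    }
    // FIFO support is mandatory in Vulkan, so it is a safe last resort.
    return PresentMode::Fifo;
}
```

With VSync disabled, Mailbox is tried before Immediate, and FIFO is reached only when neither is available.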
Filament
- Create instance
  - Add VK_LAYER_KHRONOS_validation if validation is enabled
  - Create Vulkan instance (vkCreateInstance)
  - Load functions (using BlueVK)
- Select physical device
  - Enumerate devices (vkEnumeratePhysicalDevices)
  - For each device,
    - Get their properties and check versions, graphics bit support (vkGetPhysicalDeviceProperties, vkGetPhysicalDeviceQueueFamilyProperties)
    - Enumerate extension properties (vkEnumerateDeviceExtensionProperties)
    - Check for swap chain extension support
  - Sort devices by device type (discrete/integrated) and pick the best one (discrete > integrated > cpu > virtual gpu > other)
- Print device info (vkGetPhysicalDeviceProperties2, vkGetPhysicalDeviceProperties)
- Check physical device properties, features, and memory properties (vkGetPhysicalDeviceFeatures2, vkGetPhysicalDeviceMemoryProperties)
- Select appropriate queues
- Create logical device (vkCreateDevice)
- Get queue (vkGetDeviceQueue)
- Create Vulkan driver
  - Create commands
    - Create command pool (vkCreateCommandPool)
      - Create command buffers, and for each command buffer,
        - Allocate command buffer from the pool (vkAllocateCommandBuffers)
        - Create submission semaphore (vkCreateSemaphore)
        - Create fence (vkCreateFence)
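The physical device selection described in the list above (filter by graphics support and swap chain extension, then rank by device type) can be sketched as follows. This is an illustration only, not Filament's code: DeviceType, DeviceInfo, typeRank, isUsable, and selectPhysicalDevice are hypothetical names, the API version check mentioned in the list is left out, and keeping the earlier device on a tie is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Local stand-in for VkPhysicalDeviceType.
enum class DeviceType { Other, IntegratedGpu, DiscreteGpu, VirtualGpu, Cpu };

// Hypothetical summary of what was queried for each enumerated device.
struct DeviceInfo {
    DeviceType type;
    bool hasGraphicsQueue;                // some queue family has the graphics bit
    std::vector<std::string> extensions;  // names of supported device extensions
};

// discrete > integrated > cpu > virtual gpu > other
int typeRank(DeviceType type) {
    switch (type) {
        case DeviceType::DiscreteGpu:   return 4;
        case DeviceType::IntegratedGpu: return 3;
        case DeviceType::Cpu:           return 2;
        case DeviceType::VirtualGpu:    return 1;
        case DeviceType::Other:         return 0;
    }
    return 0;
}

// A device qualifies only if it can render and can present to a swap chain.
bool isUsable(const DeviceInfo& device) {
    const bool hasSwapchain =
        std::find(device.extensions.begin(), device.extensions.end(),
                  "VK_KHR_swapchain") != device.extensions.end();
    return device.hasGraphicsQueue && hasSwapchain;
}

// Returns the index of the best usable device, or -1 if none qualifies.
int selectPhysicalDevice(const std::vector<DeviceInfo>& devices) {
    int best = -1;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (!isUsable(devices[i])) {
            continue;
        }
        if (best < 0 ||
            typeRank(devices[i].type) > typeRank(devices[static_cast<std::size_t>(best)].type)) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Filtering before ranking means a discrete GPU without swap chain support is never chosen over a usable integrated one.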