Pipeline State Cache Studies (2024.11.22)
Introduction
The most efficient way to render on a modern GPU is to take advantage of its hardware architecture. Modern GPUs use the concept of pipelines to maximize performance. A pipeline bundles the information the GPU needs to render: shader cache blobs, root signatures/layouts, and additional per-stage data. The object that actually holds this information is called a PSO (Pipeline State Object) in DirectX jargon. To create one, you ask the graphics API, which in turn asks the graphics driver to compile the given shader cache blob. I know that sounds rather strange. Shader cache blobs are supposed to be compiled data, no? Well, on general-purpose hosts such as PC or Android, the shader compiler has no idea which GPU the target system uses, so how could it know which instruction set to compile to? For this reason, most shaders are compiled into an intermediate language (IL) such as DXIL (DirectX Intermediate Language) or SPIR-V (Standard Portable Intermediate Representation). The moment these intermediate languages are compiled into the actual executable instruction set is when PSOs are created.
As everyone knows, compilation can take a sizable chunk of the runtime budget. Modern AAA games use hundreds of shaders, which are combined into tens of thousands of PSOs. Because these games have such rich environments, PSOs have to be loaded at runtime, and compilation becomes a major hotspot. Shipping the application with IL shaders makes sense, but making users recompile those IL shaders into the actual instruction set on every run is inefficient. Caching the compiled PSOs in secondary storage for later use definitely helps overall performance. Users rarely alter their environment, and even if they do, the application only has to rebuild the cache on its first launch afterwards.
Below are my notes on how to implement such a system, based on Microsoft’s sample and on other engines such as Unreal.
Previous Works
DirectX
Microsoft Sample
There are two methods to implement pipeline caching. You can either cache the pipelines into a single pipeline library, or cache them per pipeline. The former requires you to initialize a pipeline library instance, and the latter requires you to call ID3D12PipelineState::GetCachedBlob(ID3DBlob**).
Pipeline Library Method
- MemoryMappedPipelineLibrary::Init(ID3D12Device*, std::wstring)
  - MemoryMappedFile::Init(wstring, UINT)
    - Create an empty pipelineLibrary.cache file
    - Create a file mapping to pipelineLibrary.cache
  - Create a pipeline library based on its contents (ID3D12Device1::CreatePipelineLibrary(const void*, SIZE_T, REFIID, void**))
- For each pipeline, CompilePSO(CompilePSOThreadData*)
  - Try to load the pipeline cache from the pipeline library (should fail on initial launch) (ID3D12PipelineLibrary::LoadGraphicsPipeline(LPCWSTR, const D3D12_GRAPHICS_PIPELINE_STATE_DESC*, REFIID, void**))
  - If the pipeline cache does not exist,
    - Create the PSO
    - Store the PSO (ID3D12PipelineLibrary::StorePipeline(LPCWSTR, ID3D12PipelineState*))
- On application termination
  - Write the pipeline cache to file (MemoryMappedPipelineLibrary::Destroy(bool))
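The load, create-on-miss, store, and write-back steps of the pipeline library method can be sketched in portable C++. This is only a toy model: ToyPipelineLibrary, CompileToyPso, LoadOrCreate, and the on-disk record layout are all hypothetical names and formats of my own, standing in for ID3D12PipelineLibrary, PSO creation, and the sample's memory-mapped file. A "PSO" here is just a string.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled PSO (a real one is an ID3D12PipelineState).
using ToyPso = std::string;

// Counts how often the expensive "driver compile" path runs.
inline int g_compileCount = 0;

// Hypothetical stand-in for creating a PSO from a full description.
inline ToyPso CompileToyPso(const std::string& name) {
    ++g_compileCount;
    return "compiled:" + name;
}

// Hypothetical stand-in for ID3D12PipelineLibrary, backed by one cache file.
class ToyPipelineLibrary {
public:
    // Init step: read whatever the cache file holds (nothing on first launch).
    explicit ToyPipelineLibrary(std::string path) : path_(std::move(path)) {
        std::ifstream in(path_, std::ios::binary);
        std::uint32_t nameLen = 0, psoLen = 0;
        while (in.read(reinterpret_cast<char*>(&nameLen), sizeof nameLen) &&
               in.read(reinterpret_cast<char*>(&psoLen), sizeof psoLen)) {
            std::string name(nameLen, '\0'), pso(psoLen, '\0');
            if (!in.read(name.data(), nameLen) || !in.read(pso.data(), psoLen)) break;
            entries_[name] = pso;
        }
    }

    // Load step: returns an empty optional when the name is unknown.
    std::optional<ToyPso> Load(const std::string& name) const {
        auto it = entries_.find(name);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

    // Store step: remember a freshly created PSO under its name.
    void Store(const std::string& name, const ToyPso& pso) {
        assert(!name.empty());
        entries_[name] = pso;
    }

    // Termination step: write the whole library back to the cache file.
    void Save() const {
        std::ofstream out(path_, std::ios::binary | std::ios::trunc);
        for (const auto& [name, pso] : entries_) {
            const auto nameLen = static_cast<std::uint32_t>(name.size());
            const auto psoLen = static_cast<std::uint32_t>(pso.size());
            out.write(reinterpret_cast<const char*>(&nameLen), sizeof nameLen);
            out.write(reinterpret_cast<const char*>(&psoLen), sizeof psoLen);
            out.write(name.data(), nameLen);
            out.write(pso.data(), psoLen);
        }
    }

    // Throw the cache away (assumption: done when the cache is considered stale).
    void Discard() {
        entries_.clear();
        std::remove(path_.c_str());
    }

private:
    std::string path_;
    std::map<std::string, ToyPso> entries_;
};

// Per-pipeline step: try the library first, compile and store only on a miss.
inline ToyPso LoadOrCreate(ToyPipelineLibrary& lib, const std::string& name) {
    if (auto cached = lib.Load(name)) return *cached;
    ToyPso pso = CompileToyPso(name);
    lib.Store(name, pso);
    return pso;
}
```

On the first run every LoadOrCreate call takes the compile path. After Save(), a new ToyPipelineLibrary opened on the same file should serve the same names without compiling again.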
Cached Blob Method
- For each pipeline,
  - MemoryMappedFile::Init(wstring, UINT)
  - CompilePSO(CompilePSOThreadData*)
    - Try to get the cached blob from disk (should fail on initial launch) (D3D12_CACHED_PIPELINE_STATE)
    - If the pipeline’s cached blob does not exist,
      - Create the PSO
      - Get the cached blob (ID3D12PipelineState::GetCachedBlob(ID3DBlob**))
      - Write the cached blob into the memory-mapped file
- On application termination
  - Unmap the views to the cached blobs (UnmapViewOfFile(LPCVOID))
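The cached blob method keeps one blob per pipeline and hands it back through the PSO description. Here is a minimal portable sketch of that idea. ToyCachedState only mirrors the two fields of D3D12_CACHED_PIPELINE_STATE (pCachedBlob and CachedBlobSizeInBytes); every other name (ToyPsoDesc, CreateToyPso, ReadBlob, WriteBlob, InvalidateBlob, LoadOrCreatePso) is hypothetical, and plain file streams replace the sample's memory-mapped views.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Toy analogue of D3D12_CACHED_PIPELINE_STATE.
struct ToyCachedState {
    const void* pCachedBlob;
    std::size_t cachedBlobSizeInBytes;
};

// Hypothetical PSO description: the IL shader plus an optional cached blob.
struct ToyPsoDesc {
    std::string shaderIl;
    ToyCachedState cachedPso;
};

struct ToyPso {
    Blob blob;       // what a GetCachedBlob-style call would hand back
    bool fromCache;  // true when the compile step was skipped
};

// Hypothetical PSO creation: reuse the cached blob if one is supplied, else "compile".
inline ToyPso CreateToyPso(const ToyPsoDesc& desc) {
    assert(!desc.shaderIl.empty());
    ToyPso pso{Blob{}, false};
    if (desc.cachedPso.pCachedBlob != nullptr && desc.cachedPso.cachedBlobSizeInBytes > 0) {
        const auto* bytes = static_cast<const std::uint8_t*>(desc.cachedPso.pCachedBlob);
        pso.blob.assign(bytes, bytes + desc.cachedPso.cachedBlobSizeInBytes);
        pso.fromCache = true;
    } else {
        pso.blob.assign(desc.shaderIl.begin(), desc.shaderIl.end());  // pretend compilation
    }
    return pso;
}

// Returns an empty blob when the file does not exist yet (first launch).
inline Blob ReadBlob(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::istreambuf_iterator<char> first(in), last;
    return Blob(first, last);
}

inline void WriteBlob(const std::string& path, const Blob& blob) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(blob.data()),
              static_cast<std::streamsize>(blob.size()));
}

// Delete a blob file (assumption: used when a blob is considered stale).
inline void InvalidateBlob(const std::string& path) { std::remove(path.c_str()); }

// Per-pipeline flow: read blob, create PSO, write the blob back only after a compile.
inline ToyPso LoadOrCreatePso(const std::string& cachePath, const std::string& shaderIl) {
    const Blob cached = ReadBlob(cachePath);
    ToyPsoDesc desc{shaderIl, ToyCachedState{cached.data(), cached.size()}};
    ToyPso pso = CreateToyPso(desc);
    if (!pso.fromCache) WriteBlob(cachePath, pso.blob);
    return pso;
}
```

Compared with the library sketch, each pipeline owns its own file, so one stale blob can be dropped with InvalidateBlob without touching the others.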
BGFX
BGFX only uses the cached blob method.
- RendererContextD3D12::getPipelineState(ProgramHandle): ID3D12PipelineState*
  - Try to get the PSO from main memory (RendererContextD3D12::m_pipelineStateCache: StateCacheT<ID3D12PipelineState>)
  - Fall back to disk for cached blobs if that fails (CallbackC99::cacheRead(uint64_t, void*, uint32_t): bool)
    - Create the PSO based on the retrieved cached blob
  - If neither memory has a cached blob, create the PSO manually
  - If a PSO has been created, write it back to disk and main memory (ID3D12PipelineState::GetCachedBlob(ID3DBlob**))
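The lookup order above (main memory, then disk, then a fresh compile with write-back) can be sketched as a two-level cache. This is my own toy model and not BGFX code: ToyDiskCache imitates an application-side read/write callback with an in-memory map instead of real files, and ToyRenderer, ToyPso, and the hit counters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Hypothetical stand-in for the application's disk cache callbacks.
struct ToyDiskCache {
    std::unordered_map<std::uint64_t, Blob> files;  // pretend each entry is a file

    bool cacheRead(std::uint64_t id, Blob& out) const {
        auto it = files.find(id);
        if (it == files.end()) return false;
        out = it->second;
        return true;
    }

    void cacheWrite(std::uint64_t id, const Blob& blob) { files[id] = blob; }
};

// Hypothetical stand-in for a PSO: just the blob it was built from.
struct ToyPso {
    Blob blob;
};

class ToyRenderer {
public:
    explicit ToyRenderer(ToyDiskCache& disk) : disk_(disk) {}

    std::shared_ptr<ToyPso> getPipelineState(std::uint64_t hash) {
        // 1. Main memory: PSOs already created during this run.
        if (auto it = memory_.find(hash); it != memory_.end()) {
            ++memoryHits;
            return it->second;
        }
        auto pso = std::make_shared<ToyPso>();
        if (disk_.cacheRead(hash, pso->blob)) {
            // 2. Disk: rebuild the PSO from the cached blob.
            ++diskHits;
        } else {
            // 3. No cache anywhere: "compile", then write the blob back to disk.
            ++compiles;
            for (int i = 0; i < 8; ++i) {
                pso->blob.push_back(static_cast<std::uint8_t>(hash >> (8 * i)));
            }
            disk_.cacheWrite(hash, pso->blob);
        }
        assert(!pso->blob.empty());
        memory_[hash] = pso;  // later lookups in this run stop at step 1
        return pso;
    }

    int memoryHits = 0;
    int diskHits = 0;
    int compiles = 0;

private:
    ToyDiskCache& disk_;
    std::unordered_map<std::uint64_t, std::shared_ptr<ToyPso>> memory_;
};
```

A second ToyRenderer created over the same ToyDiskCache plays the role of the next application launch: its memory cache starts empty, so the first request for a known hash should be served from disk.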
LLGL
LLGL only uses the cached blob method.
- D3D12GraphicsPSO::CreateNativePSO(D3D12Device&, const D3D12PipelineLayout&, const D3D12RenderPass*, const GraphicsPipelineDescriptor&, D3D12PipelineCache*) / D3D12ComputePSO::CreateNativePSO(D3D12Device&, const D3D12_SHADER_BYTECODE&, D3D12PipelineCache*)
  - Create the PSO (use the cached blob if the given D3D12PipelineCache* is not null)
  - D3D12PipelineState::SetNativeAndUpdateCache(ComPtr<ID3D12PipelineState>&&, D3D12PipelineCache*)
    - If the given D3D12PipelineCache* is not null but empty, set its blob to the cached blob (ID3D12PipelineState::GetCachedBlob(ID3DBlob**))
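What stands out in this design is that the caller owns the cache object and passes it in as an optional pointer. A minimal sketch of that pattern follows, under my own assumptions: ToyPipelineCache, ToyPso, and CreateNativeToyPso are hypothetical names, and copying bytes stands in for both compilation and blob retrieval.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Hypothetical caller-owned cache object: empty until a PSO fills it.
struct ToyPipelineCache {
    Blob blob;
    bool HasBlob() const { return !blob.empty(); }
};

// Hypothetical stand-in for a native PSO.
struct ToyPso {
    Blob native;
    bool createdFromCache;
};

inline ToyPso CreateNativeToyPso(const std::string& shaderIl, ToyPipelineCache* cache) {
    assert(!shaderIl.empty());
    ToyPso pso{Blob{}, false};

    // Creation step: use the cached blob only when a non-empty cache was passed in.
    if (cache != nullptr && cache->HasBlob()) {
        pso.native = cache->blob;
        pso.createdFromCache = true;
    } else {
        pso.native.assign(shaderIl.begin(), shaderIl.end());  // pretend compilation
    }

    // Update step: a cache that was passed in but still empty receives the blob.
    if (cache != nullptr && !cache->HasBlob()) {
        cache->blob = pso.native;
    }
    return pso;
}
```

Passing nullptr opts out of caching entirely, while passing an empty cache turns the same call into "compile and fill". Persisting the filled blob to disk is left to the caller in this sketch.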
O3DE
O3DE only uses the pipeline library method. According to O3DE, if RenderDoc or PIX is enabled, CreatePipelineLibrary API does not function properly.
LoadGraphicsPipeline
PipelineStateCache::CreateLibrary Shader::Init Shader::CreateInternal ShaderSystem::Init
- PipelineStateCache::AcquirePipelineState(PipelineLibraryHandle, const PipelineStateDescriptor&, const AZ::Name&): const PipelineState*
  - Get pipeline state from read-only cache
  - Fallback to thread-local cache
  - Fallback to pipeline library creation
- PipelineLibrary::Init(Device&, const PipelineLibraryDescriptor&): ResultCode
  - PipelineLibrary::InitInternal(RHI::Device&, const RHI::PipelineLibraryDescriptor&): RHI::ResultCode
    - Get deserialized pipeline library cache data
    - Create pipeline library
- PipelineStateCache::CompilePipelineState(GlobalLibraryEntry&, ThreadLibraryEntry&, const PipelineStateDescriptor&, PipelineStateHash, const AZ::Name&): ConstPtr<PipelineState>
  - Add PSO to pending cache
  - PipelineState::Init(Device&, const PipelineStateDescriptorForDraw/PipelineStateDescriptorForDispatch/PipelineStateDescriptorForRayTracing&, PipelineLibrary*): ResultCode
    - PipelineState::InitInternal(RHI::Device&, const RHI::PipelineStateDescriptorForDraw/PipelineStateDescriptorForDispatch/PipelineStateDescriptorForRayTracing&, RHI::PipelineLibrary*): RHI::ResultCode
      - PipelineLibrary::CreateGraphicsPipelineState(uint64_t, const D3D12_GRAPHICS_PIPELINE_STATE_DESC&): RHI::Ptr<ID3D12PipelineState>
        - Load pipeline cache (LoadGraphicsPipeline/LoadComputePipeline)
        - Create PSO on fail
        - Store pipeline
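The three-tier lookup (read-only cache, thread-local cache, then compile and queue in a pending cache) can be sketched as below. This is a simplified model of my reading of the flow and not O3DE code: ToyPipelineStateCache, ToyPso, and the counters are hypothetical, a function-local thread_local map stands in for the per-thread library entries, and Compact() is assumed to run only while no thread is inside Acquire().

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a compiled pipeline state.
struct ToyPso {
    std::uint64_t hash;
};

using PsoMap = std::unordered_map<std::uint64_t, std::shared_ptr<ToyPso>>;

class ToyPipelineStateCache {
public:
    const ToyPso* Acquire(std::uint64_t hash) {
        assert(hash != 0 && "0 is reserved as the invalid hash in this sketch");

        // 1. Read-only cache: only Compact() writes to it, so lookups take no lock.
        if (auto it = readOnly_.find(hash); it != readOnly_.end()) return it->second.get();

        // 2. Thread-local cache: PSOs this thread compiled since the last Compact().
        PsoMap& local = ThreadLocalCache();
        if (auto it = local.find(hash); it != local.end()) return it->second.get();

        // 3. Miss everywhere: compile, remember locally, and queue for the next merge.
        auto pso = std::make_shared<ToyPso>(ToyPso{hash});
        ++compileCount_;
        local.emplace(hash, pso);
        {
            std::lock_guard<std::mutex> lock(pendingMutex_);
            pending_.emplace(hash, pso);  // keeps the first entry if two threads raced
        }
        return pso.get();
    }

    // Merge pending PSOs into the read-only cache.
    // Assumption: called at a sync point while no thread is inside Acquire().
    void Compact() {
        std::lock_guard<std::mutex> lock(pendingMutex_);
        readOnly_.insert(pending_.begin(), pending_.end());
        pending_.clear();
    }

    int CompileCount() const { return compileCount_.load(); }
    std::size_t ReadOnlySize() const { return readOnly_.size(); }

private:
    // One map per thread and per cache instance; entries outlive the cache in this toy.
    PsoMap& ThreadLocalCache() {
        thread_local std::unordered_map<const ToyPipelineStateCache*, PsoMap> perThread;
        return perThread[this];
    }

    PsoMap readOnly_;
    PsoMap pending_;
    std::mutex pendingMutex_;
    std::atomic<int> compileCount_{0};
};
```

The point of the split is that the hot path (tier 1) never locks, and threads only contend on the pending cache when they actually compile something.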
PUML
@startuml Microsoft DirectX 12 Sample
class MemoryMappedFile {
-m_mapFile: HANDLE
-m_file: HANDLE
-m_mapAddress: LPVOID
-m_filename: std::wstring
-m_currentFileSize: UINT
+Init(std::wstring, UINT)
+Destroy(bool)
+GrowMapping(UINT)
}
class MemoryMappedPipelineLibrary {
-m_pipelineLibrary: Microsoft::WRL::ComPtr<ID3D12PipelineLibrary>
+Init(ID3D12Device*, std::wstring): bool
+Destroy(bool)
}
MemoryMappedFile <|-- MemoryMappedPipelineLibrary
class MemoryMappedPSOCache {
+Init(std::wstring): bool
+Destroy(bool)
}
MemoryMappedFile <|-- MemoryMappedPSOCache
struct CompilePSOThreadData {
+pLibrary: PSOLibrary*
+pDevice: ID3D12Device*
+pRootSignature: ID3D12RootSignature*
+type: EffectPipelineType
+threadHandle: HANDLE
}
class PSOLibrary {
+{static} CompilePSO(CompilePSOThreadData*)
-m_pipelineStates: ComPtr<ID3D12PipelineState>[EffectPipelineTypeCount]
-m_diskCaches: MemoryMappedPSOCache[EffectPipelineTypeCount]
-m_pipelineLibrary: MemoryMappedPipelineLibrary
-m_workerThreads: CompilePSOThreadData[EffectPipelineTypeCount]
}
PSOLibrary "1" *-- "n" MemoryMappedPSOCache
PSOLibrary "1" *-- "1" MemoryMappedPipelineLibrary
PSOLibrary "1" *-- "n" CompilePSOThreadData
@enduml
@startuml O3DE
class "AZ::Dom::Object"
class "AZ::RHI::DeviceObject"
"AZ::Dom::Object" <|-- "AZ::RHI::DeviceObject"
class "AZ::RHI::PipelineLibrary" {
-{abstract}InitInternal(Device&, const PipelineLibraryDescriptor&): ResultCode
}
"AZ::RHI::DeviceObject" <|-- "AZ::RHI::PipelineLibrary"
class "AZ::DX12::PipelineLibrary" {
-m_serializedData: RHI::ConstPtr<RHI::PipelineLibraryData>
-m_library: RHI::Ptr<ID3D12PipelineLibraryX>
-m_pipelineStates: AZStd::unordered_map<AZStd::wstring, RHI::Ptr<ID3D12PipelineState>>
+CreateGraphicsPipelineState(uint64_t, const D3D12_GRAPHICS_PIPELINE_STATE_DESC&): RHI::Ptr<ID3D12PipelineState>
+CreateComputePipelineState(uint64_t, const D3D12_COMPUTE_PIPELINE_STATE_DESC&): RHI::Ptr<ID3D12PipelineState>
-InitInternal(RHI::Device&, const RHI::PipelineLibraryDescriptor&): RHI::ResultCode
}
"AZ::RHI::PipelineLibrary" <|-- "AZ::DX12::PipelineLibrary"
struct "AZ::RHI::PipelineStateCache::ThreadLibraryEntry" {
+m_threadLocalCache: PipelineStateSet
+m_library: Ptr<AZ::RHI::PipelineLibrary>
}
"AZ::RHI::PipelineStateCache::ThreadLibraryEntry" *-- "AZ::RHI::PipelineLibrary"
struct "AZ::RHI::PipelineStateCache::GlobalLibraryEntry" {
+m_pendingCache: PipelineStateSet
}
class "AZ::RHI::PipelineStateCache" {
-m_threadLibrarySet: ThreadLocalContext<ThreadLibrarySet>
-m_globalLibrarySet: GlobalLibrarySet
+AcquirePipelineState(PipelineLibraryHandle, const PipelineStateDescriptor&, const AZ::Name&): const PipelineState*
+Compact()
}
"AZ::RHI::PipelineStateCache" *-- "AZ::RHI::PipelineStateCache::GlobalLibraryEntry"
"AZ::RHI::PipelineStateCache" *-- "AZ::RHI::PipelineStateCache::ThreadLibraryEntry"
class "AZ::RHI::RHISystem" {
-m_pipelineStateCache: RHI::Ptr<RHI::PipelineStateCache>
}
"AZ::RHI::RHISystem" *-- "AZ::RHI::PipelineStateCache"
class "AZ::RPI::RPISystem" {
-m_rhiSystem: RHI::RHISystem
}
"AZ::RPI::RPISystem" *-- "AZ::RHI::RHISystem"
class "AZ::RPI::RPISystemComponent" {
-m_rpiSystem: RPISystem
}
"AZ::RPI::RPISystemComponent" *-- "AZ::RPI::RPISystem"
class "AZ::DX12::PipelineState" {
-m_pipelineState: RHI::Ptr<ID3D12PipelineState>
-InitInternal(RHI::Device&, const RHI::PipelineStateDescriptorForDraw&, RHI::PipelineLibrary*): RHI::ResultCode
-InitInternal(RHI::Device&, const RHI::PipelineStateDescriptorForDispatch&, RHI::PipelineLibrary*): RHI::ResultCode
}
class "AZ::RHI::PipelineState"
"AZ::RHI::PipelineState" <|-- "AZ::DX12::PipelineState"
"AZ::RHI::DeviceObject" <|-- "AZ::RHI::PipelineState"
@enduml