Frame Flow and Render Graph Design
The rendering pipeline in Himalaya is orchestrated through a frame-based render graph that automatically manages resource dependencies, image layout transitions, and synchronization barriers. This architecture separates the high-level frame flow definition from low-level Vulkan synchronization details, enabling passes to declare their resource usage while the graph computes optimal barrier insertion.
Sources: render_graph.h, render_graph.cpp
Render Graph Architecture
The RenderGraph class serves as the central coordinator for per-frame rendering operations. Unlike persistent scene graphs, the render graph is rebuilt every frame following a clear lifecycle: clear() → import resources → add_pass() → compile() → execute(). This design enables dynamic frame composition where passes can be conditionally included based on feature toggles, and resource configurations can adapt to runtime changes like resolution scaling or MSAA mode switches.
Sources: render_graph.h
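The per-frame lifecycle can be sketched with a hypothetical, heavily simplified `MiniGraph` class. It is not Himalaya's RenderGraph, and resource import is omitted. The sketch shows why rebuilding every frame makes conditional passes trivial: a feature toggle only decides whether add_pass() is called after clear(). Calls made out of order trip an assertion.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-frame lifecycle:
// clear() -> add_pass() -> compile() -> execute(), repeated every frame.
class MiniGraph {
public:
    enum class Phase { Building, Compiled, Executed };

    void clear() {
        passes_.clear();  // nothing survives from the previous frame
        phase_ = Phase::Building;
    }

    void add_pass(std::string name, std::function<void()> execute) {
        assert(phase_ == Phase::Building && "add_pass() outside the build phase");
        passes_.push_back({std::move(name), std::move(execute)});
    }

    void compile() {
        assert(phase_ == Phase::Building);
        // A real graph would compute barriers here.
        phase_ = Phase::Compiled;
    }

    // Runs passes in registration order and returns their names.
    std::vector<std::string> execute() {
        assert(phase_ == Phase::Compiled && "execute() before compile()");
        std::vector<std::string> order;
        for (auto& pass : passes_) {
            pass.execute();
            order.push_back(pass.name);
        }
        phase_ = Phase::Executed;
        return order;
    }

private:
    struct Pass {
        std::string name;
        std::function<void()> execute;
    };
    std::vector<Pass> passes_;
    Phase phase_ = Phase::Executed;  // add_pass() is rejected until clear()
};

// Builds one frame; `ao_enabled` plays the role of a feature toggle.
std::vector<std::string> build_frame(MiniGraph& graph, bool ao_enabled) {
    graph.clear();
    graph.add_pass("DepthPrePass", [] {});
    if (ao_enabled) {
        graph.add_pass("GTAO", [] {});
    }
    graph.add_pass("Forward", [] {});
    graph.compile();
    return graph.execute();
}
```

Calling build_frame() twice on the same graph with different toggles yields two differently composed frames, and no state is carried between them.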
Core Design Principles
The render graph operates on three fundamental principles that distinguish it from lower-level RHI abstractions. External resource ownership means the per-frame graph tracks resources but does not create them: images and buffers are imported via import_image() and import_buffer(), and the graph handles returned for them are valid only for the current frame. (The managed resource system described below is the exception: the graph allocates those images once and then imports them each frame like any other resource.) Declarative dependency specification requires each pass to declare its resource accesses (read, write, or read-write) along with the pipeline stage context, enabling the graph to compute precise barrier requirements. Automatic synchronization is achieved during compile(), which walks passes in registration order, tracks each resource's current layout, and emits VkImageMemoryBarrier2 structures only when a layout changes or a data hazard is detected.
Sources: render_graph.h, render_graph.cpp
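One way to enforce "valid only for the current frame" is to tag every handle with a frame generation. The sketch below assumes that design. `RGResourceId`, `ImageHandle`, and `ImportTable` are hypothetical names, and the source does not say how Himalaya detects stale handles. The table only records handles owned elsewhere and never allocates images.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-frame resource ID: index into this frame's import table
// plus the frame generation it was issued in.
struct RGResourceId {
    uint32_t index = 0;
    uint64_t generation = 0;
};

// Hypothetical backend handle owned outside the graph (e.g. by a resource manager).
struct ImageHandle {
    uint32_t value = 0;
};

class ImportTable {
public:
    // Starts a new frame: every ID issued earlier becomes stale.
    void clear() {
        imported_.clear();
        ++generation_;
    }

    // The graph records the handle but never allocates the image.
    RGResourceId import_image(ImageHandle handle) {
        imported_.push_back(handle);
        return {static_cast<uint32_t>(imported_.size() - 1), generation_};
    }

    // Returns the handle only if the ID belongs to the current frame.
    std::optional<ImageHandle> resolve(RGResourceId id) const {
        if (id.generation != generation_ || id.index >= imported_.size()) {
            return std::nullopt;
        }
        return imported_[id.index];
    }

private:
    std::vector<ImageHandle> imported_;
    uint64_t generation_ = 0;
};
```

An ID kept past the next clear() resolves to nothing instead of silently aliasing a different frame's resource.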
Resource Usage Declaration
Passes declare their resource dependencies through RGResourceUsage structures that combine a resource identifier with access type and pipeline stage. The RGAccessType enum distinguishes read-only sampling, write-only output, and simultaneous read-write operations (used for depth testing with depth write). The RGStage enum maps to Vulkan pipeline stages and determines the VkImageLayout for barrier computation—ColorAttachment produces COLOR_ATTACHMENT_OPTIMAL, Fragment sampling produces SHADER_READ_ONLY_OPTIMAL, and Compute read-write operations use GENERAL layout.
Sources: render_graph.h, render_graph.cpp
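The stage/access-to-layout rule can be sketched as a single function. The enumerator names below are assumptions, and a local `ImageLayout` enum stands in for `VkImageLayout` so the sketch compiles without Vulkan headers. The three mappings stated above are encoded directly. The depth-attachment branch and the handling of fragment-stage writes are illustrative assumptions.

```cpp
#include <cassert>

// Hypothetical mirrors of the graph's enums; enumerator names are assumptions.
enum class RGAccessType { Read, Write, ReadWrite };
enum class RGStage { Fragment, Compute, ColorAttachment, DepthAttachment };

// Subset of VkImageLayout, spelled out so the sketch needs no Vulkan headers.
enum class ImageLayout {
    ShaderReadOnlyOptimal,   // VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL
    General,                 // VK_IMAGE_LAYOUT_GENERAL
    ColorAttachmentOptimal,  // VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL
    DepthAttachmentOptimal,  // VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL
    DepthReadOnlyOptimal,    // VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL
};

ImageLayout layout_for(RGStage stage, RGAccessType access) {
    switch (stage) {
    case RGStage::ColorAttachment:
        return ImageLayout::ColorAttachmentOptimal;
    case RGStage::DepthAttachment:
        // Assumption: read-only depth testing uses the read-only depth layout.
        return access == RGAccessType::Read ? ImageLayout::DepthReadOnlyOptimal
                                            : ImageLayout::DepthAttachmentOptimal;
    case RGStage::Fragment:
    case RGStage::Compute:
        // Sampling uses the read-only layout; storage-image writes need GENERAL.
        return access == RGAccessType::Read ? ImageLayout::ShaderReadOnlyOptimal
                                            : ImageLayout::General;
    }
    assert(false && "unknown RGStage");
    return ImageLayout::General;
}
```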
Managed Resource System
Beyond imported external resources, the render graph provides a managed resource system for intermediate render targets that the graph itself owns. These targets persist across frames and are recreated only when their configuration changes. Managed images are created once via create_managed_image() with either Relative sizing (a fraction of the reference resolution) or Absolute sizing (fixed pixel dimensions), then imported each frame with use_managed_image().
Sources: render_graph.h
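Relative and Absolute sizing can be resolved to pixel dimensions roughly as follows. `ManagedSize`, its fields, and `resolve_extent` are hypothetical. The rounding rule (floor, clamped to at least one pixel) is also an assumption, because the source does not specify it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Extent2D {
    uint32_t width;
    uint32_t height;
};

// Hypothetical sizing description; field names are assumptions.
struct ManagedSize {
    enum class Mode { Relative, Absolute } mode;
    float scale = 1.0f;       // used when mode == Relative
    Extent2D absolute{0, 0};  // used when mode == Absolute
};

// Resolves a managed image's pixel size against the reference resolution.
// Assumption: relative sizes round down and are clamped to at least 1 pixel.
Extent2D resolve_extent(const ManagedSize& size, Extent2D reference) {
    if (size.mode == ManagedSize::Mode::Absolute) {
        return size.absolute;
    }
    assert(size.scale > 0.0f);
    auto scaled = [&](uint32_t v) {
        return std::max<uint32_t>(1u, static_cast<uint32_t>(std::floor(v * size.scale)));
    };
    return {scaled(reference.width), scaled(reference.height)};
}
```

For example, a half-resolution AO target (scale 0.5) resolves to 960×540 at a 1920×1080 reference, while an Absolute 2048×2048 target ignores the reference entirely.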
Temporal Resource Double-Buffering
Temporal effects like ambient occlusion reprojection require access to previous frame data. When temporal=true is specified during managed image creation, the graph allocates two backing images and automatically swaps them each frame during clear(). The get_history_image() method imports the previous frame's content with appropriate layout transitions, while is_history_valid() indicates whether history contains meaningful data (false on first frame or after resize).
Sources: render_graph.h, render_graph.cpp
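The double-buffering and history-validity rules can be modeled with two image slots whose roles swap each frame. `TemporalPair` and its methods are hypothetical. `begin_frame()` stands in for the swap the graph performs inside clear(), and `invalidate()` models the effect of a resize.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a temporal managed image: two backing images whose
// roles swap every frame. Names are illustrative, not the graph's API.
class TemporalPair {
public:
    TemporalPair(uint32_t image_a, uint32_t image_b) : images_{image_a, image_b} {}

    // Called from clear(): last frame's "current" image becomes the history.
    void begin_frame() {
        current_ ^= 1u;
        history_valid_ = has_written_;
    }

    // Called once a frame has written the current image.
    void mark_written() { has_written_ = true; }

    // A resize recreates both images, so any history is meaningless.
    void invalidate() {
        has_written_ = false;
        history_valid_ = false;
    }

    uint32_t current() const { return images_[current_]; }
    uint32_t history() const { return images_[current_ ^ 1u]; }
    bool is_history_valid() const { return history_valid_; }

private:
    std::array<uint32_t, 2> images_;
    uint32_t current_ = 0;
    bool has_written_ = false;
    bool history_valid_ = false;
};
```

On the first frame is_history_valid() is false, so a temporal pass such as AO Temporal would fall back to the current frame's data instead of reprojecting garbage.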
Dynamic Resolution Adaptation
The set_reference_resolution() method enables automatic resource rebuilding when the output resolution changes. For Relative-sized managed images, the graph compares resolved dimensions before and after the reference change, destroying and recreating backing images only when their pixel dimensions actually differ. This mechanism supports window resizing and dynamic resolution scaling without manual pass coordination.
Sources: render_graph.cpp
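The "recreate only if the pixel size actually changed" rule can be sketched as below. `RelativeImage`, `scaled_extent`, and the recreate counter are hypothetical, and the floor-and-clamp rounding is an assumption. Only the comparison logic reflects the description above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Extent2D {
    uint32_t width;
    uint32_t height;
    bool operator==(const Extent2D& o) const { return width == o.width && height == o.height; }
};

// Hypothetical relative-sized managed image: only the scale and the
// currently allocated size are modeled.
struct RelativeImage {
    float scale;
    Extent2D allocated;
    uint32_t recreate_count = 0;
};

Extent2D scaled_extent(float scale, Extent2D ref) {
    auto s = [&](uint32_t v) {
        uint32_t px = static_cast<uint32_t>(std::floor(v * scale));
        return px == 0 ? 1u : px;  // never allocate a zero-sized image
    };
    return {s(ref.width), s(ref.height)};
}

// Sketch of the set_reference_resolution() idea: recompute each image's size
// and recreate the backing image only if its pixel dimensions changed.
void set_reference_resolution(std::vector<RelativeImage>& images, Extent2D reference) {
    for (auto& img : images) {
        Extent2D wanted = scaled_extent(img.scale, reference);
        if (!(wanted == img.allocated)) {
            // A real implementation would destroy and recreate the GPU image here.
            img.allocated = wanted;
            ++img.recreate_count;
        }
    }
}
```

Growing the reference from 1920×1080 to 1921×1080 recreates a full-resolution target. A half-resolution target survives, because 1921 × 0.5 still rounds down to 960.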
Frame Compilation and Execution
The compile() phase transforms declarative pass definitions into executable barrier sequences. For each pass, the graph examines its resource declarations and compares required layout/stage/access against the resource's current state. A barrier is emitted when either the layout changes or a data hazard exists—defined as any write followed by a read (RAW), write followed by write (WAW), or read followed by write (WAR). Read-after-read (RAR) dependencies require no synchronization.
Sources: render_graph.cpp
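The barrier condition reduces to a small predicate. The `Access` and `Layout` enums and `needs_barrier` are hypothetical simplifications of the graph's per-resource state. In Vulkan a pure WAR hazard needs only an execution dependency, but it still has to be ordered.

```cpp
#include <cassert>

enum class Access { None, Read, Write, ReadWrite };

// Hypothetical layout tag; only equality matters for this check.
enum class Layout { Undefined, ShaderReadOnly, General, ColorAttachment };

bool writes(Access a) { return a == Access::Write || a == Access::ReadWrite; }
bool reads(Access a) { return a == Access::Read || a == Access::ReadWrite; }

// A barrier is needed if the layout changes or there is a RAW, WAW or WAR
// hazard between the previous access and the next one. Read-after-read is free.
bool needs_barrier(Access prev, Layout prev_layout, Access next, Layout next_layout) {
    if (prev_layout != next_layout) {
        return true;
    }
    const bool raw = writes(prev) && reads(next);
    const bool waw = writes(prev) && writes(next);
    const bool war = reads(prev) && writes(next);
    return raw || waw || war;
}
```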
Barrier Optimization Strategy
The compilation algorithm tracks per-resource state across the entire frame, ensuring that consecutive passes accessing the same resource in compatible states incur zero barrier overhead. For imported images with specified final_layout, the graph appends end-of-frame transitions to restore resources to their expected state for external consumption or cross-frame persistence.
Sources: render_graph.cpp
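The following sketch shows the frame-wide state tracking plus the end-of-frame transitions to final_layout. All types are hypothetical, resources are keyed by name for brevity, and only read and write accesses are modeled. Each barrier records just the old and new layout, with stage and access masks left out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum class Layout { Undefined, ShaderReadOnly, General, ColorAttachment, PresentSrc };
enum class Access { None, Read, Write };

struct Usage {
    std::string resource;  // hypothetical: resources keyed by name for brevity
    Access access;
    Layout layout;
};

struct Barrier {
    std::string resource;
    Layout from;
    Layout to;
};

struct ResourceState {
    Layout layout = Layout::Undefined;
    Access last = Access::None;
    std::optional<Layout> final_layout;  // set for imported images with an expected end state
};

// Walks passes in registration order, emitting a barrier only on a layout
// change or hazard, then appends end-of-frame transitions to final_layout.
std::vector<std::vector<Barrier>> compile(const std::vector<std::vector<Usage>>& passes,
                                          std::unordered_map<std::string, ResourceState>& state,
                                          std::vector<Barrier>& end_of_frame) {
    std::vector<std::vector<Barrier>> per_pass(passes.size());
    for (std::size_t i = 0; i < passes.size(); ++i) {
        for (const Usage& u : passes[i]) {
            ResourceState& s = state.at(u.resource);
            // Any write on either side (RAW, WAW, WAR) is a hazard; read-after-read is not.
            const bool hazard = s.last != Access::None &&
                                (s.last == Access::Write || u.access == Access::Write);
            if (s.layout != u.layout || hazard) {
                per_pass[i].push_back({u.resource, s.layout, u.layout});
            }
            s.layout = u.layout;
            s.last = u.access;
        }
    }
    for (auto& entry : state) {
        ResourceState& s = entry.second;
        if (s.final_layout && *s.final_layout != s.layout) {
            end_of_frame.push_back({entry.first, s.layout, *s.final_layout});
            s.layout = *s.final_layout;
        }
    }
    return per_pass;
}
```

Two consecutive passes that sample the same AO texture produce a barrier only before the first of them. A swapchain image imported with a present final layout gets exactly one extra transition at the end of the frame.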
Execution with Debug Instrumentation
During execute(), each pass is wrapped in a debug label region whose color is generated automatically by stepping the hue by the golden angle, so consecutive passes receive clearly distinct colors. Passes therefore appear as labeled, color-coded regions in RenderDoc and Nsight Graphics without manual instrumentation. Barriers are batched into a VkDependencyInfo and submitted with vkCmdPipelineBarrier2 (synchronization2), so several transitions are recorded with a single command.
Sources: render_graph.cpp
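Golden-angle coloring can be sketched as follows: pass i gets hue i × 137.508° (mod 360°), followed by a standard HSV-to-RGB conversion. The function names and the idea of fixed saturation/value (for example 0.6 and 0.9) are illustrative assumptions. The RGBA result has the shape of the `float color[4]` member of `VkDebugUtilsLabelEXT`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Golden angle in degrees: 360 * (1 - 1/phi) ~= 137.508.
constexpr float kGoldenAngleDeg = 137.50776f;

// Hue for the i-th pass: successive passes land far apart on the color wheel.
float pass_hue_degrees(uint32_t pass_index) {
    return std::fmod(static_cast<float>(pass_index) * kGoldenAngleDeg, 360.0f);
}

// Standard HSV -> RGBA conversion (h in degrees, s and v in [0, 1]).
std::array<float, 4> hsv_to_rgba(float h, float s, float v) {
    assert(h >= 0.0f && h < 360.0f);
    const float c = v * s;
    const float hp = h / 60.0f;
    const float x = c * (1.0f - std::fabs(std::fmod(hp, 2.0f) - 1.0f));
    float r = 0.0f, g = 0.0f, b = 0.0f;
    if (hp < 1.0f)      { r = c; g = x; }
    else if (hp < 2.0f) { r = x; g = c; }
    else if (hp < 3.0f) { g = c; b = x; }
    else if (hp < 4.0f) { g = x; b = c; }
    else if (hp < 5.0f) { r = x; b = c; }
    else                { r = c; b = x; }
    const float m = v - c;
    return {r + m, g + m, b + m, 1.0f};
}
```

Because the golden angle is irrational relative to 360°, the hues never repeat exactly and stay well spread no matter how many passes a frame contains.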
Rasterization Frame Flow
The complete rasterization pipeline follows a multi-phase architecture where each phase corresponds to a logical rendering stage. The Renderer::render_rasterization() method constructs this pipeline by importing managed resources into the graph, populating a FrameContext with per-frame data, and invoking pass record methods in dependency order.
Sources: renderer_rasterization.cpp
Phase Structure
| Phase | Pass | Input Resources | Output Resources | Purpose |
|---|---|---|---|---|
| 1 | Shadow Pass | Scene meshes, light parameters | Shadow Map Array | CSM shadow map generation |
| 2 | Depth PrePass | Visible opaque meshes | Depth (MSAA), Normal, Roughness | Early depth + G-buffer |
| 3 | GTAO Pass | Depth, Normal | AO Noisy | Screen-space ambient occlusion |
| 4 | AO Spatial | AO Noisy | AO Blurred | 5×5 bilateral blur |
| 5 | AO Temporal | AO Blurred, AO History, Depth | AO Filtered | Temporal reprojection |
| 6 | Contact Shadows | Depth, light direction | Contact Shadow Mask | Screen-space shadows |
| 7 | Forward Pass | All G-buffers, lighting data | HDR Color (MSAA) | Main lighting |
| 8 | Skybox Pass | Depth | HDR Color | Background fill |
| 9 | Tonemapping | HDR Color | Swapchain | Tone mapping + UI |
Sources: m1-frame-flow.md
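The phase ordering in the table can be sanity-checked mechanically: every input must be either external data or the output of an earlier phase. The sketch below encodes the table with two simplifications, both of them assumptions. "All G-buffers" is narrowed to the Depth PrePass outputs, and MSAA and resolved depth are merged into a single "Depth" resource.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct PhaseDesc {
    std::string pass;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// The phase table, simplified as described above.
std::vector<PhaseDesc> rasterization_phases() {
    return {
        {"Shadow Pass", {"Scene meshes", "light parameters"}, {"Shadow Map Array"}},
        {"Depth PrePass", {"Visible opaque meshes"}, {"Depth", "Normal", "Roughness"}},
        {"GTAO Pass", {"Depth", "Normal"}, {"AO Noisy"}},
        {"AO Spatial", {"AO Noisy"}, {"AO Blurred"}},
        {"AO Temporal", {"AO Blurred", "AO History", "Depth"}, {"AO Filtered"}},
        {"Contact Shadows", {"Depth", "light direction"}, {"Contact Shadow Mask"}},
        {"Forward Pass", {"Depth", "Normal", "Roughness", "lighting data"}, {"HDR Color"}},
        {"Skybox Pass", {"Depth"}, {"HDR Color"}},
        {"Tonemapping", {"HDR Color"}, {"Swapchain"}},
    };
}

// Inputs that exist before any phase runs (scene data, history images).
std::set<std::string> rasterization_external_inputs() {
    return {"Scene meshes", "light parameters", "Visible opaque meshes",
            "AO History", "light direction", "lighting data"};
}

// Returns the first pass that reads a resource no earlier phase produced,
// or an empty string if the ordering is valid.
std::string first_ordering_error(const std::vector<PhaseDesc>& phases,
                                 const std::set<std::string>& external) {
    std::set<std::string> available = external;
    for (const PhaseDesc& p : phases) {
        for (const std::string& in : p.inputs) {
            if (available.count(in) == 0) {
                return p.pass;
            }
        }
        available.insert(p.outputs.begin(), p.outputs.end());
    }
    return {};
}
```

Moving GTAO ahead of the Depth PrePass makes the check report "GTAO Pass", because resolved depth would not exist yet.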
MSAA Integration Strategy
When MSAA is enabled, the Depth PrePass renders to multisampled depth, normal, and roughness buffers, then resolves them to single-sampled targets. Depth uses VK_RESOLVE_MODE_MAX_BIT, which keeps the nearest sample only under a reverse-Z depth convention (where larger values are closer). Color data uses VK_RESOLVE_MODE_AVERAGE_BIT. Screen-space effects (GTAO, Contact Shadows) operate on the resolved single-sampled buffers, while the Forward Pass renders to MSAA HDR color, which is resolved before post-processing.
Sources: depth_prepass.cpp, forward_pass.cpp
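Depth and color need different resolve modes, and a per-pixel CPU model of the two modes shows why. It assumes reverse-Z, where 1 is near and 0 is far. The function names are hypothetical. At a silhouette pixel whose samples cover both a foreground object and the far plane, averaging would produce a depth that belongs to neither surface, while MAX keeps the foreground.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// VK_RESOLVE_MODE_MAX_BIT modeled for one pixel: keep the largest depth.
// Under reverse-Z (near = 1, far = 0) this is the nearest surface.
template <std::size_t N>
float resolve_depth_max(const std::array<float, N>& samples) {
    static_assert(N > 0, "need at least one sample");
    return *std::max_element(samples.begin(), samples.end());
}

// VK_RESOLVE_MODE_AVERAGE_BIT modeled for one pixel: arithmetic mean.
template <std::size_t N>
float resolve_average(const std::array<float, N>& samples) {
    static_assert(N > 0, "need at least one sample");
    float sum = 0.0f;
    for (float s : samples) {
        sum += s;
    }
    return sum / static_cast<float>(N);
}
```

With 4× MSAA samples {0.2, 0.9, 0.9, 0.0}, MAX yields 0.9, the foreground surface. Averaging the same samples would give 0.5, a depth at which no geometry exists.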
Pass Implementation Pattern
Individual render passes follow a consistent three-phase lifecycle: setup() for one-time pipeline creation, record() for per-frame graph registration, and destroy() for resource cleanup. The record() method receives the RenderGraph and FrameContext, declares resource usage via RGResourceUsage arrays, and provides an execute lambda that performs actual rendering.
Sources: forward_pass.h, gtao_pass.cpp
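The setup() / record() / destroy() pattern can be sketched against deliberately tiny stand-in types. `RenderGraph`, `FrameContext`, `RGResourceUsage`, and `CommandBuffer` below are hypothetical simplifications, not the real declarations. `GtaoLikePass` is an invented example. The point is the shape of record(): it declares usages and hands the graph a lambda that runs later, during execute().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the real types.
enum class RGAccessType { Read, Write, ReadWrite };
enum class RGStage { Fragment, Compute, ColorAttachment };
struct RGResourceUsage {
    int resource;
    RGAccessType access;
    RGStage stage;
};
struct CommandBuffer {
    std::vector<std::string> log;  // records "commands" for illustration
};

struct RenderGraph {
    struct Pass {
        std::string name;
        std::vector<RGResourceUsage> usages;
        std::function<void(CommandBuffer&)> execute;
    };
    std::vector<Pass> passes;

    void add_pass(std::string name, std::vector<RGResourceUsage> usages,
                  std::function<void(CommandBuffer&)> execute) {
        passes.push_back({std::move(name), std::move(usages), std::move(execute)});
    }
};

struct FrameContext {
    int depth;
    int normal;
    int ao_noisy;
};

// The three-phase pass lifecycle.
class GtaoLikePass {
public:
    void setup() { pipeline_ready_ = true; }  // once: create pipelines

    void record(RenderGraph& rg, const FrameContext& ctx) {  // every frame
        assert(pipeline_ready_ && "record() before setup()");
        rg.add_pass("GTAO",
                    {{ctx.depth, RGAccessType::Read, RGStage::Compute},
                     {ctx.normal, RGAccessType::Read, RGStage::Compute},
                     {ctx.ao_noisy, RGAccessType::Write, RGStage::Compute}},
                    [](CommandBuffer& cmd) { cmd.log.push_back("dispatch gtao"); });
    }

    void destroy() { pipeline_ready_ = false; }  // release pipelines

private:
    bool pipeline_ready_ = false;
};
```

Because the lambda runs only after compile(), it should capture values (resource IDs, pipeline handles) rather than references to stack data that record() owns.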
FrameContext as Data Carrier
The FrameContext structure aggregates all per-frame data required by passes: RG resource identifiers for the current frame's imported images, non-owning references to scene data (meshes, materials, culling results), instancing draw groups pre-sorted by the renderer, and configuration pointers for feature toggles. Passing this single structure keeps every record() signature uniform and makes the data each pass consumes visible in one place.
Sources: frame_context.h
Path Tracing Alternative Flow
When path tracing mode is active, the rasterization pipeline is bypassed entirely in favor of a simpler flow built around ray tracing. The PT Reference View Pass uses ray tracing pipelines to accumulate samples into an RGBA32F accumulation buffer, which is then tonemapped directly to the swapchain. This separation ensures that rasterization overhead does not interfere with path tracing performance, and that PT can operate even when scene complexity exceeds rasterization memory budgets.
Sources: m1-frame-flow.md, renderer.cpp
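Progressive accumulation into a float buffer is commonly done as a running mean, where each new sample moves the stored value by (sample − stored) / (n + 1). The sketch below assumes that scheme. `Accumulator` is hypothetical, and the source does not say whether the real pass stores a running mean or a raw sum divided at tonemapping time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Rgba32f = std::array<float, 4>;

// Hypothetical accumulation target: one RGBA32F texel per pixel plus a sample count.
struct Accumulator {
    std::vector<Rgba32f> texels;
    uint32_t sample_count = 0;

    explicit Accumulator(std::size_t pixel_count)
        : texels(pixel_count, Rgba32f{0.0f, 0.0f, 0.0f, 0.0f}) {}

    // Running mean: after n samples each texel holds the average of all n.
    void add_sample(const std::vector<Rgba32f>& sample) {
        assert(sample.size() == texels.size());
        const float weight = 1.0f / static_cast<float>(sample_count + 1);
        for (std::size_t i = 0; i < texels.size(); ++i) {
            for (std::size_t c = 0; c < 4; ++c) {
                texels[i][c] += (sample[i][c] - texels[i][c]) * weight;
            }
        }
        ++sample_count;
    }

    // Camera or scene changes invalidate the accumulated estimate.
    void reset() {
        for (auto& t : texels) {
            t = Rgba32f{0.0f, 0.0f, 0.0f, 0.0f};
        }
        sample_count = 0;
    }
};
```

A running mean keeps the buffer directly displayable after every frame. The 32-bit float format limits the precision loss that a long accumulation would otherwise suffer.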
Resource Lifecycle Management
Managed images are created during renderer initialization with explicit format, usage, and sizing specifications. The renderer keeps an RGManagedHandle member for each managed resource (HDR color, depth, normals, AO buffers) and imports these per frame through the graph's managed resource API. When MSAA settings change, update_managed_desc() triggers pipeline rebuilds and recreates the backing images whose configuration depends on the sample count, such as multisampled targets and their resolve targets.
Sources: renderer_init.cpp
Integration with RHI Layer
The render graph sits atop the RHI (Rendering Hardware Interface) layer, utilizing ResourceManager for image/buffer handle resolution and CommandBuffer for barrier submission. This layering ensures that graph-level decisions about pass ordering and synchronization remain independent of Vulkan-specific command recording details, while still leveraging modern Vulkan features like synchronization2 and dynamic rendering.
Sources: render_graph.h
Next Steps
Understanding the frame flow provides the foundation for extending the rendering pipeline. To implement new rendering features, explore Render Graph System for deeper resource management patterns, Depth PrePass and Forward Rendering for geometry pass implementation details, or Ambient Occlusion (GTAO) for screen-space effect integration patterns.