Technical Decisions and Technology Stack
Himalaya is built on a foundation of deliberate architectural choices that prioritize long-term maintainability, cross-platform compatibility, and a pragmatic balance between visual quality and implementation complexity. This page documents the core technologies, third-party dependencies, and the reasoning behind key technical decisions that shape the renderer's design.
Vulkan API Foundation
The renderer targets Vulkan 1.4 as its graphics API baseline, leveraging modern core features that simplify rendering architecture while maintaining broad hardware compatibility. The selection prioritizes features that align with the Render Graph system's needs and reduce pipeline complexity.
| Core Feature | Purpose |
|---|---|
| Dynamic Rendering | Replaces VkRenderPass/VkFramebuffer with dynamic begin/end rendering commands, naturally fitting the Render Graph's barrier insertion model |
| Synchronization2 | Provides clearer barrier semantics through VkPipelineStageFlagBits2 and VkAccessFlagBits2, enabling more precise dependency tracking |
| Extended Dynamic State | Reduces pipeline permutation count by allowing dynamic viewport, scissor, cull mode, and depth state changes at command recording time |
| Descriptor Indexing | Enables bindless texture arrays for material systems, supporting thousands of textures with a single descriptor set |
The API choice reflects the principle of "not closing the door on mobile"—while the primary target is desktop gaming hardware, architectural decisions avoid patterns that fundamentally conflict with mobile TBDR (Tile-Based Deferred Rendering) architectures.
Sources: CLAUDE.md, technical-decisions.md
Layered Architecture
The codebase is organized into four strictly separated layers, each compiled as an independent static library with unidirectional dependencies:
This layering enforces architectural boundaries: the RHI knows nothing about render passes, passes do not reference each other, and the application orchestrates without implementing rendering logic. The separation enables independent testing and reduces the scope of changes when modifying individual systems.
Sources: CLAUDE.md
Third-Party Dependencies
Dependency management uses vcpkg in manifest mode, with all libraries declared in vcpkg.json and automatically fetched during CMake configuration. This eliminates manual library installation and ensures reproducible builds across development environments.
| Library | Version | Purpose |
|---|---|---|
| GLFW | 3.4#1 | Window creation and input handling |
| GLM | 1.0.3 | Vector/matrix math (directly used in public headers) |
| spdlog | 1.17.0 | Structured logging with compile-time level filtering |
| VMA | 3.3.0 | Vulkan memory allocation with defragmentation |
| shaderc | 2025.2 | Runtime GLSL to SPIR-V compilation |
| Dear ImGui | 1.91.9 (docking) | Debug UI with Vulkan and GLFW bindings |
| fastgltf | 0.9.0 | glTF 2.0 scene loading |
| mikktspace | 2020-10-06 | Tangent space generation for normal mapping |
| stb | 2024-07-29 | Image decoding (JPEG/PNG) |
| nlohmann-json | 3.12.0 | Configuration persistence |
| xxHash | 0.8.3 | Content-addressable caching (XXH3_128) |
Additional libraries integrated manually include bc7enc (ISPC-based BC7 texture compression) and OIDN (Intel Open Image Denoise for viewport and baking denoising).
Sources: vcpkg.json, CLAUDE.md
Rendering Framework: Forward+
The shading architecture follows Forward+ (Clustered Forward) rather than Deferred Shading, a decision driven by three primary constraints:
- Mobile TBDR Compatibility: Deferred's multi-render-target GBuffer write-read pattern fundamentally breaks mobile tile memory architectures
- MSAA Support: VR applications benefit from MSAA's temporal stability; Forward+ supports it natively while Deferred requires per-sample shading or complex workarounds
- Material Flexibility: Different shading models (PBR, toon) bind different shaders directly rather than encoding model IDs into GBuffer channels
The "cost" of Forward+—needing a depth-prepass for screen-space effects—is mitigated by the fact that Z-prepass is essentially mandatory for Early-Z optimization regardless. The evolution follows a natural progression: brute-force Forward (sufficient for single-digit light counts) → Tiled Forward (2D screen-space culling) → Clustered Forward (3D depth-extended culling).
Sources: decision-process.md
PBR and BRDF Model
The physically-based shading follows the Metallic-Roughness workflow (glTF 2.0 standard) with industry-standard component choices:
| Component | Selection | Evolution Path |
|---|---|---|
| Diffuse | Lambert → Burley | Burley adds retroreflection at grazing angles for rough surfaces |
| Specular D (NDF) | GGX | Multiscatter GGX energy compensation via precomputed LUT |
| Specular G | Smith Height-Correlated GGX | Mathematically paired with GGX NDF |
| Specular F | Schlick Approximation | Standard across all major engines |
The parameterization choice aligns with the glTF ecosystem, ensuring asset compatibility and consistent material interpretation across tools.
Sources: technical-decisions.md, brdf.glsl
Resource Management Architecture
The RHI layer implements generation-based resource handles for memory safety without smart pointer overhead. Each handle combines a slot index with a generation counter; when resources are destroyed, the generation increments, invalidating all existing handles to that slot.
struct ImageHandle {
uint32_t index = UINT32_MAX; // Slot index in resource pool
uint32_t generation = 0; // Incremented on destroy
[[nodiscard]] bool valid() const { return index != UINT32_MAX; }
};Resource destruction uses a deferred deletion queue tied to frame fences—resources are enqueued for destruction when no longer needed, but only actually destroyed after the GPU has finished processing the frame that last referenced them. This pattern eliminates use-after-free hazards in multi-buffered rendering.
Descriptor Strategy
The shader binding model uses a three-set layout that balances flexibility with implementation complexity:
| Set | Purpose | Update Frequency |
|---|---|---|
| Set 0 | Per-frame global data (matrices, lights, materials) | Once per frame |
| Set 1 | Bindless texture arrays (2D textures and cubemaps) | Static after initialization |
| Set 2 | Render target intermediate products (AO, depth, normals) | Varies by pass |
Set 0 uses traditional uniform buffers and storage buffers for scene data. Set 1 leverages descriptor indexing for bindless texture access, enabling thousands of unique textures without descriptor set fragmentation. Set 2 provides partially-bound access to intermediate render targets, allowing passes to consume outputs from previous passes through a consistent interface.
Sources: bindings.glsl
Path Tracing Infrastructure
The renderer is architected for gradual RT introduction rather than a full replacement of rasterization. Two Vulkan ray tracing APIs are employed:
| API | Use Case |
|---|---|
VK_KHR_ray_tracing_pipeline |
Baking, reference views, real-time PT (massive ray dispatch) |
VK_KHR_ray_query |
Hybrid RT effects (RT reflections, RT shadows in fragment shaders) |
Both APIs share the same VK_KHR_acceleration_structure BLAS/TLAS infrastructure. The M1 milestone implements a GPU path tracing baker and reference view using NEE (Next Event Estimation), MIS (Multiple Importance Sampling), and OIDN denoising. M2 introduces real-time PT with ReSTIR DI for direct lighting and SHaRC for indirect lighting, targeting DOOM: The Dark Ages quality levels.
Sources: technical-decisions.md
Development Toolchain
| Component | Selection | Rationale |
|---|---|---|
| Build System | CMake 4.1 | Modern transitive dependency handling, preset support |
| Compiler | MSVC + ISPC 1.30 | ISPC for SIMD texture compression kernels |
| C++ Standard | C++20 | Concepts, designated initializers, ranges |
| Package Manager | vcpkg (manifest) | Reproducible dependency resolution |
| IDE | CLion | Native CMake integration, remote development |
The build produces four artifacts: himalaya_rhi.lib, himalaya_framework.lib, himalaya_passes.lib, and himalaya_app.exe. A post-build step synchronizes shaders to the output directory and copies OIDN runtime DLLs.
Sources: CMakeLists.txt, CLAUDE.md
Design Philosophy Summary
The technology stack reflects a consistent set of principles documented in Requirements and Design Philosophy:
- Progressive Implementation: Pass 1/2/3 evolution paths allow working systems early, with complexity added incrementally
- Industry Validation: All major technologies have proven implementations and extensive documentation
- Performance-Quality Balance: Sweet-spot targeting for mid-range desktop hardware
- Hybrid Pipeline Compatibility: Decisions preserve optionality for RT integration without sacrificing current rasterization quality
- Design Constraints for Simplicity: Accepting scene limitations (e.g., single interactive door) to avoid exponential complexity
These principles manifest in concrete choices like skipping Hosek-Wilkie sky (directly implementing Bruneton), bypassing HBAO for GTAO, and using SDK-based upscaling (FSR/DLSS) rather than custom TAA implementations.
Sources: requirements-and-philosophy.md
Next Steps
With the foundational technology stack established, the logical progression leads to understanding how these components interact during frame execution. The Frame Flow and Render Graph Design page details the per-frame resource management and synchronization patterns that coordinate these systems.