Shadow Mapping (CSM) and Contact Shadows

Shadow rendering in Himalaya uses a hybrid approach that combines Cascaded Shadow Maps (CSM) for large-scale shadows with Screen-Space Contact Shadows for fine geometric detail. This two-layer architecture provides both broad coverage across large scenes and precise contact shadows where objects meet the ground. The system is designed around modern Vulkan rendering patterns, using reverse-Z depth buffers, bindless textures, and compute shaders for efficient parallel execution.

The CSM implementation uses the practical split scheme from Parallel-Split Shadow Maps (PSSM) for cascade distribution, with per-cascade orthographic projections that tightly fit camera sub-frustums while capturing shadow casters outside the view through scene AABB extension. Contact shadows operate as a screen-space post-process, ray-marching through the depth buffer to resolve self-shadowing details that would require impractical shadow map resolutions to capture. Together these techniques provide a complete shadow solution that scales from indoor architectural scenes to expansive outdoor environments.

Sources: shadow.h, shadow_pass.h, contact_shadows_pass.h

Cascaded Shadow Maps Architecture

The CSM system partitions the view frustum into multiple depth ranges (cascades), each rendered into a separate layer of a 2D array texture. This approach maintains consistent shadow texel density across vastly different depth ranges — near objects receive high-resolution shadows while distant elements use coarser sampling without wasting texture memory on empty space.

Cascade Split Distribution

Cascade boundaries are computed with the practical split scheme from PSSM, which blends a logarithmic and a linear distribution. The logarithmic component packs split boundaries closer together near the camera, where detail matters most, while the linear component ensures adequate coverage at distance. For split i of n cascades, the blend factor split_lambda (typically 0.5-0.8; 1.0 is purely logarithmic, 0.0 purely linear) controls this trade-off:

C_log = near × (far/near)^(i/n)
C_lin = near + (far - near) × i/n
C_i   = lambda × C_log + (1 - lambda) × C_lin

This distribution is implemented in shadow.cpp, where splits are computed and stored for GPU-side cascade selection. The renderer uploads these splits to GlobalUniformData.cascade_splits, enabling the fragment shader to select the appropriate cascade based on view-space depth.
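As an illustration of the formula above, a minimal C++ sketch of the split computation might look like the following. The function name `compute_cascade_splits` and its signature are hypothetical and not taken from shadow.cpp; the sketch assumes each returned value is the far distance of one cascade in view space.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: returns the far distance of each cascade.
// lambda = 1 gives a purely logarithmic split, lambda = 0 a purely linear one.
std::vector<float> compute_cascade_splits(float near_plane, float far_plane,
                                          int cascade_count, float lambda) {
    std::vector<float> splits(cascade_count);
    for (int i = 1; i <= cascade_count; ++i) {
        float t = static_cast<float>(i) / static_cast<float>(cascade_count);
        float c_log = near_plane * std::pow(far_plane / near_plane, t);
        float c_lin = near_plane + (far_plane - near_plane) * t;
        splits[i - 1] = lambda * c_log + (1.0f - lambda) * c_lin;
    }
    return splits;
}
```

Whatever the blend factor, the last split always equals the far distance, because both the logarithmic and the linear term evaluate to `far_plane` at i = n.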

Sources: shadow.cpp, scene_data.h

Light-Space Projection and Stabilization

Each cascade requires an orthographic projection matrix that tightly bounds its camera sub-frustum in XY while extending Z to encompass the scene AABB. This tight fitting maximizes shadow map utilization — a cascade covering a 10-meter sub-frustum receives the full resolution rather than wasting texels on empty space beyond the visible geometry.

The projection construction involves several stabilization techniques:

Technique	Purpose	Implementation
Sub-frustum centering	Numerical precision	Light-view matrix centered on cascade sub-frustum center
Scene AABB Z-extension	Capture distant casters	Extend light-space Z to scene bounds
Texel snapping	Prevent edge shimmer	Round VP translation to texel boundaries

Texel snapping is particularly critical for stability under camera translation. Without it, sub-texel camera movements shift the shadow map's texel grid relative to the world, so shadow edges shimmer from frame to frame. The implementation projects world origin through the view-projection matrix, rounds to the nearest texel center, and applies the resulting correction to the VP translation matrix.
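The snapping correction can be sketched in a few lines of C++. This is an illustrative version under stated assumptions, not the code in shadow.cpp: `Vec2` and `texel_snap_offset` are hypothetical names, the input is assumed to be the NDC xy of the world origin after projection through the cascade's light view-projection, and the sketch rounds to integer texel coordinates (the exact rounding target in the real implementation may differ).

```cpp
#include <cmath>

struct Vec2 { float x, y; };  // hypothetical minimal vector type

// Returns the NDC-space offset to add to the projection's translation so the
// projected world origin lands on an integer texel coordinate.
Vec2 texel_snap_offset(Vec2 ndc_origin, float shadow_map_resolution) {
    // NDC spans [-1, 1], so one NDC unit covers half the shadow map.
    float half_res = shadow_map_resolution * 0.5f;
    float tx = ndc_origin.x * half_res;  // origin in texel units
    float ty = ndc_origin.y * half_res;
    return { (std::round(tx) - tx) / half_res,
             (std::round(ty) - ty) / half_res };
}
```

The offset never exceeds half a texel in either axis, so the correction is invisible in the projection fit while keeping the texel grid fixed relative to the world as the camera moves.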

Sources: shadow.cpp

Shadow Map Resources

The shadow pass manages a 2D array image with kMaxShadowCascades layers (4), using D32Sfloat format for depth precision. Per-layer views are created for rendering into individual cascade layers during the shadow pass execution. The resource layout is:

Shadow Map (D32Sfloat, resolution² × 4 layers)
├── Layer 0: Cascade 0 (nearest, highest detail)
├── Layer 1: Cascade 1
├── Layer 2: Cascade 2
└── Layer 3: Cascade 3 (farthest, lowest detail)

The shadow pass creates two graphics pipelines: an opaque pipeline with no fragment shader (depth written by rasterizer) for maximum Early-Z efficiency, and a mask pipeline with alpha test discard for materials with transparency masks. This separation ensures that the majority of geometry (opaque) renders with full hardware depth rejection, while only masked materials pay the cost of fragment shader execution.

Sources: shadow_pass.cpp

Shadow Sampling and Filtering

The forward pass samples shadows through functions defined in shadow.glsl, which provides two filtering modes:

PCF (Percentage-Closer Filtering) — A fixed-radius kernel that performs multiple depth comparisons and averages the results. The kernel radius is configurable (0 = single sample hard shadow, 1 = 3×3, up to 5 = 11×11). Each texture fetch leverages hardware 2×2 bilinear comparison, so the effective filter width exceeds the grid dimensions.

PCSS (Percentage-Closer Soft Shadows) — Contact-hardening shadows that vary filter width based on blocker distance. The algorithm performs three steps: (1) blocker search in an elliptical region to find average blocker depth, (2) penumbra width estimation from receiver-blocker depth difference, and (3) variable-width PCF using the estimated kernel size. This produces physically plausible soft shadows that harden at contact points and soften with distance.
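To make the two filtering modes concrete, here is a CPU-side C++ sketch of a fixed-radius PCF kernel and of the PCSS penumbra estimate for a directional light. Both functions are hypothetical illustrations rather than ports of shadow.glsl: the PCF version uses point comparisons (no hardware 2×2 bilinear comparison), assumes reverse-Z depth where larger values are closer to the light, and the penumbra estimate assumes depths expressed as linear distances along the light direction in meters.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Lit fraction in [0, 1] over a (2r+1) x (2r+1) kernel centered on (cx, cy).
// Assumes reverse-Z: a receiver is lit when it is at least as close to the
// light as the stored occluder depth.
float pcf_lit_fraction(const std::vector<float>& shadow_map, int width, int height,
                       int cx, int cy, float receiver_depth, int radius) {
    int lit = 0, total = 0;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int x = std::min(std::max(cx + dx, 0), width - 1);   // clamp to edge
            int y = std::min(std::max(cy + dy, 0), height - 1);
            if (receiver_depth >= shadow_map[y * width + x]) ++lit;
            ++total;
        }
    }
    return static_cast<float>(lit) / static_cast<float>(total);
}

// Penumbra width in meters for a light of the given angular diameter (radians),
// from the gap between the receiver and the average blocker.
float pcss_penumbra_width(float receiver_distance, float avg_blocker_distance,
                          float light_angular_diameter) {
    float gap = std::max(receiver_distance - avg_blocker_distance, 0.0f);
    return gap * 2.0f * std::tan(light_angular_diameter * 0.5f);
}
```

With the sun-sized default of 0.00925 radians, a blocker 1 meter above the receiver yields a penumbra of roughly 9 mm, which is why the filter hardens visibly at contact points.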

Sources: shadow.glsl

Cascade Blending and Distance Fade

To prevent hard boundaries between cascades, the system implements blend regions at cascade far boundaries. When a fragment falls within the blend region (controlled by shadow_blend_width), the shader samples both the current and next cascade, linearly interpolating based on position within the blend zone. This creates smooth transitions even when cascade resolutions differ significantly.

Distance fade provides a gradual transition to unshadowed rendering beyond shadow_max_distance. Rather than an abrupt cutoff, shadows fade to fully lit over a configurable fraction of the maximum distance (shadow_distance_fade_width), preventing jarring pop-in at the shadow boundary.

Sources: shadow.glsl

Screen-Space Contact Shadows

While CSM provides excellent coverage for large-scale shadows, it cannot resolve fine contact details at practical resolutions. A 2048×2048 shadow map covering 100 meters provides texels of roughly 5 cm (100 m / 2048 ≈ 4.9 cm), too coarse to capture the precise shadow where a chair leg meets the floor. Contact shadows address this limitation through screen-space ray marching.

Ray Marching Algorithm

The contact shadows compute shader (contact_shadows.comp) traces rays from each pixel toward the primary directional light, sampling the depth buffer to detect occlusion. The algorithm operates in clip space with several key optimizations:

Non-linear step distribution — Steps are concentrated near the ray origin using a power curve (i/N)^exponent where exponent = 2.0. This places more samples where contact detail matters most while allowing coarser sampling further along the ray.

Dual depth sampling — At each step, both bilinear-filtered and nearest-neighbor depths are sampled. Bilinear creates false shadows at silhouettes (interpolates foreground/background), while nearest produces staircase artifacts. Requiring the ray to pass below both surfaces eliminates both artifact classes.

Adaptive step count — The effective step count is clamped to the screen-space ray length, ensuring no more steps than pixels to traverse. This prevents wasted work on short rays while maintaining quality on long grazing-angle shadows.
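The three optimizations above can be sketched in C++ as small helper functions. These are hypothetical illustrations, not code from contact_shadows.comp: the function names are invented, the occlusion test uses linear view-space depth (larger means farther from the camera) instead of the shader's clip-space reverse-Z values, and the rounding used for the adaptive step count is an assumption.

```cpp
#include <algorithm>
#include <cmath>

// Adaptive step count: never take more steps than the ray covers in pixels.
int effective_step_count(int configured_steps, float ray_length_pixels) {
    int max_steps = static_cast<int>(std::ceil(ray_length_pixels));
    return std::max(1, std::min(configured_steps, max_steps));
}

// Non-linear distribution: position in (0, 1] along the ray for step i (1-based).
// With exponent 2.0, early steps are packed close to the ray origin.
float step_parameter(int i, int step_count, float exponent = 2.0f) {
    float u = static_cast<float>(i) / static_cast<float>(step_count);
    return std::pow(u, exponent);
}

// Dual depth sampling: count a hit only when the ray is behind BOTH the
// bilinear-filtered and the nearest-neighbor depth sample.
bool is_behind_both(float ray_depth, float bilinear_depth, float nearest_depth,
                    float bias) {
    return ray_depth > std::max(bilinear_depth, nearest_depth) + bias;
}
```

With 16 steps and exponent 2.0, the first step covers 1/256 of the ray while the last covers about 12% of it, which is the concentration near the origin that the power curve is meant to provide.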

Sources: contact_shadows.comp

Precision and Bias Handling

Contact shadows require careful handling of numerical precision and self-intersection:

Challenge	Solution
Far pixel precision loss	Build `start_clip` directly from known NDC coordinates rather than round-trip through matrices
Self-intersection	Slope-scaled depth bias: `bias = tolerance / max(NdotL, 0.25)`
Grazing angle artifacts	Clamp bias scale to prevent extreme values at near-tangent angles
Camera plane crossing	Clamp ray end to just before camera plane when light is behind camera

The slope-scaled bias adapts to surface orientation relative to the light — surfaces facing the light (NdotL ≈ 1) receive minimal bias, while grazing angles (NdotL → 0) receive up to 4× more bias to prevent self-shadowing artifacts.

Sources: contact_shadows.comp

Temporal Stability

Interleaved Gradient Noise (IGN) with a per-frame temporal offset converts step banding into high-frequency noise that TAA can resolve. The noise is folded into the step formula as (i + noise) / N, producing sub-step variation per pixel. The fractional part of the golden ratio (0.618...) is used as the per-frame offset to decorrelate successive frames, so temporal accumulation effectively multiplies the sample count.
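The jitter can be sketched in C++ as below. The IGN constants are the commonly published ones for this noise function; everything else is an assumption for illustration, including the function names, the way the frame offset is added, and wrapping the frame index at 64 to keep float precision. The actual code in noise.glsl and contact_shadows.comp may differ.

```cpp
#include <cmath>

// Fractional part for non-negative inputs.
float fract(float x) { return x - std::floor(x); }

// Interleaved Gradient Noise for a pixel coordinate, result in [0, 1).
float interleaved_gradient_noise(float px, float py) {
    return fract(52.9829189f * fract(0.06711056f * px + 0.00583715f * py));
}

// Shifts the per-pixel noise by a golden-ratio step each frame.
// The modulo 64 wrap is an assumption made for this sketch.
float temporal_noise(float px, float py, unsigned frame_index) {
    const float kGoldenRatioFraction = 0.61803398875f;
    float frame_offset = static_cast<float>(frame_index % 64u) * kGoldenRatioFraction;
    return fract(interleaved_gradient_noise(px, py) + frame_offset);
}

// Jittered step position: (i + noise) / N, with i counted from 0.
float jittered_step(int i, int step_count, float noise) {
    return (static_cast<float>(i) + noise) / static_cast<float>(step_count);
}
```

Because the noise stays in [0, 1), each jittered sample remains inside its own step interval, so the ray never skips or double-covers a segment.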

Sources: contact_shadows.comp, noise.glsl

Integration with Forward Rendering

The forward pass combines CSM and contact shadows to produce the final shadow attenuation factor. The integration follows a specific order to maintain physical correctness:

1. CSM shadow evaluation: sample cascaded shadow maps with PCF or PCSS
2. Cascade blending: blend between adjacent cascades in overlap regions
3. Distance fade: fade shadows to fully lit beyond max distance
4. Contact shadow multiplication: apply contact shadow mask (primary light only)

Contact shadows are applied only to the primary directional light (directional_lights[0]) since the compute shader traces rays for a single light direction. The contact shadow mask is sampled from rt_contact_shadow_mask and multiplied into the radiance calculation before BRDF evaluation.

The forward shader also provides two shadow debug visualization modes: DEBUG_MODE_SHADOW_CASCADES colors pixels by cascade index, and DEBUG_MODE_CONTACT_SHADOWS shows the raw contact shadow mask.

Sources: forward.frag

Configuration Parameters

The shadow system exposes runtime-tunable parameters through ShadowConfig and ContactShadowConfig:

ShadowConfig (CSM)

Parameter	Type	Default	Description
`cascade_count`	uint32	4	Active cascades (1-4)
`split_lambda`	float	0.75	PSSM log/linear blend
`max_distance`	float	100.0	Shadow coverage in meters
`slope_bias`	float	2.0	Hardware depth bias slope
`normal_offset`	float	2.0	Shader normal offset in texels
`pcf_radius`	uint32	2	PCF kernel radius (0=off)
`blend_width`	float	0.1	Cascade blend region fraction
`shadow_mode`	uint32	0	0=PCF, 1=PCSS
`light_angular_diameter`	float	0.00925	Light size in radians (sun ≈ 0.53°)
`pcss_quality`	uint32	1	0=Low, 1=Medium, 2=High

ContactShadowConfig

Parameter	Type	Default	Description
`step_count`	uint32	16	Ray march steps (8/16/24/32)
`max_distance`	float	0.5	Max search distance in meters
`base_thickness`	float	0.02	Depth comparison tolerance

Sources: scene_data.h, application.h

System Architecture

The shadow system spans multiple architectural layers:

┌─────────────────────────────────────────────────────────────────┐
│  Application Layer                                               │
│  ├── ShadowConfig / ContactShadowConfig (runtime parameters)    │
│  └── DebugUI (parameter tuning, visualization modes)            │
├─────────────────────────────────────────────────────────────────┤
│  Render Pass Layer                                               │
│  ├── ShadowPass (CSM depth rendering)                           │
│  └── ContactShadowsPass (compute ray marching)                  │
├─────────────────────────────────────────────────────────────────┤
│  Framework Layer                                                 │
│  ├── compute_shadow_cascades() (PSSM split + projection math)   │
│  └── ShadowCascadeResult (per-cascade VP matrices)              │
├─────────────────────────────────────────────────────────────────┤
│  Shader Layer                                                    │
│  ├── shadow.vert / shadow_masked.frag (depth pass)              │
│  ├── contact_shadows.comp (screen-space ray marching)           │
│  └── common/shadow.glsl (sampling, PCF, PCSS, cascade select)   │
└─────────────────────────────────────────────────────────────────┘

The ShadowPass manages shadow map resources and rendering, while ContactShadowsPass handles the compute dispatch for screen-space shadows. Both integrate with the Render Graph System for automatic synchronization and resource management.

For related lighting systems, see Ambient Occlusion (GTAO), which provides complementary screen-space occlusion, and Camera, Lighting, and Shadows for the broader lighting architecture.