Ray Tracing Shaders

The ray tracing shader system in Himalaya implements a complete path tracing reference view using Vulkan's ray tracing pipeline. This documentation covers the shader architecture, the five shader stages (raygen, closest-hit, any-hit, miss, and shadow miss), and the shared utilities that enable physically-based rendering with multiple importance sampling.

Shader Architecture Overview

The ray tracing pipeline follows the Mode A architecture. All surface shading runs in the closest-hit shader, while the raygen shader owns the Monte Carlo path loop, bounce management, and accumulation. Closest-hit therefore handles material evaluation, next event estimation (NEE), and BRDF sampling. It returns the next ray's origin, direction, and throughput multiplier to raygen through the ray payload.

The pipeline consists of five shader stages organized into four shader binding table (SBT) groups:

| SBT Group | Type | Shader(s) | Purpose |
|---|---|---|---|
| Group 0 | General | `reference_view.rgen` | Primary ray generation, path loop, accumulation |
| Group 1 | General | `miss.rmiss` | Environment sampling on geometry miss |
| Group 2 | General | `shadow_miss.rmiss` | Shadow ray visibility confirmation |
| Group 3 | Triangles hit group | `closesthit.rchit` + `anyhit.rahit` | Surface shading, alpha testing |

Sources: rt_pipeline.h, rt_pipeline.cpp

Ray Generation Shader (`reference_view.rgen`)

The raygen shader is the entry point for path tracing. It runs once per pixel per frame and implements three things: primary ray generation, the path tracing loop with Russian Roulette termination, and running-average accumulation.

Primary Ray Generation

Primary rays are generated by unprojecting pixel coordinates through the inverse projection and view matrices. A subpixel jitter drawn from Sobol dimensions 0 and 1 provides anti-aliasing once samples are accumulated:

```glsl
// One invocation per pixel: launch ID = pixel, launch size = image size
ivec2 pixel = ivec2(gl_LaunchIDEXT.xy);
ivec2 size  = ivec2(gl_LaunchSizeEXT.xy);

// Subpixel jitter via Sobol dims 0-1
float jitter_x = rand_pt(0, pc.sample_count, pixel, pc.frame_seed, pc.blue_noise_index);
float jitter_y = rand_pt(1, pc.sample_count, pixel, pc.frame_seed, pc.blue_noise_index);

// Pixel corner + [0,1) jitter → UV → NDC (Y negated)
vec2 uv = (vec2(pixel) + vec2(jitter_x, jitter_y)) / vec2(size);
vec2 ndc = vec2(uv.x * 2.0 - 1.0, -(uv.y * 2.0 - 1.0));

// Unproject to world-space ray
vec4 clip_target = vec4(ndc, 1.0, 1.0);
vec4 view_target = global.inv_projection * clip_target;
view_target /= view_target.w;

vec3 ray_origin    = (global.inv_view * vec4(0.0, 0.0, 0.0, 1.0)).xyz;
vec3 ray_direction = normalize((global.inv_view * vec4(view_target.xyz, 0.0)).xyz);
```

Sources: reference_view.rgen
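
The pixel-to-NDC step can be checked on the CPU. The C++ sketch below reproduces only that arithmetic: pixel plus jitter, divided by the image size, then remapped to [-1, 1] with Y negated. `Vec2` and `pixel_to_ndc` are hypothetical names, not part of the code base.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for GLSL's vec2.
struct Vec2 {
    float x, y;
};

// Mirrors the raygen math: (pixel + jitter) / size gives uv in [0,1),
// then uv is remapped to NDC in [-1,1] with Y negated as in the shader
// (row 0 maps to NDC y = +1).
Vec2 pixel_to_ndc(int px, int py, float jitter_x, float jitter_y, int width, int height) {
    float u = (static_cast<float>(px) + jitter_x) / static_cast<float>(width);
    float v = (static_cast<float>(py) + jitter_y) / static_cast<float>(height);
    return Vec2{u * 2.0f - 1.0f, -(v * 2.0f - 1.0f)};
}
```

With the jitter fixed at 0.5, every ray passes through a pixel center. The Sobol jitter instead spreads successive samples over the pixel footprint, and accumulation averages them into an anti-aliased result.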

Path Tracing Loop

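The path loop applies Russian Roulette starting at bounce 2. The survival probability is the maximum component of the current throughput, clamped to [0.05, 0.95]. The lower bound caps the 1/p boost given to surviving paths, which limits variance. The upper bound leaves a small termination chance even for bright paths. Dividing the throughput of surviving paths by the survival probability keeps the estimator unbiased: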

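```glsl
for (uint bounce = 0u; bounce < pc.max_bounces; ++bounce) {
    // Russian Roulette (bounce >= 2); rr_rand is a per-bounce random number
    if (bounce >= 2u) {
        bool survive;
        float rr_prob = russian_roulette(throughput, bounce, rr_rand, survive);
        if (!survive) break;
        throughput /= rr_prob;  // compensate for the paths that were terminated
    }

    // Trace ray and accumulate contribution
    payload.bounce = bounce;
    traceRayEXT(tlas, ..., payload);
    total_radiance += throughput * payload.color;

    // The miss shader sets hit_distance = -1.0 to signal path termination
    if (payload.hit_distance < 0.0) break;

    // Update throughput and advance ray
    throughput *= payload.throughput_update;
    origin = payload.next_origin;
    direction = payload.next_direction;
}
```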

Sources: reference_view.rgen, pt_common.glsl
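
The C++ sketch below shows one way `russian_roulette` could implement the rule described above. It is written under assumptions: the random number is passed in explicitly, and the bounce index that the shader version receives is omitted. The real helper lives in `pt_common.glsl` and may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the vec3 path throughput.
struct Rgb {
    float r, g, b;
};

// Returns the survival probability and sets `survive`.
// The caller divides the throughput by the returned probability when the
// path survives, so the expected throughput is unchanged.
float russian_roulette(const Rgb& throughput, float rand01, bool& survive) {
    float max_component = std::max({throughput.r, throughput.g, throughput.b});
    // The lower clamp caps the 1/p boost; the upper clamp keeps some termination chance.
    float p = std::clamp(max_component, 0.05f, 0.95f);
    survive = rand01 < p;
    return p;
}
```

Basing the probability on the largest channel means a path that is still bright in any channel is rarely cut. Very dark paths are terminated often, which saves work without biasing the image.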

Running Average Accumulation

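The accumulation buffer stores a running average of all path samples. On the first frame (sample_count == 0) the new sample overwrites the buffer. Later frames blend with weight 1/(n + 1), where n = sample_count. Since mix(old, new, w) = old * (1 - w) + new * w, the result is (n * old + new)/(n + 1), the exact mean of all n + 1 samples: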

```glsl
if (pc.sample_count == 0u) {
    imageStore(accumulation_image, pixel, vec4(total_radiance, 1.0));
} else {
    vec4 old_value = imageLoad(accumulation_image, pixel);
    float weight = 1.0 / float(pc.sample_count + 1u);
    vec3 result = mix(old_value.rgb, total_radiance, weight);
    imageStore(accumulation_image, pixel, vec4(result, 1.0));
}
```

Sources: reference_view.rgen
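
The C++ sketch below replays this update for one channel of one pixel. It checks that the incremental blend reproduces the plain arithmetic mean. `mix` is re-implemented with GLSL's definition; `accumulate` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// GLSL-style mix: x * (1 - a) + y * a.
float mix(float x, float y, float a) {
    return x * (1.0f - a) + y * a;
}

// Replays the raygen accumulation: sample 0 overwrites,
// sample n blends with weight 1 / (n + 1).
float accumulate(const std::vector<float>& samples) {
    float value = 0.0f;
    for (unsigned n = 0; n < samples.size(); ++n) {
        if (n == 0) {
            value = samples[n];
        } else {
            float weight = 1.0f / static_cast<float>(n + 1);
            value = mix(value, samples[n], weight);
        }
    }
    return value;
}
```

Each update needs only the previous average and the sample count. No separate running sum has to be stored.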

Closest-Hit Shader (`closesthit.rchit`)

The closest-hit shader performs all surface shading operations: vertex interpolation, normal mapping, material sampling, next event estimation for direct lighting, and multi-lobe BRDF sampling for indirect bounces.

Vertex Interpolation via Buffer References

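Ray tracing shaders have no vertex input stage, so the closest-hit shader fetches the hit triangle's vertices itself. The index and vertex buffer device addresses of each geometry are stored in the GeometryInfo buffer. The shader dereferences them through buffer references, so no per-mesh vertex or index buffer descriptors need to be bound: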

```glsl
// Fetch triangle indices
IndexBuffer ib = IndexBuffer(geo.index_buffer_address);
uint i0 = ib.indices[3 * gl_PrimitiveID + 0];
uint i1 = ib.indices[3 * gl_PrimitiveID + 1];
uint i2 = ib.indices[3 * gl_PrimitiveID + 2];

// Fetch vertices with byte offset
VertexBuffer v0 = VertexBuffer(geo.vertex_buffer_address + uint64_t(i0) * VERTEX_STRIDE);
// ... interpolate using barycentric coordinates
```

Sources: pt_common.glsl, bindings.glsl
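
The elided interpolation step follows the usual convention for triangle hits. The two reported barycentrics (b1, b2) weight vertices 1 and 2, and vertex 0 receives 1 - b1 - b2. A minimal C++ sketch, with a hypothetical `Vec3` type and `interpolate` function:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

Vec3 operator*(const Vec3& v, float s) { return {v.x * s, v.y * s, v.z * s}; }
Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// Interpolates a per-vertex attribute at the hit point.
// b1 and b2 are the barycentrics of vertices 1 and 2; vertex 0 gets the remainder.
Vec3 interpolate(const Vec3& a0, const Vec3& a1, const Vec3& a2, float b1, float b2) {
    float b0 = 1.0f - b1 - b2;
    return a0 * b0 + a1 * b1 + a2 * b2;
}
```

Interpolated normals and tangents are generally not unit length. They need normalizing before they feed the TBN basis used for normal mapping.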

Normal Mapping with Consistency Correction

The shader applies normal mapping using a TBN basis constructed from the interpolated tangent and normal. A consistency correction ensures the shading normal never points below the geometric surface, preventing light leaks:

```glsl
vec3 N_shading = get_shading_normal(N_interp, vec4(T_world, hit.tangent.w),
                                    normal_rg, mat.normal_scale);
N_shading = ensure_normal_consistency(N_shading, N_face);
```

The ensure_normal_consistency function reflects the shading normal if it points to the wrong side of the geometric normal.

Sources: closesthit.rchit, pt_common.glsl
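
The text describes the correction as a reflection. One common way to write it reflects the shading normal across the plane perpendicular to the geometric normal whenever the two disagree. This is an assumption; the sketch is not the confirmed contents of `pt_common.glsl`. C++ version:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical ensure_normal_consistency: if the shading normal lies below the
// surface defined by the unit geometric normal n_geo, mirror it across that
// surface. A reflection preserves length, so a unit normal stays unit length.
Vec3 ensure_normal_consistency(const Vec3& n_shading, const Vec3& n_geo) {
    float d = dot(n_shading, n_geo);
    if (d >= 0.0f) {
        return n_shading;  // already on the geometric normal's side
    }
    // Reflection across the plane with normal n_geo: n - 2 (n . g) g
    return {n_shading.x - 2.0f * d * n_geo.x,
            n_shading.y - 2.0f * d * n_geo.y,
            n_shading.z - 2.0f * d * n_geo.z};
}
```

After the reflection, dot(result, n_geo) equals -d, which is positive. The corrected normal therefore always lies in the geometric normal's hemisphere.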

Next Event Estimation (NEE)

The shader implements NEE for both directional lights and environment lighting, using multiple importance sampling (MIS) to combine light sampling with BRDF sampling strategies.

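Directional lights (delta distributions). BRDF sampling can never produce a delta light's exact direction, so light sampling is the only strategy and no MIS weight is needed: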

```glsl
for (uint i = 0u; i < global.directional_light_count; ++i) {
    // Only shadow_miss.rmiss sets visible = 1, so reset it before each trace
    shadow_payload.visible = 0u;

    // Shadow ray: terminate on first hit and skip closest-hit shading
    traceRayEXT(tlas,
        gl_RayFlagsTerminateOnFirstHitEXT | gl_RayFlagsSkipClosestHitShaderEXT,
        ..., shadow_payload);

    if (shadow_payload.visible == 1u) {
        nee_radiance += evaluate_brdf(...) * light_color * intensity * NdotL;
    }
}
```

Environment Lighting (alias table importance sampling + MIS):

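```glsl
vec3 L = sample_env_alias_table(env_r1, env_r2, env_r3, env_r4);
// ... trace a shadow ray toward L (shadow_payload.visible reset to 0 first)
if (shadow_payload.visible == 1u) {
    // pdf_light: environment pdf of L; brdf_pdf: combined multi-lobe BRDF pdf of L
    float mis_w = mis_power_heuristic(pdf_light, brdf_pdf);
    nee_radiance += env_color * brdf_val * NdotL * mis_w / pdf_light;
}
```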

Sources: closesthit.rchit
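
`mis_power_heuristic` is presumably Veach's power heuristic with exponent 2. The C++ sketch below is written under that assumption. It also shows that the two strategies' weights for the same direction sum to one.

```cpp
#include <cassert>
#include <cmath>

// Power heuristic (beta = 2): weight for strategy A, given both strategies'
// pdfs evaluated for the same direction.
float mis_power_heuristic(float pdf_a, float pdf_b) {
    float a2 = pdf_a * pdf_a;
    float b2 = pdf_b * pdf_b;
    float denom = a2 + b2;
    return denom > 0.0f ? a2 / denom : 0.0f;
}
```

In NEE the light-sampling weight is mis_power_heuristic(pdf_light, brdf_pdf). When a BRDF-sampled ray later reaches the environment, the complementary weight mis_power_heuristic(brdf_pdf, pdf_light) would apply. Each direction's contribution is therefore split between the two strategies with weights summing to one.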

Multi-Lobe BRDF Sampling

The BRDF is split into diffuse (Lambertian) and specular (GGX) lobes. Lobe selection uses Fresnel-weighted probability based on the luminance of F_Schlick at the current view angle:

```glsl
float p_spec = specular_probability(NdotV, F0);  // clamped to [0.01, 0.99]

if (rand_lobe < p_spec) {
    // Specular: GGX VNDF importance sampling (Heitz 2018)
    vec3 H_ts = sample_ggx_vndf(Ve, roughness, vec2(rand_xi0, rand_xi1));
    vec3 L_ts = reflect(-Ve, H_ts);
    throughput_update = (D * Vis * F * NdotL) / (pdf * p_spec);
} else {
    // Diffuse: cosine-weighted hemisphere sampling
    vec3 L_ts = sample_cosine_hemisphere(vec2(rand_xi0, rand_xi1));
    throughput_update = diffuse_color / (1.0 - p_spec);
}
```

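Either lobe could have generated a given direction, so MIS must use the mixture PDF p_spec * pdf_spec + (1 - p_spec) * pdf_diffuse. The shader computes this combined PDF for the BRDF-sampled direction. If that ray then misses all geometry and reaches the environment, the environment contribution can be MIS-weighted; the payload's env_mis_weight field carries this weight.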

Sources: closesthit.rchit, pt_common.glsl
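
The C++ sketch below covers lobe selection and the combined PDF. It assumes Schlick's approximation F = F0 + (1 - F0)(1 - cos θ)^5 and Rec. 709 luminance weights; the exact weights inside `specular_probability` are not shown in the source. The diffuse PDF is the standard cosine-weighted NdotL / π. All function names other than `specular_probability` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Rgb {
    float r, g, b;
};

constexpr float kPi = 3.14159265358979f;

// Rec. 709 luminance (assumed weights).
float luminance(const Rgb& c) {
    return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
}

// Schlick's Fresnel approximation, per channel.
Rgb fresnel_schlick(float n_dot_v, const Rgb& f0) {
    float m = std::pow(1.0f - std::clamp(n_dot_v, 0.0f, 1.0f), 5.0f);
    return {f0.r + (1.0f - f0.r) * m, f0.g + (1.0f - f0.g) * m, f0.b + (1.0f - f0.b) * m};
}

// Probability of choosing the specular lobe, clamped as described in the text.
float specular_probability(float n_dot_v, const Rgb& f0) {
    return std::clamp(luminance(fresnel_schlick(n_dot_v, f0)), 0.01f, 0.99f);
}

// Cosine-weighted hemisphere pdf for the diffuse lobe.
float diffuse_pdf(float n_dot_l) {
    return n_dot_l > 0.0f ? n_dot_l / kPi : 0.0f;
}

// Mixture pdf: either lobe could have produced the direction L.
float combined_pdf(float p_spec, float spec_pdf, float n_dot_l) {
    return p_spec * spec_pdf + (1.0f - p_spec) * diffuse_pdf(n_dot_l);
}
```

The clamp keeps both lobes reachable. Without it, a dielectric viewed head-on (F ≈ 0.04) would almost never sample specular, and a grazing view would almost never sample diffuse.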

OIDN Auxiliary Output

On bounce 0, the shader writes albedo and normal data to auxiliary images for Intel Open Image Denoise (OIDN):

```glsl
if (payload.bounce == 0u) {
    ivec2 pixel = ivec2(gl_LaunchIDEXT.xy);
    imageStore(aux_albedo_image, pixel, vec4(diffuse_color, 1.0));
    imageStore(aux_normal_image, pixel, vec4(N_shading, 1.0));
}
```

Sources: closesthit.rchit

Any-Hit Shader (`anyhit.rahit`)

The any-hit shader handles alpha testing for non-opaque geometry. It supports two alpha modes:

| Mode | Value | Behavior |
|---|---|---|
| Opaque | 0 | Never reaches any-hit (hardware skip via VK_GEOMETRY_OPAQUE_BIT_KHR) |
| Mask | 1 | Hard cutoff: ignore the intersection (ignoreIntersectionEXT) if alpha < alpha_cutoff |
| Blend | 2 | Stochastic alpha using a PCG hash random number |

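For blended materials, stochastic transparency accepts each candidate hit with probability equal to its alpha. Visibility is therefore correct in expectation, and no sorting of hits is required: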

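```glsl
// Blend: stochastic alpha (PCG hash)
uint seed = gl_LaunchIDEXT.x
          ^ (gl_LaunchIDEXT.y * 1103515245u)
          ^ (pc.frame_seed * 747796405u)
          ^ uint(gl_PrimitiveID)
          ^ (uint(gl_GeometryIndexEXT) * 2654435761u);
// Keep the top 24 bits so the value is exact in a float and stays in [0, 1);
// float(hash) / 4294967296.0 rounds to 1.0 for hashes close to 2^32
float rand_val = float(pcg_hash(seed) >> 8u) / 16777216.0;

if (rand_val >= texel_alpha) {
    ignoreIntersectionEXT;
}
```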

Sources: anyhit.rahit
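
`pcg_hash` is not listed in full. The widely used PCG-based hash from Jarzynski and Olano, "Hash Functions for GPU Rendering" (JCGT 2020), shares the 747796405u multiplier seen in the seed and is shown below in C++ as a likely candidate, not a confirmed copy. The sketch converts the hash to a float in [0, 1) using its top 24 bits. It also shows that accepting a hit when that float is below alpha keeps roughly an alpha fraction of hits. `hash_to_unit_float` and `accept_hit` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// PCG-style hash: one LCG step followed by an RXS-M-XS output permutation.
uint32_t pcg_hash(uint32_t input) {
    uint32_t state = input * 747796405u + 2891336453u;
    uint32_t word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

// Uniform float in [0, 1): 24 bits fit a float mantissa exactly.
float hash_to_unit_float(uint32_t h) {
    return static_cast<float>(h >> 8) / 16777216.0f;
}

// Stochastic alpha test: keep the hit (true) with probability alpha.
// Returning false corresponds to ignoreIntersectionEXT in the any-hit shader.
bool accept_hit(uint32_t seed, float alpha) {
    return hash_to_unit_float(pcg_hash(seed)) < alpha;
}
```

Because the random value is strictly below 1.0, a fully opaque texel (alpha = 1) is always kept, and a fully transparent one (alpha = 0) is always ignored.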

Miss Shaders

Environment Miss (`miss.rmiss`)

When a ray misses all geometry, the environment miss shader samples the IBL cubemap with Y-axis rotation applied:

```glsl
vec3 dir = rotate_y(gl_WorldRayDirectionEXT,
                    global.ibl_rotation_sin,
                    global.ibl_rotation_cos);

vec3 env_color = texture(cubemaps[nonuniformEXT(global.skybox_cubemap_index)], dir).rgb
                 * global.ibl_intensity;

payload.color = env_color;
payload.hit_distance = -1.0;  // Signal path termination
```

Sources: miss.rmiss
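
`rotate_y` receives a precomputed sine and cosine, so the miss shader needs no per-ray trigonometry. The C++ sketch below assumes the standard right-handed rotation about +Y. The actual helper's sign convention is not shown and could be the transpose, which would rotate the environment the other way.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// Rotation about the +Y axis by angle theta, given s = sin(theta), c = cos(theta).
// Y is unchanged, so the environment's horizon stays level.
Vec3 rotate_y(const Vec3& d, float s, float c) {
    return {c * d.x + s * d.z, d.y, -s * d.x + c * d.z};
}
```

Per ray the rotation costs four multiplies and two adds. That is why storing sin and cos in the global buffer is cheaper than storing an angle.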

Shadow Miss (`shadow_miss.rmiss`)

The shadow miss shader marks light visibility when a shadow ray reaches tMax without hitting geometry:

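```glsl
#version 460
#extension GL_EXT_ray_tracing : require
// ShadowPayload { uint visible; } is defined in pt_common.glsl

layout(location = 1) rayPayloadInEXT ShadowPayload shadow_payload;

void main() {
    // Reached only when no hit was accepted, so the light is unoccluded
    shadow_payload.visible = 1u;
}
```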

Sources: shadow_miss.rmiss

Shared Utilities (`pt_common.glsl`)

Ray Payloads

Two payload structures are defined:

```glsl
struct PrimaryPayload {
    vec3  color;              // Radiance contribution from this bounce
    vec3  next_origin;        // Next ray origin (offset from surface)
    vec3  next_direction;     // Next ray direction (BRDF sampled)
    vec3  throughput_update;  // Path throughput multiplier
    float hit_distance;       // Hit distance (-1 = miss)
    uint  bounce;             // Current bounce index
    float env_mis_weight;     // MIS weight for env map on miss
};

struct ShadowPayload {
    uint visible;             // 0 = occluded, 1 = visible
};
```

Sources: pt_common.glsl

Random Number Generation

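The path tracer draws low-discrepancy samples from Sobol sequences. It decorrelates pixels with a Cranley-Patterson rotation, which shifts every sample of a pixel by the same offset modulo 1. The offset comes from a blue-noise texture (per pixel and per dimension), and a golden-ratio term that changes each frame provides temporal decorrelation: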

```glsl
float rand_pt(uint dim, uint sample_index, ivec2 pixel,
              uint frame_seed, uint blue_noise_index) {
    float s = sobol_sample(dim, sample_index);

    // Per-pixel blue noise offset
    ivec2 noise_coord = (pixel + ivec2(dim * 73u, dim * 127u)) & 127;
    float offset = texelFetch(textures[blue_noise_index], noise_coord, 0).r;

    // Golden-ratio temporal scramble
    offset = fract(offset + float(frame_seed) * 0.6180339887);

    return fract(s + offset);
}
```

Sources: pt_common.glsl
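
In the standard Sobol construction, dimension 0 is the base-2 van der Corput sequence (the bit-reversed sample index), so it is easy to reproduce on the CPU. The C++ sketch below applies the same Cranley-Patterson rotation and golden-ratio term. Two simplifications apply: the blue-noise texel is replaced by a plain `pixel_offset` argument (a hypothetical parameter), and higher Sobol dimensions, which need direction numbers, are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sobol dimension 0 == base-2 radical inverse (van der Corput).
float sobol_dim0(uint32_t index) {
    uint32_t bits = index;
    // Reverse the 32 bits.
    bits = (bits << 16) | (bits >> 16);
    bits = ((bits & 0x00FF00FFu) << 8) | ((bits & 0xFF00FF00u) >> 8);
    bits = ((bits & 0x0F0F0F0Fu) << 4) | ((bits & 0xF0F0F0F0u) >> 4);
    bits = ((bits & 0x33333333u) << 2) | ((bits & 0xCCCCCCCCu) >> 2);
    bits = ((bits & 0x55555555u) << 1) | ((bits & 0xAAAAAAAAu) >> 1);
    return static_cast<float>(bits >> 8) / 16777216.0f;  // 24 bits: exact in a float
}

// GLSL-style fract.
float fract(float x) { return x - std::floor(x); }

// Cranley-Patterson rotation: shift every sample by the same offset modulo 1.
float rand_pt_dim0(uint32_t sample_index, float pixel_offset, uint32_t frame_seed) {
    float offset = fract(pixel_offset + static_cast<float>(frame_seed) * 0.6180339887f);
    return fract(sobol_dim0(sample_index) + offset);
}
```

One shared shift per pixel preserves the stratification of the Sobol points. For example, the first four samples still land in four different quarters of [0, 1), while neighboring pixels receive decorrelated shifts.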

Environment Map Importance Sampling

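The alias table provides O(1) sampling of environment-map pixels with probability proportional to luminance × sin(theta). The sin(theta) factor accounts for the shrinking solid angle of pixels toward the poles. env_pdf reads the same stored luminance values, which keeps the PDF exactly consistent with the sampling distribution. Assume a w × h latitude-longitude map, as the 2π · π factor suggests. Each pixel then covers a solid angle of (2π/w)(π/h) sin(theta). Dividing its discrete probability, lum · sin(theta) / total_luminance, by that solid angle cancels sin(theta) and yields the expression in env_pdf. This holds provided total_luminance is the sin(theta)-weighted luminance sum: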

```glsl
vec3 sample_env_alias_table(float rand1, float rand2, float rand3, float rand4) {
    uint N = entry_count;
    // Pick a column uniformly, then keep it or jump to its alias
    uint idx = min(uint(rand1 * float(N)), N - 1u);
    EnvAliasEntry e = env_alias_entries[idx];
    uint pixel = (rand2 < e.prob) ? idx : e.alias_index;
    // ... convert pixel to a direction (rand3, rand4 presumably place it within the pixel)
}

float env_pdf(vec3 world_dir) {
    // ... map world_dir back to its pixel index (inverse of the sampling mapping)
    // Look up stored luminance from alias table
    float lum = env_alias_entries[pixel].luminance;
    return lum * float(w) * float(h) / (total_luminance * TWO_PI * PI);
}
```

Sources: pt_common.glsl
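
The alias table itself is built on the CPU. A compact C++ sketch of Vose's construction follows. `AliasEntry` mirrors the `prob`/`alias_index` fields read by `sample_env_alias_table` but is otherwise hypothetical, and in this renderer the weights would be luminance × sin(theta) per pixel. The `exact_probability` helper recomputes, from the table alone, the probability that the two-step lookup returns each index; it should equal weight / total.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AliasEntry {
    float prob;            // probability of keeping this column's own index
    uint32_t alias_index;  // index returned otherwise
};

// Vose's alias method: O(N) construction, O(1) sampling.
// Assumes at least one positive weight.
std::vector<AliasEntry> build_alias_table(const std::vector<float>& weights) {
    const std::size_t n = weights.size();
    double total = 0.0;
    for (float w : weights) total += w;

    std::vector<double> scaled(n);
    std::vector<uint32_t> small, large;
    for (std::size_t i = 0; i < n; ++i) {
        scaled[i] = weights[i] * static_cast<double>(n) / total;  // mean becomes 1
        (scaled[i] < 1.0 ? small : large).push_back(static_cast<uint32_t>(i));
    }

    std::vector<AliasEntry> table(n);
    while (!small.empty() && !large.empty()) {
        uint32_t s = small.back(); small.pop_back();
        uint32_t l = large.back(); large.pop_back();
        table[s] = AliasEntry{static_cast<float>(scaled[s]), l};
        scaled[l] -= 1.0 - scaled[s];  // l fills the rest of column s
        (scaled[l] < 1.0 ? small : large).push_back(l);
    }
    // Leftovers are ~1 up to rounding: they always keep their own index.
    for (uint32_t i : small) table[i] = AliasEntry{1.0f, i};
    for (uint32_t i : large) table[i] = AliasEntry{1.0f, i};
    return table;
}

// Same two-step lookup as sample_env_alias_table, for rand1, rand2 in [0, 1).
uint32_t sample_alias(const std::vector<AliasEntry>& t, float rand1, float rand2) {
    uint32_t n = static_cast<uint32_t>(t.size());
    uint32_t idx = std::min<uint32_t>(static_cast<uint32_t>(rand1 * static_cast<float>(n)), n - 1u);
    return rand2 < t[idx].prob ? idx : t[idx].alias_index;
}

// Probability that sample_alias returns each index, derived from the table.
std::vector<double> exact_probability(const std::vector<AliasEntry>& t) {
    const double n = static_cast<double>(t.size());
    std::vector<double> p(t.size(), 0.0);
    for (std::size_t i = 0; i < t.size(); ++i) {
        p[i] += t[i].prob / n;
        p[t[i].alias_index] += (1.0 - t[i].prob) / n;
    }
    return p;
}
```

Each column stores at most two outcomes, so sampling costs one table read and one comparison regardless of the map resolution.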

Ray Origin Offset

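The Wächter & Binder method from Ray Tracing Gems, Chapter 6 ("A Fast and Robust Method for Avoiding Self-Intersection"), avoids self-intersection without a scene-dependent epsilon. It adds a normal-scaled integer to the bit pattern of each coordinate. This moves the point a fixed number of ULPs along the geometric normal, so the offset automatically grows with the coordinate's magnitude: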

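```glsl
vec3 offset_ray_origin(vec3 p, vec3 n_geo) {
    // Normal-scaled offset, in ULP units
    ivec3 of_i = ivec3(RT_ORIGIN_INT_SCALE * n_geo);
    // Adding to the bit pattern grows the magnitude, so negate for negative
    // coordinates to keep the move along n_geo
    vec3 p_i = vec3(
        intBitsToFloat(floatBitsToInt(p.x) + ((p.x < 0.0) ? -of_i.x : of_i.x)),
        intBitsToFloat(floatBitsToInt(p.y) + ((p.y < 0.0) ? -of_i.y : of_i.y)),
        intBitsToFloat(floatBitsToInt(p.z) + ((p.z < 0.0) ? -of_i.z : of_i.z))
    );
    // The chapter's reference version also switches to a small float offset
    // for components with |p| < 1/32; that branch is not shown here
    return p_i;
}
```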

Sources: pt_common.glsl
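
For reference, here is a C++ sketch of the complete method as published in Ray Tracing Gems. It includes the near-origin fallback, which the excerpt above does not show. The constants are the chapter's: origin threshold 1/32, float scale 1/65536, int scale 256. Whether `RT_ORIGIN_INT_SCALE` equals 256 in this code base is an assumption, and the `k*` names and `offset_component` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

struct Vec3 {
    float x, y, z;
};

constexpr float kOrigin = 1.0f / 32.0f;         // below this, use a float offset
constexpr float kFloatScale = 1.0f / 65536.0f;  // float offset scale near the origin
constexpr float kIntScale = 256.0f;             // ULP offset scale elsewhere

// Offsets one coordinate along normal component n. Adding to the bit pattern
// grows the magnitude, so the integer is negated for negative coordinates.
float offset_component(float p, float n) {
    int32_t of_i = static_cast<int32_t>(kIntScale * n);
    int32_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    bits += (p < 0.0f) ? -of_i : of_i;
    float p_i;
    std::memcpy(&p_i, &bits, sizeof p_i);
    // Near zero a few ULPs are far too small, so fall back to a float offset.
    return std::fabs(p) < kOrigin ? p + kFloatScale * n : p_i;
}

Vec3 offset_ray_origin(const Vec3& p, const Vec3& n_geo) {
    return {offset_component(p.x, n_geo.x),
            offset_component(p.y, n_geo.y),
            offset_component(p.z, n_geo.z)};
}
```

`std::memcpy` is the portable C++ equivalent of GLSL's `floatBitsToInt` and `intBitsToFloat`.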

C++ Integration

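The ReferenceViewPass class manages RT pipeline creation and per-frame dispatch. It compiles all five shader stages and creates the pipeline from them. It then builds the SBT with each region aligned to the device's shader group handle and base alignment requirements: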

```cpp
const rhi::RTPipelineDesc desc{
    .raygen = rgen_module,
    .miss = miss_module,
    .shadow_miss = shadow_miss_module,
    .closesthit = chit_module,
    .anyhit = ahit_module,
    // Closest-hit traces shadow rays for NEE, one level below raygen, so
    // Vulkan's maxPipelineRayRecursionDepth must be at least 2
    .max_recursion_depth = 2,
    .descriptor_set_layouts = set_layouts,
    .push_constant_ranges = {&push_range, 1},
};

rt_pipeline_ = rhi::create_rt_pipeline(*ctx_, desc);
```

Per-frame recording uses push descriptors for the accumulation images and Sobol buffer, then dispatches trace_rays with the image dimensions.

Sources: reference_view_pass.cpp
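
The SBT alignment rules come from the Vulkan specification:
- Each record stride is the handle size rounded up to `shaderGroupHandleAlignment`.
- Every region must start at a multiple of `shaderGroupBaseAlignment`.
- The raygen region's size must equal its stride.

The C++ sketch below lays out the four groups from the SBT table (one raygen, two miss, one hit) from those properties. The struct and function names are hypothetical; the project's own layout code lives in `rt_pipeline.cpp` and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to a multiple of alignment (alignment must be a power of two).
constexpr uint64_t align_up(uint64_t value, uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

struct SbtRegion {
    uint64_t offset;  // byte offset from the start of the SBT buffer
    uint64_t stride;
    uint64_t size;
};

struct SbtLayout {
    SbtRegion raygen, miss, hit;
    uint64_t total_size;
};

// handle_size, handle_alignment and base_alignment correspond to
// VkPhysicalDeviceRayTracingPipelinePropertiesKHR::shaderGroupHandleSize,
// shaderGroupHandleAlignment and shaderGroupBaseAlignment.
SbtLayout compute_sbt_layout(uint32_t handle_size, uint32_t handle_alignment,
                             uint32_t base_alignment, uint32_t miss_count, uint32_t hit_count) {
    const uint64_t handle_stride = align_up(handle_size, handle_alignment);
    SbtLayout l{};
    // Raygen: a single record whose size must equal its stride.
    l.raygen.offset = 0;
    l.raygen.stride = align_up(handle_stride, base_alignment);
    l.raygen.size = l.raygen.stride;
    // Miss records (environment miss, shadow miss).
    l.miss.offset = align_up(l.raygen.offset + l.raygen.size, base_alignment);
    l.miss.stride = handle_stride;
    l.miss.size = align_up(miss_count * handle_stride, base_alignment);
    // Hit group records (closest-hit + any-hit).
    l.hit.offset = align_up(l.miss.offset + l.miss.size, base_alignment);
    l.hit.stride = handle_stride;
    l.hit.size = align_up(hit_count * handle_stride, base_alignment);
    l.total_size = l.hit.offset + l.hit.size;
    return l;
}
```

Each region's device address is the SBT buffer address plus its offset. The buffer itself must therefore be allocated with at least `shaderGroupBaseAlignment` alignment.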