BharathSShankar/RayTracer

CPU Ray Tracer

A from-scratch CPU-based ray tracer implementation in C++ with clean architecture and extensible design, optimized for Apple Silicon (M1/M2/M3) Macs.


📊 Project Metrics

Codebase

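| Metric | Value |
| --- | --- |
| Total Source Lines | 5,002 lines (4,521 in the listed src/ files + 481 in tests/) |
| Source Files | 17 headers + 1 main.cpp |
| Test Lines | 481 lines |
| Unit Tests | 39 tests (Google Test) |
| Total Source Size | ~149 KB |
| C++ Standard | C++20 |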

Renderer Configuration

| Metric | Value |
| --- | --- |
| Render Resolution | 960 × 800 px |
| Tile Size | 32 × 32 px → 750 tiles / frame |
| Default Samples/Pixel | 100 (configurable 1–1000) |
| Max Ray Depth | 50 (configurable 1–200) |
| Ray Packet Width | 4 rays (128-bit NEON register) |
| BVH Complexity | O(log n) traversal |
| Demo Scenes | 12 (4 → 153 objects) |
| Max Thread Count | 16 P-cores (Apple Silicon) |
| Rays Per Frame (default) | 76.8 M (960 × 800 × 100 spp) |
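
The tile count and ray budget in the table follow from integer arithmetic on the resolution, tile size, and sample count. A minimal sketch of that arithmetic, using hypothetical helper names that are not taken from the repository; the ray figure counts camera rays only (pixels × spp), not bounces:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; constants come from the table above.
constexpr int kWidth = 960, kHeight = 800, kTile = 32;

// Round up so partial tiles on the right/bottom edge are still counted.
constexpr int tiles_along(int pixels, int tile) {
    return (pixels + tile - 1) / tile;
}

constexpr int tile_count(int w, int h, int tile) {
    return tiles_along(w, tile) * tiles_along(h, tile);
}

// Camera (primary) rays per frame: one per pixel per sample.
constexpr std::int64_t primary_rays(int w, int h, int spp) {
    return static_cast<std::int64_t>(w) * h * spp;
}

static_assert(tile_count(kWidth, kHeight, kTile) == 750);
static_assert(primary_rays(kWidth, kHeight, 100) == 76'800'000);
```

Rounding up matters only when the resolution is not a multiple of the tile size; at 960 × 800 both dimensions divide evenly by 32.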

⚑ Measured Performance β€” Apple M-series (ARM64, NEON SIMD)

Benchmarked on Apple Silicon (4 P-cores, 10 total threads) at 960Γ—800 with BVH enabled.

Thread Scaling β€” Scene 9 (79 random spheres), 50 spp:

| Threads | Render Time | Throughput | Speedup | Efficiency |
| --- | --- | --- | --- | --- |
| 1 thread | 7.59 s | 5.06 M rays/s | 1.0× | 100% |
| 2 threads | 4.29 s | 8.95 M rays/s | 1.8× | 89% |
| 4 P-cores | 3.24 s | 11.84 M rays/s | 2.3× | 58% |
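
The throughput, speedup, and efficiency columns can be reproduced from the render times alone. A small sketch, assuming throughput counts primary rays (960 × 800 × 50 = 38.4 M per frame); `ScalingPoint` and `scaling` are hypothetical names used only for illustration:

```cpp
// Hypothetical helper types; not part of the repository.
struct ScalingPoint {
    double throughput_mrays;  // million primary rays per second
    double speedup;           // single-thread time / n-thread time
    double efficiency;        // speedup / threads, 1.0 = ideal scaling
};

inline ScalingPoint scaling(double rays, double t1_seconds,
                            double tn_seconds, int threads) {
    const double speedup = t1_seconds / tn_seconds;
    return {rays / tn_seconds / 1e6, speedup, speedup / threads};
}
```

For the 2-thread row, `scaling(38.4e6, 7.59, 4.29, 2)` gives roughly 8.95 M rays/s, a 1.77× speedup, and 88% efficiency, which matches the table to within rounding.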

Scene Complexity: 4 P-cores, 50 spp:

| Scene | Objects | BVH Nodes | Time | Throughput | vs Simple |
| --- | --- | --- | --- | --- | --- |
| Basic Spheres | 3 + 1 plane | 3 nodes | 2.90 s | 13.22 M/s | baseline |
| Classic 3-Sphere | 3 + 1 plane | 3 nodes | 3.07 s | 12.50 M/s | 1.06× |
| Mixed Primitives | 3 + 2 planes | 3 nodes | 6.57 s | 5.84 M/s | 2.26× ⚠️ |
| Random Spheres | 79 + 1 plane | 93 nodes | 3.18 s | 12.09 M/s | 1.09× |
| Sphere Grid | 121 + 1 plane | 127 nodes | 2.99 s | 12.84 M/s | 1.03× |
| Mixed Large | 153 + 1 plane | 177 nodes | 3.29 s | 11.69 M/s | 1.13× |

⚠️ Mixed Primitives is slower because it has 2 unbounded planes: infinite planes bypass the BVH and are tested against every ray individually. The BVH itself scales as O(log n): going from 3 to 153 objects adds only 13% to render time.

Sample Scaling: Classic 3-Sphere, 4 P-cores:

| Samples/px | Render Time | Throughput |
| --- | --- | --- |
| 10 spp | 0.61 s | 12.53 M rays/s |
| 50 spp | 3.11 s | 12.35 M rays/s |
| 100 spp | 6.18 s | 12.43 M rays/s |
| 250 spp | 15.45 s | 12.43 M rays/s |

Throughput holds at about 12.4 M rays/s across sample counts (12.35–12.53 M rays/s), so render time scales linearly with spp and shows no measurable overhead per additional sample.


✨ Features

All Phases Complete:

  • ✅ Multiple geometric primitives (Sphere, Plane, Triangle, Box)
  • ✅ Configurable look-at camera with FOV control and depth of field
  • ✅ Polymorphic architecture for easy extensibility
  • ✅ Material system (Lambertian, Metal, Dielectric)
  • ✅ Recursive ray tracing with realistic lighting
  • ✅ Anti-aliasing via multi-sampling with gamma correction
  • ✅ Multi-threaded tile-based rendering with work-stealing thread pool
  • ✅ BVH acceleration structure for O(log n) ray-scene intersection
  • ✅ SIMD-optimized ray packet tracing using ARM NEON via GLM
  • ✅ Apple Silicon optimizations (P-core detection, 128-byte cache lines, QoS)
  • ✅ Interactive UI with 12 demo scenes

🚀 Quick Start

Build and Run

mkdir build && cd build
cmake ..
make
./MyExecutable

Controls

  • R: Render current scene
  • 1–12: Switch between demo scenes
  • ESC: Exit application

Demo Scenes

| # | Scene | Objects | Description |
| --- | --- | --- | --- |
| 1 | Basic Spheres | 5 | Simple diffuse materials |
| 2 | Metal Spheres | 4 | Reflective surfaces |
| 3 | Classic 3-Sphere | 4 | Metal + diffuse mix |
| 4 | Pyramid | 5 | Triangle primitives |
| 5 | Boxes | 4 | AABB primitives |
| 6 | Mixed Primitives | 5 | All geometry types |
| 7 | Glass Spheres | 5 | Dielectric materials |
| 8 | All Materials | 4 | Diffuse, metal, glass |
| 9 | Random Spheres | 100+ | BVH stress test |
| 10 | Sphere Grid | 121 | Uniform distribution |
| 11 | Cornell Box | 8 | Classic lighting test |
| 12 | Mixed Large | 150+ | All primitives at scale |

📁 Project Structure

RayTracer/
├── src/
│   ├── main.cpp           # Entry point, render loop, UI          (534 lines)
│   ├── SIMD.h             # SIMD config + ray packets              (871 lines)
│   ├── SceneFactory.h     # Pre-built demo scenes                  (597 lines)
│   ├── UI.h               # Immediate-mode UI                      (504 lines)
│   ├── PacketBVH.h        # SIMD-optimized packet BVH              (444 lines)
│   ├── ThreadPool.h       # Thread pool + tile renderer            (343 lines)
│   ├── BVH.h              # Standard BVH acceleration              (263 lines)
│   ├── Material.h         # Lambertian, Metal, Dielectric          (142 lines)
│   ├── AABB.h             # Axis-Aligned Bounding Box              (145 lines)
│   ├── Camera.h           # Look-at camera with DOF                 (76 lines)
│   ├── Scene.h            # Scene container                         (95 lines)
│   ├── Hittable.h         # Abstract base + HitRecord              (66 lines)
│   └── Primitives:
│       ├── Sphere.h       # Quadratic intersection                  (91 lines)
│       ├── Plane.h        # Dot product intersection               (133 lines)
│       ├── Triangle.h     # Möller–Trumbore algorithm               (87 lines)
│       ├── Box.h          # Slab method intersection               (105 lines)
│       └── Ray.h          # Ray struct                              (25 lines)
│
├── docs/
│   ├── comprehensive-guide.md  # ⭐ Full technical docs + M1 optimizations
│   ├── README.md               # Documentation index
│   ├── next-steps.md           # Future features roadmap
│   ├── primitives-reference.md # Math & algorithms
│   └── camera-and-architecture.md
│
└── tests/
    └── test_raytracer.cpp      # 39 Google Test cases              (481 lines)

🎯 Current Status

✅ All Phases Complete!

Phase 1 & 2: Geometry & Architecture

  • Multiple primitive types (Sphere, Plane, Triangle, Box)
  • Polymorphic architecture with smart pointers
  • Camera system with configurable FOV and look-at targeting

Phase 3: Materials & Lighting

  • Lambertian (diffuse), Metal (reflective), Dielectric (glass)
  • Recursive ray tracing with Schlick's Fresnel approximation

Phase 4: Image Quality

  • Anti-aliasing via multi-sampling (jittered)
  • Gamma correction (γ = 2.0, i.e. √ per channel)
  • Depth of field (aperture lens blur)

Phase 5: Optimization ✨

  • Multi-threaded tile-based rendering with work-stealing
  • BVH acceleration structure: O(log n) intersection, longest-axis split
  • SIMD-optimized ray packet tracing (4-wide, SoA layout)
  • Apple Silicon optimizations (P-core detection, cache alignment, QoS)
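
To illustrate the longest-axis split mentioned for the BVH, here is a hedged sketch of one common way to do it. The README does not say whether the split point is the median or the spatial midpoint; this sketch assumes a median split on centroids, and `BoundsSketch`, `longest_axis`, and `median_split` are hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned bounds; not the repository's AABB type.
struct BoundsSketch {
    float lo[3];
    float hi[3];
};

// Index (0 = x, 1 = y, 2 = z) of the axis with the largest extent.
inline int longest_axis(const BoundsSketch& b) {
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (b.hi[a] - b.lo[a] > b.hi[axis] - b.lo[axis]) axis = a;
    return axis;
}

// Assumption: median split. Partially sorts primitives by centroid along the
// node's longest axis and returns the index where the right child begins.
inline std::size_t median_split(std::vector<BoundsSketch>& prims,
                                const BoundsSketch& node) {
    const int axis = longest_axis(node);
    const std::size_t mid = prims.size() / 2;
    std::nth_element(
        prims.begin(), prims.begin() + static_cast<std::ptrdiff_t>(mid),
        prims.end(),
        [axis](const BoundsSketch& a, const BoundsSketch& b) {
            // Comparing lo + hi is the same as comparing centroids.
            return a.lo[axis] + a.hi[axis] < b.lo[axis] + b.hi[axis];
        });
    return mid;
}
```

A median split keeps the tree balanced, which is what gives the O(log n) depth; the SAH split listed under future improvements would trade some build time for better traversal cost.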

🔬 What Works: Technical Deep Dive

Primitives

| Primitive | Algorithm | Bounding Box |
| --- | --- | --- |
| Sphere | Quadratic equation (discriminant test) | center ± radius |
| Plane | Ray-plane dot product: t = (d - dot(origin, n)) / dot(dir, n) | Unbounded (skips BVH) |
| Triangle | Möller–Trumbore (barycentric coords, zero alloc) | min/max of 3 vertices + pad |
| Box | Slab method: 3 axis-aligned pairs | Min/max corner pair |
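
The sphere and plane rows can be made concrete with a short sketch. It uses a minimal stand-in vector type rather than GLM, and the function names are hypothetical; the plane is assumed to be stored as dot(n, p) = d, which is what the formula in the table implies:

```cpp
#include <cmath>
#include <optional>

// Minimal stand-in vector type for illustration (the project uses GLM).
struct Vec3 { double x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Nearest t in (t_min, t_max) where origin + t*dir meets the sphere, if any.
inline std::optional<double> hit_sphere(Vec3 center, double radius,
                                        Vec3 origin, Vec3 dir,
                                        double t_min, double t_max) {
    const Vec3 oc = origin - center;
    const double a = dot(dir, dir);
    const double half_b = dot(oc, dir);
    const double c = dot(oc, oc) - radius * radius;
    const double disc = half_b * half_b - a * c;   // discriminant / 4
    if (disc < 0.0) return std::nullopt;           // ray misses the sphere
    const double root = std::sqrt(disc);
    double t = (-half_b - root) / a;               // try the near root first
    if (t <= t_min || t >= t_max) {
        t = (-half_b + root) / a;                  // then the far root
        if (t <= t_min || t >= t_max) return std::nullopt;
    }
    return t;
}

// Plane dot(n, p) = d: t = (d - dot(origin, n)) / dot(dir, n).
inline std::optional<double> hit_plane(Vec3 n, double d, Vec3 origin, Vec3 dir,
                                       double t_min, double t_max) {
    const double denom = dot(dir, n);
    if (std::fabs(denom) < 1e-9) return std::nullopt;  // ray parallel to plane
    const double t = (d - dot(origin, n)) / denom;
    if (t <= t_min || t >= t_max) return std::nullopt;
    return t;
}
```

The small positive `t_min` that callers typically pass keeps a bounced ray from re-hitting the surface it just left.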

Materials

| Material | Model | Key Parameters |
| --- | --- | --- |
| Lambertian | Cosine-weighted diffuse scatter | albedo (RGB) |
| Metal | Perfect mirror reflection + fuzz | albedo, fuzz ∈ [0, 1] |
| Dielectric | Snell's law refraction + Schlick Fresnel | ior (e.g. glass=1.5, water=1.33, diamond=2.4) |
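
For the Dielectric row, the two decisions a glass material makes per hit are whether refraction is possible at all (Snell's law) and how much light reflects (Schlick's approximation). A sketch of both, with hypothetical function names; `ref_idx` is assumed to be the ratio of refractive indices across the interface:

```cpp
#include <cmath>

// Schlick's approximation to Fresnel reflectance.
// cosine: cosine of the incidence angle; ref_idx: ratio of refractive indices.
inline double schlick_reflectance(double cosine, double ref_idx) {
    double r0 = (1.0 - ref_idx) / (1.0 + ref_idx);
    r0 *= r0;  // reflectance at normal incidence
    return r0 + (1.0 - r0) * std::pow(1.0 - cosine, 5.0);
}

// Snell's law has no solution when ref_idx * sin(theta) > 1:
// total internal reflection, so the ray must reflect.
inline bool cannot_refract(double cosine, double ref_idx) {
    const double sin_theta = std::sqrt(std::fmax(0.0, 1.0 - cosine * cosine));
    return ref_idx * sin_theta > 1.0;
}
```

For glass (ior 1.5) viewed head-on this gives a reflectance of 0.04, rising to 1.0 at grazing angles.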

Acceleration: BVH

Threading

| Component | Detail |
| --- | --- |
| ThreadPool | Fixed worker pool, condition-variable sleep (no spinlock) |
| TileRenderer | 32×32 px tiles, work-stealing-style tile claiming via a shared atomic<int> next_tile counter |
| PaddedAtomic<T> | Per-cache-line padding (128 B on Apple Silicon, 64 B elsewhere) |
| Thread-local RNG | Each tile worker has its own mt19937, so random sampling needs no mutex |
| P-core detection | sysctlbyname("hw.perflevel0.physicalcpu"), which skips E-cores |
| QoS | QOS_CLASS_USER_INTERACTIVE, the highest scheduling priority |
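
The PaddedAtomic<T> row describes giving each atomic its own cache line so that two counters never share one. A minimal sketch of the idea, under the assumption that the padding is done with `alignas`; the name `PaddedAtomicSketch` and the platform check are illustrative, not the repository's code:

```cpp
#include <atomic>
#include <cstddef>

// Assumed line sizes from the table: 128 B on Apple Silicon, 64 B elsewhere.
#if defined(__APPLE__) && defined(__aarch64__)
inline constexpr std::size_t kCacheLine = 128;
#else
inline constexpr std::size_t kCacheLine = 64;
#endif

// alignas rounds both the alignment and the size up to a full cache line,
// so adjacent instances cannot land on the same line (no false sharing).
template <typename T>
struct alignas(kCacheLine) PaddedAtomicSketch {
    std::atomic<T> value{};
};

static_assert(sizeof(PaddedAtomicSketch<int>) == kCacheLine);
static_assert(alignof(PaddedAtomicSketch<int>) == kCacheLine);
```

The cost is memory: every padded counter occupies a whole line, so this is worth doing only for the few atomics that several threads write frequently.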

SIMD: SIMD.h

| Feature | Detail |
| --- | --- |
| Backend | ARM NEON via GLM_FORCE_INTRINSICS + GLM_FORCE_DEFAULT_ALIGNED_GENTYPES |
| Packet width | 4 rays (PACKET_SIZE = 4), matching a 128-bit NEON register |
| Layout | Structure-of-Arrays (SoA): origins_x[4], origins_y[4], … |
| Alignment | alignas(16) on all SoA arrays (NEON-ready) |
| Precomputed | inv_directions + signs per ray, for a branchless AABB slab test |
| Diagnostics | print_simd_diagnostics() prints GLM arch + alignment on startup |
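
The SoA layout and branchless slab test can be sketched without intrinsics: with per-lane data stored contiguously, a plain loop over the four lanes is something the compiler can map to NEON. This is an illustrative sketch, not the code in SIMD.h. It indexes arrays as [axis][lane] instead of separate origins_x / origins_y arrays, it uses min/max rather than precomputed sign bits, and all type and function names are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>

constexpr int PACKET_SIZE = 4;  // one 128-bit register of 32-bit floats

// Hypothetical SoA packet: [axis][lane], each row 16-byte aligned.
struct RayPacketSketch {
    alignas(16) float origin[3][PACKET_SIZE];
    alignas(16) float inv_dir[3][PACKET_SIZE];  // 1 / direction, precomputed
};

struct AabbSketch {
    float lo[3];
    float hi[3];
};

// Returns a bitmask: bit i is set when lane i overlaps the box in [t_min, t_max].
inline std::uint32_t hit_aabb(const RayPacketSketch& p, const AabbSketch& b,
                              float t_min, float t_max) {
    std::uint32_t mask = 0;
    for (int i = 0; i < PACKET_SIZE; ++i) {
        float t0 = t_min, t1 = t_max;
        for (int axis = 0; axis < 3; ++axis) {
            const float ta = (b.lo[axis] - p.origin[axis][i]) * p.inv_dir[axis][i];
            const float tb = (b.hi[axis] - p.origin[axis][i]) * p.inv_dir[axis][i];
            t0 = std::max(t0, std::min(ta, tb));  // latest slab entry
            t1 = std::min(t1, std::max(ta, tb));  // earliest slab exit
        }
        mask |= static_cast<std::uint32_t>(t0 <= t1) << i;
    }
    return mask;
}
```

A packet traversal can skip a BVH node entirely when the returned mask is zero, which is where coherent camera rays benefit most.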

Camera: Camera.h

  • Look-at model: {look_from, look_at, vup, vfov, aspect_ratio}
  • Depth of field: disk sampling (lens_radius = aperture / 2) with focus_dist plane
  • Thread-safe: thread_local RNG inside get_ray()
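
The look-at parameters above determine an orthonormal camera basis and the viewport half-extents. A sketch of that construction with a stand-in vector type (the project uses GLM); `CameraBasis` and `make_basis` are hypothetical names, and the camera is assumed to look down its local -w axis, as in "Ray Tracing in One Weekend":

```cpp
#include <cmath>

// Minimal stand-in vector type for illustration.
struct V3 { double x, y, z; };
inline V3 operator-(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline V3 normalize(V3 a) {
    const double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return {a.x / len, a.y / len, a.z / len};
}

struct CameraBasis {
    V3 u, v, w;          // right, up, backward
    double half_height;  // viewport half-extents at unit distance
    double half_width;
};

inline CameraBasis make_basis(V3 look_from, V3 look_at, V3 vup,
                              double vfov_deg, double aspect_ratio) {
    const double pi = 3.14159265358979323846;
    const double half_height = std::tan(vfov_deg * pi / 180.0 / 2.0);
    const V3 w = normalize(look_from - look_at);  // points away from the target
    const V3 u = normalize(cross(vup, w));
    const V3 v = cross(w, u);
    return {u, v, w, half_height, aspect_ratio * half_height};
}
```

With depth of field enabled, the ray origin is additionally offset by a random point on a disk of radius `lens_radius` in the u-v plane, and the viewport is scaled by `focus_dist`.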

Rendering Pipeline

Per tile (32×32):
  For each pixel:
    For each sample (N spp):
      1. Jitter (u,v) within pixel
      2. camera.get_ray(u, v)           ← DoF offset applied
      3. trace_ray(ray, scene, depth)   ← recursive BVH traversal
         └─ material.scatter()          ← importance sampled BRDF
      4. Accumulate colour
    5. Average + gamma correct (√)
    6. Write to pixel buffer
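
Steps 5 and 6 of the pipeline (average, gamma-correct, write) reduce to a few lines per colour channel. A sketch, assuming an 8-bit output buffer and round-to-nearest quantization; both are assumptions for illustration, and `resolve_channel` is a hypothetical name:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Average the accumulated samples, apply gamma 2.0 (a square root),
// and quantize to 8 bits. Values are clamped to [0, 1] before encoding.
inline std::uint8_t resolve_channel(double accumulated, int spp) {
    const double mean = accumulated / spp;
    const double encoded = std::sqrt(std::clamp(mean, 0.0, 1.0));
    return static_cast<std::uint8_t>(std::lround(encoded * 255.0));
}
```

For example, a channel that accumulated 25.0 over 100 samples has a linear mean of 0.25, which encodes to 0.5 after the square root and lands on the middle of the 8-bit range.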

🛠 Technologies

| Technology | Version | Role |
| --- | --- | --- |
| C++ | C++20 | Core language |
| GLM | Header-only | Math (vec3, mat4) + NEON SIMD |
| SDL2 | 2.x | Window creation, pixel buffer |
| SDL2_ttf | 2.x | UI text rendering |
| CMake | 3.20+ | Build system |
| Google Test | Latest | 39-test unit suite |

🍎 Apple Silicon Optimizations

| Optimization | File | Benefit |
| --- | --- | --- |
| NEON SIMD via GLM | SIMD.h | 4-wide parallel ray processing |
| 128-byte cache line padding | ThreadPool.h | Eliminates false sharing on atomics |
| P-core detection (hw.perflevel0.physicalcpu) | ThreadPool.h | Uses performance cores only |
| QOS_CLASS_USER_INTERACTIVE | ThreadPool.h | Highest-priority thread scheduler slot |
| __builtin_arm_yield() | ThreadPool.h | Efficient spin-hint (no wasted cycles) |
| -march=native -arch arm64 | CMakeLists.txt | Native ARM64 code generation |
| -funroll-loops -ftree-vectorize | CMakeLists.txt | Compiler auto-vectorisation |
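
P-core detection with the sysctl key from the table can be sketched as below. The fallback to `std::thread::hardware_concurrency()` on other platforms, and the function name, are assumptions made so the example compiles everywhere; the repository may handle the non-Apple case differently:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper: number of performance cores, or a portable fallback.
inline int performance_core_count() {
#if defined(__APPLE__)
    int cores = 0;
    std::size_t size = sizeof(cores);
    // perflevel0 is the highest-performance core cluster on Apple Silicon.
    if (sysctlbyname("hw.perflevel0.physicalcpu", &cores, &size, nullptr, 0) == 0 &&
        cores > 0) {
        return cores;
    }
#endif
    // hardware_concurrency() may return 0 when unknown, so clamp to 1.
    return static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
}
```

Sizing the pool to P-cores only avoids having fast workers wait on tiles that were claimed by slower E-cores near the end of a frame.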

See docs/comprehensive-guide.md for the full breakdown.


📚 Documentation

Start here: docs/comprehensive-guide.md, the complete technical documentation, including:

  • Full architecture overview
  • M1/M2/M3 Mac specific optimizations
  • SIMD and NEON configuration
  • Threading and parallelism details
  • BVH acceleration explained
  • Lessons learned and best practices

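Other documentation in docs/:

  • README.md: documentation index
  • next-steps.md: future features roadmap
  • primitives-reference.md: math and algorithms
  • camera-and-architecture.md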


🏗 Architecture Highlights

Polymorphic Design:

class Hittable {               // Base interface
public:
    virtual ~Hittable() = default;  // safe deletion through base pointers
    virtual bool hit(...) = 0;
    virtual bool bounding_box(...) = 0;
};

class Sphere    : public Hittable { ... };  // Quadratic eq
class Triangle  : public Hittable { ... };  // Möller–Trumbore
class Box       : public Hittable { ... };  // Slab method
class BVHNode   : public Hittable { ... };  // Recursive tree node
class BVHScene  : public Hittable { ... };  // Bounded + unbounded split

Material System:

class Material {
public:
    virtual ~Material() = default;  // safe deletion through base pointers
    virtual bool scatter(ray_in, record, attenuation, scattered) = 0;
};

class Lambertian : public Material { ... };  // Diffuse (cosine scatter)
class Metal      : public Material { ... };  // Reflection + fuzz
class Dielectric : public Material { ... };  // Snell + Schlick Fresnel

Thread Pool (work-stealing):

// Each thread claims the next tile index atomically until none remain
int tile_idx = next_tile.fetch_add(1, std::memory_order_seq_cst);
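
The one-line snippet above can be expanded into a complete dispatch loop. This is a sketch under stated assumptions, not the repository's ThreadPool: it spawns plain `std::thread` workers per call instead of reusing a pool, `render_all_tiles` is a hypothetical name, and it uses relaxed ordering on the counter, which is sufficient here because `join()` already synchronizes the tile writes with the caller:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Calls render_tile(i) exactly once for every i in [0, tile_count),
// spread across num_threads workers that pull from a shared counter.
template <typename RenderTile>
void render_all_tiles(int tile_count, int num_threads, RenderTile render_tile) {
    std::atomic<int> next_tile{0};
    auto worker = [&] {
        for (;;) {
            // Each fetch_add hands out a unique tile index.
            const int tile = next_tile.fetch_add(1, std::memory_order_relaxed);
            if (tile >= tile_count) break;  // no tiles left
            render_tile(tile);
        }
    };
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

Because fast threads simply come back for more tiles, load balances automatically even when some tiles (glass, many bounces) cost far more than others.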

⚙️ Configuration

Rendering parameters are adjustable via the UI, or set defaults in src/main.cpp:

struct RenderParams {
    int num_threads       = 8;     // Thread count (auto-set to P-core count)
    int samples_per_pixel = 100;   // Anti-aliasing samples (1–1000)
    int max_depth         = 50;    // Ray bounce limit (1–200)
    float fov             = 60.0f; // Vertical field of view (degrees)
    float aperture        = 0.1f;  // Depth of field blur (0 = pinhole)
    float focus_dist      = 0.0f;  // 0 = auto (distance to look-at)
    bool use_packet_tracing = true; // SIMD packet mode on/off
};
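
The comments in the struct imply some normalization before rendering: the documented ranges for samples and depth, and the "0 = auto" rule for focus distance. A hedged sketch of what that step could look like; the README does not show how the project enforces these, so `RenderParamsSketch` and `sanitized` are hypothetical:

```cpp
#include <algorithm>

// Reduced copy of the fields that have documented ranges or sentinels.
struct RenderParamsSketch {
    int samples_per_pixel = 100;
    int max_depth = 50;
    float focus_dist = 0.0f;  // 0 = auto
};

// Clamp to the documented ranges and resolve the auto focus distance.
// look_at_distance is assumed to be length(look_from - look_at).
inline RenderParamsSketch sanitized(RenderParamsSketch p, float look_at_distance) {
    p.samples_per_pixel = std::clamp(p.samples_per_pixel, 1, 1000);
    p.max_depth = std::clamp(p.max_depth, 1, 200);
    if (p.focus_dist <= 0.0f) p.focus_dist = look_at_distance;
    return p;
}
```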

🚀 Future Improvements

  • Texture mapping (image and procedural)
  • OBJ mesh loading
  • Emissive materials (area lights)
  • Progressive/incremental rendering
  • GPU acceleration (Metal API)
  • Denoising at low sample counts
  • SAH (Surface Area Heuristic) BVH split

🎉 Acknowledgments

Based on the "Ray Tracing in One Weekend" series by Peter Shirley and the ray tracing community's accumulated knowledge.

Further reading:

  • Ray Tracing in One Weekend: Peter Shirley (free online)
  • Scratchapixel.com: Best graphics math tutorials
  • Physically Based Rendering: Pharr, Jakob, Humphreys

Want to understand the code? Read docs/comprehensive-guide.md for full technical documentation! 📖
