CPU Ray Tracer

A CPU ray tracer written from scratch in C++20, with a polymorphic, extensible architecture and optimizations for Apple Silicon (M1/M2/M3) Macs.

📊 Project Metrics

Codebase

Metric	Value
Total Source Lines	5,002 lines (4,521 source + 481 test)
Source Files	17 headers + 1 `main.cpp`
Test Lines	481 lines
Unit Tests	39 tests (Google Test)
Total Source Size	~149 KB
C++ Standard	C++20

Renderer Configuration

Metric	Value
Render Resolution	960 × 800 px
Tile Size	32 × 32 px → 750 tiles / frame
Default Samples/Pixel	100 (configurable 1–1000)
Max Ray Depth	50 (configurable 1–200)
Ray Packet Width	4 rays (128-bit NEON register)
BVH Complexity	O(log n) average-case traversal
Demo Scenes	12 (4 → 153 objects)
Max Thread Count	16 P-cores (Apple Silicon)
Primary Rays Per Frame (default)	76.8 M (960 × 800 × 100 spp)
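
The tile and ray counts in the table follow directly from the resolution, tile size, and sample count. A minimal sketch of that arithmetic is shown here; the function names are illustrative and are not taken from the project source.

```cpp
#include <cassert>
#include <cstdint>

// Number of tiles needed to cover `extent` pixels; a partial tile counts as one.
inline int tiles_along(int extent, int tile) {
    assert(extent > 0 && tile > 0);
    return (extent + tile - 1) / tile;
}

// Total tiles per frame for a width x height image.
inline int tile_count(int width, int height, int tile) {
    return tiles_along(width, tile) * tiles_along(height, tile);
}

// Primary (camera) rays per frame; bounce rays are not counted here.
inline std::int64_t primary_rays(int width, int height, int spp) {
    assert(spp > 0);
    return static_cast<std::int64_t>(width) * height * spp;
}
```

With the defaults above, `tile_count(960, 800, 32)` gives 30 × 25 = 750 tiles and `primary_rays(960, 800, 100)` gives 76,800,000 rays.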

⚡ Measured Performance — Apple M-series (ARM64, NEON SIMD)

Benchmarked on Apple Silicon (4 P-cores, 10 total threads) at 960×800 with BVH enabled.

Thread Scaling — Scene 9 (79 random spheres), 50 spp:

Threads	Render Time	Throughput	Speedup	Efficiency
1 thread	7.59 s	5.06 M rays/s	1.0×	100%
2 threads	4.29 s	8.95 M rays/s	1.8×	89%
4 P-cores	3.24 s	11.84 M rays/s	2.3×	58%
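
The Throughput, Speedup, and Efficiency columns can be derived from the render times alone. The sketch below shows one way to compute them, assuming throughput counts primary rays only (width × height × spp); the helper names are hypothetical.

```cpp
#include <cassert>

// Primary rays traced per second, in millions.
inline double throughput_mrays(int width, int height, int spp, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(width) * height * spp / seconds / 1e6;
}

// Speedup of an n-thread run relative to the single-thread baseline.
inline double speedup(double t1_seconds, double tn_seconds) {
    assert(tn_seconds > 0.0);
    return t1_seconds / tn_seconds;
}

// Parallel efficiency: the fraction of ideal linear scaling achieved.
inline double efficiency(double t1_seconds, double tn_seconds, int threads) {
    assert(threads > 0);
    return speedup(t1_seconds, tn_seconds) / threads;
}
```

For the 4-thread row, `speedup(7.59, 3.24)` is about 2.34 and `efficiency(7.59, 3.24, 4)` is about 0.59, which matches the table up to rounding.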

Scene Complexity — 4 P-cores, 50 spp:

Scene	Objects	BVH Nodes	Time	Throughput	vs Simple
Basic Spheres	3 + 1 plane	3 nodes	2.90 s	13.22 M/s	baseline
Classic 3-Sphere	3 + 1 plane	3 nodes	3.07 s	12.50 M/s	1.06×
Mixed Primitives	3 + 2 planes	3 nodes	6.57 s	5.84 M/s	2.26× ⚠️
Random Spheres	79 + 1 plane	93 nodes	3.18 s	12.09 M/s	1.09×
Sphere Grid	121 + 1 plane	127 nodes	2.99 s	12.84 M/s	1.03×
Mixed Large	153 + 1 plane	177 nodes	3.29 s	11.69 M/s	1.13×

⚠️ Mixed Primitives is slower because it contains two unbounded planes: infinite planes bypass the BVH and are tested against every ray individually. The BVH itself scales well: going from 3 to 153 objects adds only about 13% to render time.

Sample Scaling — Classic 3-Sphere, 4 P-cores:

Samples/px	Render Time	Throughput
10 spp	0.61 s	12.53 M rays/s
50 spp	3.11 s	12.35 M rays/s
100 spp	6.18 s	12.43 M rays/s
250 spp	15.45 s	12.43 M rays/s

Throughput stays at roughly 12.4 M rays/s (12.35 to 12.53) regardless of sample count, so render time scales linearly with spp and there is no measurable overhead per additional sample.

✨ Features

All Phases Complete:

✅ Multiple geometric primitives (Sphere, Plane, Triangle, Box)
✅ Configurable look-at camera with FOV control and depth of field
✅ Polymorphic architecture for easy extensibility
✅ Material system (Lambertian, Metal, Dielectric)
✅ Recursive ray tracing with realistic lighting
✅ Anti-aliasing via multi-sampling with gamma correction
✅ Multi-threaded tile-based rendering with work-stealing thread pool
✅ BVH acceleration structure for O(log n) ray-scene intersection
✅ SIMD-optimized ray packet tracing using ARM NEON via GLM
✅ Apple Silicon optimizations (P-core detection, 128-byte cache lines, QoS)
✅ Interactive UI with 12 demo scenes

🚀 Quick Start

Build and Run

mkdir build && cd build
cmake ..
make
./MyExecutable

Controls

R: Render current scene
1–12: Switch between demo scenes
ESC: Exit application

Demo Scenes

#	Scene	Objects	Description
1	Basic Spheres	5	Simple diffuse materials
2	Metal Spheres	4	Reflective surfaces
3	Classic 3-Sphere	4	Metal + diffuse mix
4	Pyramid	5	Triangle primitives
5	Boxes	4	AABB primitives
6	Mixed Primitives	5	All geometry types
7	Glass Spheres	5	Dielectric materials
8	All Materials	4	Diffuse, metal, glass
9	Random Spheres	100+	BVH stress test
10	Sphere Grid	121	Uniform distribution
11	Cornell Box	8	Classic lighting test
12	Mixed Large	150+	All primitives at scale

📁 Project Structure

RayTracer/
├── src/
│   ├── main.cpp           # Entry point, render loop, UI          (534 lines)
│   ├── SIMD.h             # SIMD config + ray packets              (871 lines)
│   ├── SceneFactory.h     # Pre-built demo scenes                  (597 lines)
│   ├── UI.h               # Immediate-mode UI                      (504 lines)
│   ├── PacketBVH.h        # SIMD-optimized packet BVH              (444 lines)
│   ├── ThreadPool.h       # Thread pool + tile renderer            (343 lines)
│   ├── BVH.h              # Standard BVH acceleration              (263 lines)
│   ├── Material.h         # Lambertian, Metal, Dielectric          (142 lines)
│   ├── AABB.h             # Axis-Aligned Bounding Box              (145 lines)
│   ├── Camera.h           # Look-at camera with DOF                 (76 lines)
│   ├── Scene.h            # Scene container                         (95 lines)
│   ├── Hittable.h         # Abstract base + HitRecord              (66 lines)
│   └── Primitives:
│       ├── Sphere.h       # Quadratic intersection                  (91 lines)
│       ├── Plane.h        # Dot product intersection               (133 lines)
│       ├── Triangle.h     # Möller–Trumbore algorithm               (87 lines)
│       ├── Box.h          # Slab method intersection               (105 lines)
│       └── Ray.h          # Ray struct                              (25 lines)
│
├── docs/
│   ├── comprehensive-guide.md  # ⭐ Full technical docs + M1 optimizations
│   ├── README.md               # Documentation index
│   ├── next-steps.md           # Future features roadmap
│   ├── primitives-reference.md # Math & algorithms
│   └── camera-and-architecture.md
│
└── tests/
    └── test_raytracer.cpp      # 39 Google Test cases              (481 lines)

🎯 Current Status

✅ All Phases Complete!

Phase 1 & 2 — Geometry & Architecture

Multiple primitive types (Sphere, Plane, Triangle, Box)
Polymorphic architecture with smart pointers
Camera system with configurable FOV and look-at targeting

Phase 3 — Materials & Lighting

Lambertian (diffuse), Metal (reflective), Dielectric (glass)
Recursive ray tracing with Schlick's Fresnel approximation

Phase 4 — Image Quality

Anti-aliasing via multi-sampling (jittered)
Gamma correction (γ = 2.0, i.e. √ per channel)
Depth of field (aperture lens blur)

Phase 5 — Optimization ✨

Multi-threaded tile-based rendering with work-stealing
BVH acceleration structure — O(log n) intersection, longest-axis split
SIMD-optimized ray packet tracing (4-wide, SoA layout)
Apple Silicon optimizations (P-core detection, cache alignment, QoS)

🔬 What Works — Technical Deep Dive

Primitives

Primitive	Algorithm	Bounding Box
`Sphere`	Quadratic equation (discriminant test)	`center ± radius`
`Plane`	Ray-plane dot-product test: `t = (d - dot(origin, n)) / dot(dir, n)`	Unbounded (skips BVH)
`Triangle`	Möller–Trumbore (barycentric coords, zero alloc)	min/max of 3 vertices + pad
`Box`	Slab method — 3 axis-aligned pairs	Min/max corner pair
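
To make the Sphere and Plane rows concrete, here is a minimal, self-contained sketch of both tests. It uses a tiny `Vec3` instead of GLM so it compiles on its own; the struct layouts and function signatures are assumptions for illustration, not the project's actual `Sphere.h` or `Plane.h`.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };

// Plane is the set of points p with dot(p, n) == d.
// Solves t = (d - dot(origin, n)) / dot(dir, n) and range-checks the result.
inline bool hit_plane(const Ray& r, Vec3 n, double d,
                      double t_min, double t_max, double& t_out) {
    const double denom = dot(r.dir, n);
    if (std::fabs(denom) < 1e-9) return false;  // ray is parallel to the plane
    const double t = (d - dot(r.origin, n)) / denom;
    if (t < t_min || t > t_max) return false;
    t_out = t;
    return true;
}

// Solves |origin + t*dir - center|^2 = radius^2 and returns the nearest root in range.
inline bool hit_sphere(const Ray& r, Vec3 center, double radius,
                       double t_min, double t_max, double& t_out) {
    assert(radius > 0.0);
    const Vec3 oc = r.origin - center;
    const double a = dot(r.dir, r.dir);
    const double half_b = dot(oc, r.dir);
    const double c = dot(oc, oc) - radius * radius;
    const double disc = half_b * half_b - a * c;
    if (disc < 0.0) return false;               // discriminant test: no real roots
    const double s = std::sqrt(disc);
    double t = (-half_b - s) / a;               // try the near root first
    if (t < t_min || t > t_max) {
        t = (-half_b + s) / a;                  // then the far root
        if (t < t_min || t > t_max) return false;
    }
    t_out = t;
    return true;
}
```

A ray fired from the origin along -z at a unit sphere centred at (0, 0, -5) hits at t = 4, the near surface of the sphere.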

Materials

Material	Model	Key Parameters
`Lambertian`	Cosine-weighted diffuse scatter	`albedo` (RGB)
`Metal`	Perfect mirror reflection + fuzz	`albedo`, `fuzz` ∈ [0, 1]
`Dielectric`	Snell's law refraction + Schlick Fresnel	`ior` (e.g. glass=1.5, water=1.33, diamond=2.4)
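
The Metal and Dielectric rows rely on three small formulas: mirror reflection, Snell refraction, and Schlick's reflectance approximation. The sketch below shows them in the form popularised by "Ray Tracing in One Weekend"; the `Vec3` type and signatures are stand-ins chosen for this example and may differ from `Material.h`.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Schlick's approximation of Fresnel reflectance.
// cos_theta: cosine of the incidence angle; eta_ratio: n_incident / n_transmitted.
inline double schlick(double cos_theta, double eta_ratio) {
    assert(eta_ratio > 0.0);
    double r0 = (1.0 - eta_ratio) / (1.0 + eta_ratio);
    r0 *= r0;
    return r0 + (1.0 - r0) * std::pow(1.0 - cos_theta, 5.0);
}

// Mirror reflection of v about the unit normal n.
inline Vec3 reflect(Vec3 v, Vec3 n) { return v - n * (2.0 * dot(v, n)); }

// Snell refraction of unit vector uv at unit normal n.
// Returns false on total internal reflection.
inline bool refract(Vec3 uv, Vec3 n, double eta_ratio, Vec3& out) {
    const double cos_theta = std::fmin(dot(uv * -1.0, n), 1.0);
    const double sin2_t = eta_ratio * eta_ratio * (1.0 - cos_theta * cos_theta);
    if (sin2_t > 1.0) return false;             // total internal reflection
    const Vec3 perp = (uv + n * cos_theta) * eta_ratio;
    const Vec3 parallel = n * -std::sqrt(1.0 - sin2_t);
    out = perp + parallel;
    return true;
}
```

For glass (`ior` = 1.5) at normal incidence, `schlick(1.0, 1.0 / 1.5)` evaluates to 0.04, i.e. about 4% of the light is reflected.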

Acceleration — BVH

BVH.h — Standard recursive BVH tree
- Split axis: longest AABB extent (a cheap stand-in for the surface area heuristic)
- Build: O(n log n); traversal: O(log n) on average for a balanced tree
- Leaf nodes: single objects; bounded and unbounded objects separated
PacketBVH.h — SIMD-friendly packet BVH
- SIMDAABBBounds stores bounds as [min, max] per axis for branchless tests
- 4-ray packet traversal with active bitmask; fallback to single-ray on leaf
- intersect_aabb_packet, intersect_sphere_packet, intersect_triangle_packet, intersect_plane_packet
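
Two pieces of the BVH are easy to show in isolation: choosing the split axis by longest extent, and the single-ray slab test against a node's bounds. The following is a sketch under assumed types (a plain `AABB` with `min`/`max` arrays); the real `AABB.h` and `BVH.h` may be organised differently.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

struct AABB { double min[3]; double max[3]; };

// Index (0 = x, 1 = y, 2 = z) of the axis along which the box is longest.
inline int longest_axis(const AABB& b) {
    int axis = 0;
    double best = b.max[0] - b.min[0];
    for (int a = 1; a < 3; ++a) {
        const double extent = b.max[a] - b.min[a];
        if (extent > best) { best = extent; axis = a; }
    }
    return axis;
}

// Slab test using a precomputed reciprocal direction (inv_dir[a] = 1 / dir[a]).
// Note: a zero direction component yields an infinite inv_dir, which this
// sketch does not special-case.
inline bool hit_aabb(const AABB& b, const double origin[3], const double inv_dir[3],
                     double t_min, double t_max) {
    assert(t_min <= t_max);
    for (int a = 0; a < 3; ++a) {
        double t0 = (b.min[a] - origin[a]) * inv_dir[a];
        double t1 = (b.max[a] - origin[a]) * inv_dir[a];
        if (inv_dir[a] < 0.0) std::swap(t0, t1);  // keep t0 as the entry distance
        t_min = std::max(t_min, t0);
        t_max = std::min(t_max, t1);
        if (t_max <= t_min) return false;         // slabs no longer overlap
    }
    return true;
}
```

During traversal, a node whose box fails `hit_aabb` lets the ray skip that entire subtree, which is where the logarithmic average cost comes from.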

Threading

Component	Detail
`ThreadPool`	Fixed worker pool, condition-variable sleep (no spinlock)
`TileRenderer`	32×32 px tiles, work-stealing via `atomic<int> next_tile`
`PaddedAtomic<T>`	Per-cache-line padding (128 B on Apple Silicon, 64 B elsewhere)
Thread-local RNG	Each tile worker has its own `mt19937` — no mutex on random
P-core detection	`sysctlbyname("hw.perflevel0.physicalcpu")` — skips E-cores
QoS	`QOS_CLASS_USER_INTERACTIVE` — highest scheduling priority
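
The tile distribution and cache-line padding can be sketched together in a few lines. This example is an assumption-laden illustration: `kCacheLine`, `run_tiles`, and the per-worker counters are made up for the demo, and the "render" step is replaced by a counter increment. Only the shared atomic counter and the padded wrapper mirror what the table describes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

#if defined(__APPLE__) && defined(__aarch64__)
constexpr std::size_t kCacheLine = 128;  // Apple Silicon cache line
#else
constexpr std::size_t kCacheLine = 64;   // common elsewhere
#endif

// Atomic padded out to a full cache line so neighbouring data never shares it.
template <typename T>
struct alignas(kCacheLine) PaddedAtomic {
    std::atomic<T> value{};
};
static_assert(sizeof(PaddedAtomic<int>) == kCacheLine, "one counter per cache line");

// Hands out `total_tiles` tile indices to `num_threads` workers through one
// shared counter and returns how many tiles each worker claimed.
inline std::vector<int> run_tiles(int total_tiles, int num_threads) {
    assert(total_tiles >= 0 && num_threads >= 1);
    PaddedAtomic<int> next_tile;
    std::vector<int> claimed(num_threads, 0);

    auto worker = [&](int id) {
        for (;;) {
            const int tile = next_tile.value.fetch_add(1, std::memory_order_seq_cst);
            if (tile >= total_tiles) break;   // no tiles left
            ++claimed[id];                    // a real renderer would trace the tile here
        }
    };

    std::vector<std::thread> helpers;
    for (int id = 1; id < num_threads; ++id) helpers.emplace_back(worker, id);
    worker(0);                                // the calling thread participates too
    for (auto& t : helpers) t.join();
    return claimed;
}
```

Because every worker pulls from the same counter, fast threads naturally take more tiles than slow ones, and each tile index is handed out exactly once.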

SIMD — SIMD.h

Feature	Detail
Backend	ARM NEON via `GLM_FORCE_INTRINSICS` + `GLM_FORCE_DEFAULT_ALIGNED_GENTYPES`
Packet width	4 rays (`PACKET_SIZE = 4`) — matches 128-bit NEON register
Layout	Structure-of-Arrays (SoA) — `origins_x[4]`, `origins_y[4]`, …
Alignment	`alignas(16)` on all SoA arrays — NEON-ready
Precomputed	`inv_directions` + `signs` per ray — branchless AABB slab test
Diagnostics	`print_simd_diagnostics()` prints GLM arch + alignment on startup
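
The SoA layout and the packet AABB test can be illustrated with plain scalar code that a compiler is free to vectorise. This is a sketch only: the field names beyond `origins_x`/`origins_y`, the `set_ray` helper, and the `_sketch` function are hypothetical, and it uses min/max instead of the precomputed `signs` mentioned above, so it should not be read as the contents of `SIMD.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int PACKET_SIZE = 4;

// Structure-of-Arrays packet: each array holds one component for all 4 rays.
struct RayPacket4 {
    alignas(16) float origins_x[PACKET_SIZE];
    alignas(16) float origins_y[PACKET_SIZE];
    alignas(16) float origins_z[PACKET_SIZE];
    alignas(16) float inv_dir_x[PACKET_SIZE];
    alignas(16) float inv_dir_y[PACKET_SIZE];
    alignas(16) float inv_dir_z[PACKET_SIZE];
};
static_assert(alignof(RayPacket4) >= 16, "arrays must be 16-byte aligned");

// Stores ray i and precomputes its reciprocal direction.
inline void set_ray(RayPacket4& p, int i, const float o[3], const float d[3]) {
    assert(i >= 0 && i < PACKET_SIZE);
    p.origins_x[i] = o[0]; p.origins_y[i] = o[1]; p.origins_z[i] = o[2];
    p.inv_dir_x[i] = 1.0f / d[0]; p.inv_dir_y[i] = 1.0f / d[1]; p.inv_dir_z[i] = 1.0f / d[2];
}

// Branch-free slab test for all 4 rays. Bit i of the result is set when
// ray i overlaps the box somewhere in [t_min, t_max].
inline std::uint32_t intersect_aabb_packet_sketch(const RayPacket4& p,
                                                  const float bmin[3], const float bmax[3],
                                                  float t_min, float t_max) {
    std::uint32_t mask = 0;
    for (int i = 0; i < PACKET_SIZE; ++i) {
        const float tx0 = (bmin[0] - p.origins_x[i]) * p.inv_dir_x[i];
        const float tx1 = (bmax[0] - p.origins_x[i]) * p.inv_dir_x[i];
        const float ty0 = (bmin[1] - p.origins_y[i]) * p.inv_dir_y[i];
        const float ty1 = (bmax[1] - p.origins_y[i]) * p.inv_dir_y[i];
        const float tz0 = (bmin[2] - p.origins_z[i]) * p.inv_dir_z[i];
        const float tz1 = (bmax[2] - p.origins_z[i]) * p.inv_dir_z[i];
        const float t_near = std::max(std::max(std::min(tx0, tx1), std::min(ty0, ty1)),
                                      std::max(std::min(tz0, tz1), t_min));
        const float t_far = std::min(std::min(std::max(tx0, tx1), std::max(ty0, ty1)),
                                     std::min(std::max(tz0, tz1), t_max));
        mask |= static_cast<std::uint32_t>(t_near <= t_far) << i;
    }
    return mask;
}
```

The returned bitmask plays the role of the "active" mask during packet traversal: a node is skipped only when the mask is zero for every ray in the packet.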

Camera — Camera.h

Look-at model — {look_from, look_at, vup, vfov, aspect_ratio}
Depth of field — disk sampling (lens_radius = aperture / 2) with focus_dist plane
Thread-safe — thread_local RNG inside get_ray()
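
The look-at model and the depth-of-field offset can be sketched as follows. This is a simplified stand-in rather than the actual `Camera.h`: it uses a local `Vec3` instead of GLM, and `get_ray` takes the lens sample as explicit parameters so the result is deterministic, whereas the project draws it from a `thread_local` RNG inside `get_ray()`. The `sample_unit_disk` helper shows that RNG pattern separately.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct Vec3 { double x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 a) { return a * (1.0 / std::sqrt(dot(a, a))); }

struct Ray { Vec3 origin, dir; };

// Uniform point in the unit disk (z = 0) by rejection sampling; one RNG per thread.
inline Vec3 sample_unit_disk() {
    thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (;;) {
        const Vec3 p{dist(rng), dist(rng), 0.0};
        if (dot(p, p) < 1.0) return p;
    }
}

struct Camera {
    Vec3 origin, lower_left, horizontal, vertical, u, v;
    double lens_radius;

    Camera(Vec3 look_from, Vec3 look_at, Vec3 vup, double vfov_deg,
           double aspect_ratio, double aperture, double focus_dist) {
        assert(aspect_ratio > 0.0 && focus_dist > 0.0);
        const double theta = vfov_deg * 3.14159265358979323846 / 180.0;
        const double view_h = 2.0 * std::tan(theta / 2.0);
        const double view_w = aspect_ratio * view_h;
        const Vec3 w = normalize(look_from - look_at);  // points away from the target
        u = normalize(cross(vup, w));
        v = cross(w, u);
        origin = look_from;
        horizontal = u * (focus_dist * view_w);
        vertical = v * (focus_dist * view_h);
        lower_left = origin - horizontal * 0.5 - vertical * 0.5 - w * focus_dist;
        lens_radius = aperture / 2.0;
    }

    // (s, t) in [0,1]^2 selects a point on the focus plane;
    // (lens_x, lens_y) is a point in the unit disk on the lens.
    Ray get_ray(double s, double t, double lens_x, double lens_y) const {
        const Vec3 offset = u * (lens_radius * lens_x) + v * (lens_radius * lens_y);
        const Vec3 target = lower_left + horizontal * s + vertical * t;
        return Ray{origin + offset, target - origin - offset};
    }
};
```

Every lens offset aims at the same point on the focus plane, so geometry at `focus_dist` stays sharp while nearer and farther geometry blurs.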

Rendering Pipeline

Per tile (32×32):
  For each pixel:
    For each sample (N spp):
      1. Jitter (u,v) within pixel
      2. camera.get_ray(u, v)           ← DoF offset applied
      3. trace_ray(ray, scene, depth)   ← recursive BVH traversal
         └─ material.scatter()          ← importance sampled BRDF
      4. Accumulate colour
    5. Average + gamma correct (√)
    6. Write to pixel buffer
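
Steps 1 and 5 of the pipeline are small enough to show directly. The sketch below assumes colour is accumulated as a per-channel sum in linear space; the function names and the 0.999 clamp are choices made for this example rather than details confirmed by the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Step 1: map a pixel index plus a jitter in [0, 1) to a normalised coordinate.
inline double jittered_coord(int pixel, double jitter, int extent) {
    assert(extent > 0 && jitter >= 0.0 && jitter < 1.0);
    return (pixel + jitter) / extent;
}

// Step 5: average the accumulated channel over `spp` samples, apply
// gamma 2.0 (a square root), and quantise to an 8-bit value.
inline std::uint8_t to_display_byte(double channel_sum, int spp) {
    assert(spp > 0);
    const double mean = channel_sum / spp;
    const double gamma_corrected = std::sqrt(std::max(mean, 0.0));
    const double clamped = std::min(gamma_corrected, 0.999);  // keeps the result below 256
    return static_cast<std::uint8_t>(256.0 * clamped);
}
```

For example, a channel whose 100 samples sum to 25.0 has a linear mean of 0.25, which the square root lifts to 0.5 and which is stored as 128.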

🛠 Technologies

Technology	Version	Role
C++	C++20	Core language
GLM	Header-only	Math (vec3, mat4) + NEON SIMD
SDL2	2.x	Window creation, pixel buffer
SDL2_ttf	2.x	UI text rendering
CMake	3.20+	Build system
Google Test	Latest	39-test unit suite

🍎 Apple Silicon Optimizations

Optimization	File	Benefit
NEON SIMD via GLM	`SIMD.h`	4-wide parallel ray processing
128-byte cache line padding	`ThreadPool.h`	Eliminates false sharing on atomics
P-core detection (`hw.perflevel0.physicalcpu`)	`ThreadPool.h`	Uses performance cores only
`QOS_CLASS_USER_INTERACTIVE`	`ThreadPool.h`	Highest-priority thread scheduler slot
`__builtin_arm_yield()`	`ThreadPool.h`	Efficient spin-hint (no wasted cycles)
`-march=native -arch arm64`	`CMakeLists.txt`	Native ARM64 code generation
`-funroll-loops -ftree-vectorize`	`CMakeLists.txt`	Compiler auto-vectorisation
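
As an illustration of the P-core detection and QoS rows, the sketch below queries `hw.perflevel0.physicalcpu` and requests `QOS_CLASS_USER_INTERACTIVE` on macOS, and falls back to portable behaviour elsewhere. The function names and the fallback policy are assumptions for this example; the cap of 16 follows the Max Thread Count listed in the metrics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#include <pthread/qos.h>
#endif

// Number of performance cores on Apple Silicon; elsewhere, or if the query
// fails, falls back to the hardware thread count (at least 1).
inline int performance_core_count() {
#if defined(__APPLE__)
    int cores = 0;
    std::size_t size = sizeof(cores);
    if (sysctlbyname("hw.perflevel0.physicalcpu", &cores, &size, nullptr, 0) == 0 &&
        cores > 0) {
        return cores;
    }
#endif
    const unsigned hc = std::thread::hardware_concurrency();  // may be 0 if unknown
    return hc > 0 ? static_cast<int>(hc) : 1;
}

// Worker count: one per performance core, capped at `max_threads`.
inline int choose_thread_count(int max_threads = 16) {
    assert(max_threads >= 1);
    return std::min(performance_core_count(), max_threads);
}

// Asks the scheduler for the highest QoS class on the calling thread.
// Returns false when unsupported (non-Apple platforms) or on failure.
inline bool raise_thread_priority() {
#if defined(__APPLE__)
    return pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0) == 0;
#else
    return false;
#endif
}
```

Each worker thread would call `raise_thread_priority()` once at startup, since the QoS class applies to the calling thread only.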

See docs/comprehensive-guide.md for the full breakdown.

📚 Documentation

Start here: docs/comprehensive-guide.md — Complete technical documentation including:

Full architecture overview
M1/M2/M3 Mac specific optimizations
SIMD and NEON configuration
Threading and parallelism details
BVH acceleration explained
Lessons learned and best practices

🏗 Architecture Highlights

Polymorphic Design:

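class Hittable {               // Base interface
public:
    virtual ~Hittable() = default;        // safe deletion through base pointers
    virtual bool hit(...) = 0;            // ray intersection (parameters elided)
    virtual bool bounding_box(...) = 0;   // AABB query (parameters elided)
};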

class Sphere    : public Hittable { ... };  // Quadratic eq
class Triangle  : public Hittable { ... };  // Möller–Trumbore
class Box       : public Hittable { ... };  // Slab method
class BVHNode   : public Hittable { ... };  // Recursive tree node
class BVHScene  : public Hittable { ... };  // Bounded + unbounded split

Material System:

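class Material {
public:
    virtual ~Material() = default;
    // Parameter types elided: incoming ray, hit record, output attenuation, output scattered ray
    virtual bool scatter(ray_in, record, attenuation, scattered) = 0;
};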

class Lambertian : public Material { ... };  // Diffuse (cosine scatter)
class Metal      : public Material { ... };  // Reflection + fuzz
class Dielectric : public Material { ... };  // Snell + Schlick Fresnel

Thread Pool (work-stealing):

// Each thread claims the next tile index atomically until none remain
int tile_idx = next_tile.fetch_add(1, std::memory_order_seq_cst);

⚙️ Configuration

Rendering parameters are adjustable via the UI, or set defaults in src/main.cpp:

struct RenderParams {
    int num_threads       = 8;     // Thread count (auto-set to P-core count)
    int samples_per_pixel = 100;   // Anti-aliasing samples (1–1000)
    int max_depth         = 50;    // Ray bounce limit (1–200)
    float fov             = 60.0f; // Vertical field of view (degrees)
    float aperture        = 0.1f;  // Depth of field blur (0 = pinhole)
    float focus_dist      = 0.0f;  // 0 = auto (distance to look-at)
    bool use_packet_tracing = true; // SIMD packet mode on/off
};

🚀 Future Improvements

Texture mapping (image and procedural)
OBJ mesh loading
Emissive materials (area lights)
Progressive/incremental rendering
GPU acceleration (Metal API)
Denoising at low sample counts
SAH (Surface Area Heuristic) BVH split

🎉 Acknowledgments

Based on the "Ray Tracing in One Weekend" series by Peter Shirley and on the ray tracing community's accumulated knowledge.

Further reading:

Ray Tracing in One Weekend — Peter Shirley (FREE online)
Scratchapixel.com — Best graphics math tutorials
Physically Based Rendering — Pharr, Jakob, Humphreys

Want to understand the code? Read docs/comprehensive-guide.md for full technical documentation! 📖
