A from-scratch, CPU-based ray tracer written in C++ with a clean, extensible architecture, optimized for Apple Silicon (M1/M2/M3) Macs.
| Metric | Value |
|---|---|
| Total Source Lines | 5,002 lines |
| Source Files | 17 headers + 1 main.cpp |
| Test Lines | 481 lines |
| Unit Tests | 39 tests (Google Test) |
| Total Source Size | ~149 KB |
| C++ Standard | C++20 |

| Metric | Value |
|---|---|
| Render Resolution | 960 × 800 px |
| Tile Size | 32 × 32 px → 750 tiles / frame |
| Default Samples/Pixel | 100 (configurable 1–1000) |
| Max Ray Depth | 50 (configurable 1–200) |
| Ray Packet Width | 4 rays (128-bit NEON register) |
| BVH Complexity | O(log n) traversal |
| Demo Scenes | 12 (4 to 153 objects) |
| Max Thread Count | 16 P-cores (Apple Silicon) |
| Primary Rays Per Frame (default) | 76.8 M (960 × 800 × 100 spp) |
Benchmarked on Apple Silicon (4 P-cores, 10 total threads) at 960×800 with BVH enabled.
Thread Scaling: Scene 9 (79 random spheres), 50 spp:
| Threads | Render Time | Throughput | Speedup | Efficiency |
|---|---|---|---|---|
| 1 thread | 7.59 s | 5.06 M rays/s | 1.0× | 100% |
| 2 threads | 4.29 s | 8.95 M rays/s | 1.8× | 89% |
| 4 P-cores | 3.24 s | 11.84 M rays/s | 2.3× | 58% |
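The Speedup and Efficiency columns follow the usual definitions, with $T_1$ the single-thread time and $T_p$ the time on $p$ threads. Worked through for the 2-thread row (the table rounds these values):

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}, \qquad
S_2 = \frac{7.59\ \text{s}}{4.29\ \text{s}} \approx 1.77, \qquad
E_2 = \frac{1.77}{2} \approx 88.5\%
```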
Scene Complexity: 4 P-cores, 50 spp:
| Scene | Objects | BVH Nodes | Time | Throughput | Time vs Basic |
|---|---|---|---|---|---|
| Basic Spheres | 3 + 1 plane | 3 nodes | 2.90 s | 13.22 M/s | baseline |
| Classic 3-Sphere | 3 + 1 plane | 3 nodes | 3.07 s | 12.50 M/s | 1.06× |
| Mixed Primitives | 3 + 2 planes | 3 nodes | 6.57 s | 5.84 M/s | 2.26× |
| Random Spheres | 79 + 1 plane | 93 nodes | 3.18 s | 12.09 M/s | 1.09× |
| Sphere Grid | 121 + 1 plane | 127 nodes | 2.99 s | 12.84 M/s | 1.03× |
| Mixed Large | 153 + 1 plane | 177 nodes | 3.29 s | 11.69 M/s | 1.13× |
⚠️ Mixed Primitives is slower because it has 2 unbounded planes: infinite planes bypass the BVH and are tested against every ray individually. The BVH itself scales roughly as O(log n): going from 3 to 153 objects adds only 13% to the render time.
Sample Scaling: Classic 3-Sphere, 4 P-cores:
| Samples/px | Render Time | Throughput |
|---|---|---|
| 10 spp | 0.61 s | 12.53 M rays/s |
| 50 spp | 3.11 s | 12.35 M rays/s |
| 100 spp | 6.18 s | 12.43 M rays/s |
| 250 spp | 15.45 s | 12.43 M rays/s |
Throughput stays at ~12.4 M rays/s regardless of sample count, confirming that render time scales linearly with spp and that each additional sample adds no measurable overhead.
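With constant throughput, render time follows directly from the sample count. The benchmark figures are consistent with counting one primary (camera) ray per sample; for example, at 100 spp:

```latex
t_{\text{render}} = \frac{W \cdot H \cdot \text{spp}}{\text{throughput}}
= \frac{960 \cdot 800 \cdot 100}{12.43 \times 10^{6}\ \text{rays/s}}
= \frac{76.8 \times 10^{6}}{12.43 \times 10^{6}}\ \text{s} \approx 6.18\ \text{s}
```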
All Phases Complete:
- ✅ Multiple geometric primitives (Sphere, Plane, Triangle, Box)
- ✅ Configurable look-at camera with FOV control and depth of field
- ✅ Polymorphic architecture for easy extensibility
- ✅ Material system (Lambertian, Metal, Dielectric)
- ✅ Recursive ray tracing with realistic lighting
- ✅ Anti-aliasing via multi-sampling with gamma correction
- ✅ Multi-threaded tile-based rendering with work-stealing thread pool
- ✅ BVH acceleration structure for O(log n) ray-scene intersection
- ✅ SIMD-optimized ray packet tracing using ARM NEON via GLM
- ✅ Apple Silicon optimizations (P-core detection, 128-byte cache lines, QoS)
- ✅ Interactive UI with 12 demo scenes
mkdir build && cd build
cmake ..
make
./MyExecutable

Controls:
- R: Render current scene
- 1–12: Switch between demo scenes
- ESC: Exit application
| # | Scene | Objects | Description |
|---|---|---|---|
| 1 | Basic Spheres | 5 | Simple diffuse materials |
| 2 | Metal Spheres | 4 | Reflective surfaces |
| 3 | Classic 3-Sphere | 4 | Metal + diffuse mix |
| 4 | Pyramid | 5 | Triangle primitives |
| 5 | Boxes | 4 | AABB primitives |
| 6 | Mixed Primitives | 5 | All geometry types |
| 7 | Glass Spheres | 5 | Dielectric materials |
| 8 | All Materials | 4 | Diffuse, metal, glass |
| 9 | Random Spheres | 100+ | BVH stress test |
| 10 | Sphere Grid | 121 | Uniform distribution |
| 11 | Cornell Box | 8 | Classic lighting test |
| 12 | Mixed Large | 150+ | All primitives at scale |
RayTracer/
├── src/
│   ├── main.cpp          # Entry point, render loop, UI (534 lines)
│   ├── SIMD.h            # SIMD config + ray packets (871 lines)
│   ├── SceneFactory.h    # Pre-built demo scenes (597 lines)
│   ├── UI.h              # Immediate-mode UI (504 lines)
│   ├── PacketBVH.h       # SIMD-optimized packet BVH (444 lines)
│   ├── ThreadPool.h      # Thread pool + tile renderer (343 lines)
│   ├── BVH.h             # Standard BVH acceleration (263 lines)
│   ├── Material.h        # Lambertian, Metal, Dielectric (142 lines)
│   ├── AABB.h            # Axis-Aligned Bounding Box (145 lines)
│   ├── Camera.h          # Look-at camera with DOF (76 lines)
│   ├── Scene.h           # Scene container (95 lines)
│   ├── Hittable.h        # Abstract base + HitRecord (66 lines)
│   ├── Primitives:
│   │   ├── Sphere.h      # Quadratic intersection (91 lines)
│   │   ├── Plane.h       # Dot product intersection (133 lines)
│   │   ├── Triangle.h    # Möller–Trumbore algorithm (87 lines)
│   │   └── Box.h         # Slab method intersection (105 lines)
│   └── Ray.h             # Ray struct (25 lines)
│
├── docs/
│   ├── comprehensive-guide.md   # ⭐ Full technical docs + M1 optimizations
│   ├── README.md                # Documentation index
│   ├── next-steps.md            # Future features roadmap
│   ├── primitives-reference.md  # Math & algorithms
│   └── camera-and-architecture.md
│
└── tests/
    └── test_raytracer.cpp       # 39 Google Test cases (481 lines)
Phase 1 & 2: Geometry & Architecture
- Multiple primitive types (Sphere, Plane, Triangle, Box)
- Polymorphic architecture with smart pointers
- Camera system with configurable FOV and look-at targeting
Phase 3: Materials & Lighting
- Lambertian (diffuse), Metal (reflective), Dielectric (glass)
- Recursive ray tracing with Schlick's Fresnel approximation
Phase 4: Image Quality
- Anti-aliasing via multi-sampling (jittered)
- Gamma correction (γ = 2.0, i.e. √ per channel)
- Depth of field (aperture lens blur)
Phase 5: Optimization ✨
- Multi-threaded tile-based rendering with work-stealing
- BVH acceleration structure: O(log n) intersection, longest-axis split
- SIMD-optimized ray packet tracing (4-wide, SoA layout)
- Apple Silicon optimizations (P-core detection, cache alignment, QoS)
| Primitive | Algorithm | Bounding Box |
|---|---|---|
| Sphere | Quadratic equation (discriminant test) | center ± radius |
| Plane | Dot product: t = (d - dot(origin, n)) / dot(dir, n) | Unbounded (skips BVH) |
| Triangle | Möller–Trumbore (barycentric coords, zero alloc) | min/max of 3 vertices + pad |
| Box | Slab method: 3 axis-aligned pairs | Min/max corner pair |
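For reference, here is a minimal, self-contained sketch of the Möller–Trumbore test named in the table. It uses a throwaway Vec3 instead of glm::vec3, and the names intersect_triangle and TriHit are hypothetical, not taken from Triangle.h:

```cpp
#include <cmath>
#include <optional>

// Minimal vector type for this sketch (the project itself uses glm::vec3).
struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct TriHit { float t, u, v; };  // hit distance + barycentric coordinates

// Möller–Trumbore ray/triangle test; reports a hit only for t in (t_min, t_max).
std::optional<TriHit> intersect_triangle(Vec3 orig, Vec3 dir,
                                         Vec3 v0, Vec3 v1, Vec3 v2,
                                         float t_min, float t_max) {
    const float kEps = 1e-8f;
    Vec3 e1 = sub(v1, v0);
    Vec3 e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < kEps) return std::nullopt;  // ray parallel to the triangle's plane
    float inv_det = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv_det;                   // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return std::nullopt;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv_det;                 // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return std::nullopt;
    float t = dot(e2, q) * inv_det;                  // distance along the ray
    if (t <= t_min || t >= t_max) return std::nullopt;
    return TriHit{t, u, v};
}
```

Because u and v fall out of the same computation as t, the barycentric coordinates are available for normal or texture interpolation at no extra cost, and nothing is heap-allocated.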
| Material | Model | Key Parameters |
|---|---|---|
| Lambertian | Cosine-weighted diffuse scatter | albedo (RGB) |
| Metal | Perfect mirror reflection + fuzz | albedo, fuzz ∈ [0, 1] |
| Dielectric | Snell's law refraction + Schlick Fresnel | ior (e.g. glass = 1.5, water = 1.33, diamond = 2.4) |
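The Dielectric row combines two small formulas. Below is a minimal sketch in the style of Ray Tracing in One Weekend (which this project credits), assuming unit-length input vectors; the helper names schlick and refract are illustrative and not necessarily those used in Material.h:

```cpp
#include <cmath>
#include <optional>

// Minimal vector type for this sketch (the project uses glm::vec3).
struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Schlick's approximation of Fresnel reflectance.
// cosine: cos(angle between incoming ray and normal); ratio: eta_i / eta_t.
// R0 is the same for ratio and 1/ratio, e.g. 0.04 for air/glass (ior 1.5).
float schlick(float cosine, float ratio) {
    float r0 = (1.0f - ratio) / (1.0f + ratio);
    r0 = r0 * r0;
    return r0 + (1.0f - r0) * std::pow(1.0f - cosine, 5.0f);
}

// Snell's law: refract unit direction uv through a surface with unit normal n
// (n facing against uv). Returns std::nullopt on total internal reflection.
std::optional<Vec3> refract(Vec3 uv, Vec3 n, float ratio) {
    float cos_theta = std::fmin(dot(-1.0f * uv, n), 1.0f);
    float sin2_theta = 1.0f - cos_theta * cos_theta;
    if (ratio * ratio * sin2_theta > 1.0f) return std::nullopt;  // no refracted ray exists
    Vec3 r_perp = ratio * (uv + cos_theta * n);                   // part perpendicular to n
    Vec3 r_par = -std::sqrt(std::fabs(1.0f - dot(r_perp, r_perp))) * n;  // part along -n
    return r_perp + r_par;
}
```

In that scheme, a dielectric reflects when refract() reports total internal reflection or when schlick(cos_theta, ratio) exceeds a uniform random number in [0, 1), and refracts otherwise.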
BVH.h: Standard recursive BVH tree
- Split axis: longest AABB extent (a cheap proxy for the surface area heuristic)
- Build: O(n log n), Traverse: O(log n)
- Leaf nodes: single objects; bounded and unbounded objects separated

PacketBVH.h: SIMD-friendly packet BVH
- SIMDAABBBounds stores bounds as [min, max] per axis for branchless tests
- 4-ray packet traversal with active bitmask; fallback to single-ray on leaf
- Packet intersection routines: intersect_aabb_packet, intersect_sphere_packet, intersect_triangle_packet, intersect_plane_packet
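A sketch of the longest-axis split step, assuming each object is represented only by its AABB. The AABB struct and the functions surrounding, longest_axis and split_longest_axis are hypothetical; a builder would call split_longest_axis on a range of objects and recurse on the two halves:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct AABB { float min[3]; float max[3]; };  // hypothetical minimal box type

// Smallest box enclosing both a and b.
AABB surrounding(const AABB& a, const AABB& b) {
    AABB r;
    for (int k = 0; k < 3; ++k) {
        r.min[k] = std::min(a.min[k], b.min[k]);
        r.max[k] = std::max(a.max[k], b.max[k]);
    }
    return r;
}

// Axis (0 = x, 1 = y, 2 = z) along which the combined bounds are widest.
// Precondition: boxes is non-empty.
int longest_axis(const std::vector<AABB>& boxes) {
    AABB total = boxes.front();
    for (const AABB& b : boxes) total = surrounding(total, b);
    int axis = 0;
    float best = total.max[0] - total.min[0];
    for (int k = 1; k < 3; ++k) {
        float extent = total.max[k] - total.min[k];
        if (extent > best) { best = extent; axis = k; }
    }
    return axis;
}

// Partition boxes around the median centroid on the longest axis.
// Returns the index of the first box in the right-hand half.
std::size_t split_longest_axis(std::vector<AABB>& boxes) {
    int axis = longest_axis(boxes);
    auto centroid = [axis](const AABB& b) { return 0.5f * (b.min[axis] + b.max[axis]); };
    std::size_t mid = boxes.size() / 2;
    std::nth_element(boxes.begin(), boxes.begin() + static_cast<std::ptrdiff_t>(mid), boxes.end(),
                     [&](const AABB& a, const AABB& b) { return centroid(a) < centroid(b); });
    return mid;
}
```

std::nth_element partitions in linear time instead of fully sorting, which keeps the work per tree level linear. The SAH split listed under future features would replace this median split with a cost-based choice.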
| Component | Detail |
|---|---|
| ThreadPool | Fixed worker pool, condition-variable sleep (no spinlock) |
| TileRenderer | 32×32 px tiles, work-stealing via atomic<int> next_tile |
| PaddedAtomic<T> | Per-cache-line padding (128 B on Apple Silicon, 64 B elsewhere) |
| Thread-local RNG | Each tile worker has its own mt19937, so random number generation needs no mutex |
| P-core detection | sysctlbyname("hw.perflevel0.physicalcpu"); E-cores are skipped |
| QoS | QOS_CLASS_USER_INTERACTIVE: highest scheduling priority |
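The tile distribution can be sketched with a single shared counter. This is a minimal version, assuming a callable that renders one tile by index; PaddedAtomic mirrors the name in the table, but this definition and render_tiles are illustrative rather than the code in ThreadPool.h:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical padded atomic: aligned to 128 bytes (the Apple Silicon cache
// line) so the counter never shares a cache line with other hot data.
template <typename T>
struct alignas(128) PaddedAtomic {
    std::atomic<T> value{};
};

// Render num_tiles tiles on num_threads workers. Each worker claims the next
// unclaimed tile index with fetch_add until the counter runs past the end.
template <typename TileFn>
void render_tiles(int num_tiles, int num_threads, TileFn render_tile) {
    PaddedAtomic<int> next_tile;
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int w = 0; w < num_threads; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                int tile = next_tile.value.fetch_add(1, std::memory_order_relaxed);
                if (tile >= num_tiles) break;  // all tiles claimed
                render_tile(tile);
            }
        });
    }
    for (std::thread& t : workers) t.join();  // join() publishes every worker's writes
}
```

Strictly speaking, this is dynamic scheduling from one shared counter rather than per-thread queues with stealing, but the load-balancing effect is similar: faster threads simply claim more tiles. std::memory_order_relaxed is sufficient here because each index is handed out exactly once and join() provides the synchronization for the results.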
SIMD: SIMD.h
| Feature | Detail |
|---|---|
| Backend | ARM NEON via GLM_FORCE_INTRINSICS + GLM_FORCE_DEFAULT_ALIGNED_GENTYPES |
| Packet width | 4 rays (PACKET_SIZE = 4), matching one 128-bit NEON register |
| Layout | Structure-of-Arrays (SoA): origins_x[4], origins_y[4], … |
| Alignment | alignas(16) on all SoA arrays (NEON-ready) |
| Precomputed | inv_directions + signs per ray → branchless AABB slab test |
| Diagnostics | print_simd_diagnostics() prints GLM arch + alignment on startup |
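A sketch of the 4-wide slab test over an SoA packet. It is written as plain C++ loops rather than GLM/NEON code, and RayPacket4, inv_dir_x/y/z and intersect_aabb_packet4 are hypothetical names (only the origins_x naming comes from the table); it also uses min/max instead of the per-ray sign trick:

```cpp
#include <algorithm>
#include <cstdint>

constexpr int PACKET_SIZE = 4;

// Hypothetical 4-ray SoA packet. Each array holds one component for all four
// rays and is 16-byte aligned, so it maps onto a single 128-bit NEON register.
struct RayPacket4 {
    alignas(16) float origins_x[PACKET_SIZE];
    alignas(16) float origins_y[PACKET_SIZE];
    alignas(16) float origins_z[PACKET_SIZE];
    alignas(16) float inv_dir_x[PACKET_SIZE];  // 1 / direction.x, precomputed per ray
    alignas(16) float inv_dir_y[PACKET_SIZE];
    alignas(16) float inv_dir_z[PACKET_SIZE];
};

// Slab test of all four rays against one AABB. Bit i of the result is set when
// ray i overlaps the box within [t_min, t_max] and bit i of active_mask is set.
// The loop body has no data-dependent branches, which helps auto-vectorization.
std::uint32_t intersect_aabb_packet4(const RayPacket4& p, const float bmin[3],
                                     const float bmax[3], float t_min, float t_max,
                                     std::uint32_t active_mask) {
    std::uint32_t hit_mask = 0;
    for (int i = 0; i < PACKET_SIZE; ++i) {
        float tx0 = (bmin[0] - p.origins_x[i]) * p.inv_dir_x[i];
        float tx1 = (bmax[0] - p.origins_x[i]) * p.inv_dir_x[i];
        float ty0 = (bmin[1] - p.origins_y[i]) * p.inv_dir_y[i];
        float ty1 = (bmax[1] - p.origins_y[i]) * p.inv_dir_y[i];
        float tz0 = (bmin[2] - p.origins_z[i]) * p.inv_dir_z[i];
        float tz1 = (bmax[2] - p.origins_z[i]) * p.inv_dir_z[i];
        float t_enter = std::max({t_min, std::min(tx0, tx1), std::min(ty0, ty1), std::min(tz0, tz1)});
        float t_exit = std::min({t_max, std::max(tx0, tx1), std::max(ty0, ty1), std::max(tz0, tz1)});
        hit_mask |= static_cast<std::uint32_t>(t_enter <= t_exit) << i;
    }
    return hit_mask & active_mask;
}
```

During traversal, the returned mask can be ANDed into the packet's active mask, so a node is skipped only when no active ray hits it.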
Camera: Camera.h
- Look-at model: {look_from, look_at, vup, vfov, aspect_ratio}
- Depth of field: disk sampling (lens_radius = aperture / 2) with a focus_dist plane
- Thread-safe: thread_local RNG inside get_ray()
Per tile (32×32):
  For each pixel:
    For each sample (N spp):
      1. Jitter (u, v) within pixel
      2. camera.get_ray(u, v) → DoF offset applied
      3. trace_ray(ray, scene, depth) → recursive BVH traversal
         └─ material.scatter() → importance-sampled BRDF
      4. Accumulate colour
    5. Average + gamma correct (√)
    6. Write to pixel buffer
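Steps 5 and 6 can be sketched as follows, assuming the tile loop accumulates non-negative linear colour sums and writes 32-bit ARGB pixels. The Color struct, the helper names and the 0xAARRGGBB packing are assumptions, since the actual SDL texture format isn't specified here:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Color { float r, g, b; };  // hypothetical accumulated (summed) linear colour

// Average one channel over spp samples, apply gamma 2.0 (square root),
// clamp just below 1, and scale to an 8-bit value.
inline std::uint8_t to_byte(float linear_sum, int spp) {
    float c = std::sqrt(linear_sum / static_cast<float>(spp));  // gamma 2.0
    c = std::clamp(c, 0.0f, 0.999f);
    return static_cast<std::uint8_t>(256.0f * c);
}

// Pack into 0xAARRGGBB with full alpha (assumed pixel layout).
inline std::uint32_t pack_pixel(Color sum, int spp) {
    return 0xFF000000u
         | (static_cast<std::uint32_t>(to_byte(sum.r, spp)) << 16)
         | (static_cast<std::uint32_t>(to_byte(sum.g, spp)) << 8)
         |  static_cast<std::uint32_t>(to_byte(sum.b, spp));
}
```

The 0.999 clamp keeps a fully saturated channel at 255 instead of overflowing to 256.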
| Technology | Version | Role |
|---|---|---|
| C++ | C++20 | Core language |
| GLM | Header-only | Math (vec3, mat4) + NEON SIMD |
| SDL2 | 2.x | Window creation, pixel buffer |
| SDL2_ttf | 2.x | UI text rendering |
| CMake | 3.20+ | Build system |
| Google Test | Latest | 39-test unit suite |
| Optimization | File | Benefit |
|---|---|---|
| NEON SIMD via GLM | SIMD.h | 4-wide parallel ray processing |
| 128-byte cache line padding | ThreadPool.h | Eliminates false sharing on atomics |
| P-core detection (hw.perflevel0.physicalcpu) | ThreadPool.h | Uses performance cores only |
| QOS_CLASS_USER_INTERACTIVE | ThreadPool.h | Highest scheduling priority for worker threads |
| __builtin_arm_yield() | ThreadPool.h | Spin-wait hint to the core, making busy-waiting cheaper |
| -march=native -arch arm64 | CMakeLists.txt | Native ARM64 code generation |
| -funroll-loops -ftree-vectorize | CMakeLists.txt | Compiler auto-vectorisation |
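P-core detection and the QoS hint both go through macOS system calls. A minimal sketch, guarded so it also compiles on other platforms; the function names render_thread_count and raise_worker_qos are hypothetical, while sysctlbyname, pthread_set_qos_class_self_np and QOS_CLASS_USER_INTERACTIVE are the standard macOS APIs:

```cpp
#include <thread>
#if defined(__APPLE__)
#include <pthread/qos.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Number of threads to render with. On Apple Silicon, hw.perflevel0.physicalcpu
// reports the performance-core count; elsewhere (or if the key is missing)
// fall back to std::thread::hardware_concurrency().
unsigned render_thread_count() {
#if defined(__APPLE__)
    int pcores = 0;
    size_t len = sizeof(pcores);
    if (sysctlbyname("hw.perflevel0.physicalcpu", &pcores, &len, nullptr, 0) == 0 && pcores > 0)
        return static_cast<unsigned>(pcores);
#endif
    unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? n : 1;  // hardware_concurrency() may return 0 when unknown
}

// Ask the macOS scheduler to treat the calling thread as user-interactive.
// No-op on other platforms.
void raise_worker_qos() {
#if defined(__APPLE__)
    pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);
#endif
}
```

raise_worker_qos() would be called at the start of each worker thread, since the QoS class applies only to the calling thread.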
See docs/comprehensive-guide.md for the full breakdown.
Start here: docs/comprehensive-guide.md, the complete technical documentation, covering:
- Full architecture overview
- M1/M2/M3 Mac specific optimizations
- SIMD and NEON configuration
- Threading and parallelism details
- BVH acceleration explained
- Lessons learned and best practices
Other documentation in docs/:
- docs/primitives-reference.md: Mathematical details & algorithms
- docs/next-steps.md: Future features roadmap
- docs/camera-and-architecture.md: System design & patterns
Polymorphic Design:
class Hittable { // Base interface
virtual bool hit(...) = 0;
virtual bool bounding_box(...) = 0;
};
class Sphere : public Hittable { ... }; // Quadratic eq
class Triangle : public Hittable { ... }; // Möller–Trumbore
class Box : public Hittable { ... }; // Slab method
class BVHNode : public Hittable { ... }; // Recursive tree node
class BVHScene : public Hittable { ... }; // Bounded + unbounded split

Material System:
class Material {
virtual bool scatter(ray_in, record, attenuation, scattered) = 0;
};
class Lambertian : public Material { ... }; // Diffuse (cosine scatter)
class Metal : public Material { ... }; // Reflection + fuzz
class Dielectric : public Material { ... }; // Snell + Schlick Fresnel

Thread Pool (work-stealing):
// Each thread grabs tiles atomically until exhausted
int tile_idx = next_tile.fetch_add(1, std::memory_order_seq_cst);

Rendering parameters are adjustable via the UI, or set defaults in src/main.cpp:
struct RenderParams {
int num_threads = 8; // Thread count (auto-set to P-core count)
    int samples_per_pixel = 100; // Anti-aliasing samples (1–1000)
    int max_depth = 50; // Ray bounce limit (1–200)
    float fov = 60.0f; // Vertical field of view (degrees)
    float aperture = 0.1f; // Depth of field blur (0 = pinhole)
    float focus_dist = 0.0f; // 0 = auto (distance to look-at)
    bool use_packet_tracing = true; // SIMD packet mode on/off
};

Future features:
- Texture mapping (image and procedural)
- OBJ mesh loading
- Emissive materials (area lights)
- Progressive/incremental rendering
- GPU acceleration (Metal API)
- Denoising at low sample counts
- SAH (Surface Area Heuristic) BVH split
Based on the "Ray Tracing in One Weekend" series by Peter Shirley and the ray tracing community's accumulated knowledge.
Further reading:
- Ray Tracing in One Weekend, by Peter Shirley (free online)
- Scratchapixel.com: Best graphics math tutorials
- Physically Based Rendering, by Pharr, Jakob, and Humphreys
Want to understand the code? Read docs/comprehensive-guide.md for full technical documentation!