diff --git a/Performance Analysis.xlsx b/Performance Analysis.xlsx
new file mode 100644
index 00000000..e54fa587
Binary files /dev/null and b/Performance Analysis.xlsx differ
diff --git a/README.md b/README.md
index 110697ce..e3c53352 100644
--- a/README.md
+++ b/README.md
@@ -3,11 +3,109 @@ CUDA Path Tracer
 
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Yuxuan Zhu
+  * [LinkedIn](https://www.linkedin.com/in/andrewyxzhu/)
+* Tested on: Windows 10, i7-7700HQ @ 2.80GHz 16GB, GTX 1050 4096MB (Personal Laptop)
+
+## Demo
+
+![Demo](img/final.png)
+
+The glTF file is obtained from https://github.com/KhronosGroup/glTF-Sample-Models/tree/master/2.0/Duck
+
+## Introduction
+
+This is a GPU-based image renderer with path tracing capability. Path tracing shoots rays from the camera and updates the color carried by each ray as it bounces around the scene. It simulates how light travels in real life and enables realistic image rendering. Since every ray shot from the camera is independent of the others, path tracing is embarrassingly parallel and maps naturally onto CUDA.
+
+Feature highlights:
+1. Diffusion
+2. Reflection
+3. Refraction
+4. Anti-aliasing
+5. Depth of field effect
+6. Arbitrary mesh loading
+7. First bounce caching
+8. Sorting by material
+
+**Control Options**
+1. Toggle 1 to enable/disable material sorting
+2. Toggle 2 to enable/disable first bounce caching
+3. Toggle 3 to enable/disable stochastic sampled anti-aliasing
+4. Toggle 4 to enable/disable depth of field effect
+5. Toggle 5 to enable/disable bounding volume intersection culling
+
+## Features
+**Diffusion/Reflection**
+
+Diffusion handles the illumination of rough objects such as walls. Reflection simulates mirrors and shiny objects.
+
+![annotatedFeatures](img/annotatedFeatures.png)
+
+
+**Refraction**
+
+Refraction simulates translucent materials such as glass: light bends when it enters and exits them. When light passes from a medium with a higher index of refraction into one with a lower index of refraction, rays beyond a critical angle undergo total internal reflection instead of refraction. I implemented refraction with support for total internal reflection. Adding this feature slows down the shading kernel because of the additional calculations for the refraction angle, reflectance coefficient, etc. The feature still takes good advantage of the GPU's parallelism since the rays remain independent.
+
+The top image is before refraction is applied. The bottom one is after.
+
+
+![BeforeRefraction](img/beforeRefraction.png)
+![AfterRefraction](img/afterRefraction.png)
+
+**Anti-aliasing**
+
+Aliasing is the appearance of jagged artefacts in images caused by discretizing the scene into pixels. It can be mitigated by taking more than one sample per pixel, drawn from the region around the pixel center. With stochastic sampled anti-aliasing, each iteration jitters the ray within its pixel, so accumulating more iterations produces a cleaner image. Stochastic sampled anti-aliasing benefits from a GPU implementation because the rays generated for each pixel are always independent. It might be optimized further by launching more threads per pixel; whether that helps depends on whether the extra threads can hide memory latency with additional computation, so it would have to be measured. The sketch below shows the jittered ray generation, and the images after it compare renders at 2000 iterations.
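+
+As a rough illustration, the jitter can be added directly in the ray-generation kernel. This is a minimal sketch rather than the project's exact code: the `Camera`/`PathSegment` layouts and the seeding scheme are illustrative assumptions.
+
+```cuda
+#include <thrust/random.h>
+#include <glm/glm.hpp>
+
+// Illustrative stand-ins for the tracer's own structs (assumed layout).
+struct Ray { glm::vec3 origin, direction; };
+struct PathSegment { Ray ray; glm::vec3 color; int pixelIndex, remainingBounces; };
+struct Camera {
+    glm::ivec2 resolution;
+    glm::vec3 position, view, up, right;
+    glm::vec2 pixelLength;   // world-space extent of one pixel on the image plane
+};
+
+__global__ void generateRayFromCamera(Camera cam, int iter, PathSegment* pathSegments)
+{
+    int x = blockIdx.x * blockDim.x + threadIdx.x;
+    int y = blockIdx.y * blockDim.y + threadIdx.y;
+    if (x >= cam.resolution.x || y >= cam.resolution.y) return;
+
+    int index = x + y * cam.resolution.x;
+    PathSegment& segment = pathSegments[index];
+
+    // One RNG stream per pixel per iteration so every sample lands on a
+    // different sub-pixel position (simple illustrative seed).
+    thrust::default_random_engine rng((iter + 1) * 104729u + index);
+    thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
+
+    // Jitter the sample point inside the pixel instead of always shooting
+    // through its center; averaging over iterations smooths the edges.
+    float jitterX = u01(rng) - 0.5f;
+    float jitterY = u01(rng) - 0.5f;
+
+    segment.ray.origin = cam.position;
+    segment.ray.direction = glm::normalize(cam.view
+        - cam.right * cam.pixelLength.x * ((float)x + jitterX - cam.resolution.x * 0.5f)
+        - cam.up * cam.pixelLength.y * ((float)y + jitterY - cam.resolution.y * 0.5f));
+}
+```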
+
+The top image is before anti-aliasing is applied. The bottom one is after.
+
+![BeforeAA](img/beforeAA.JPG)
+![AfterAA](img/afterAA.JPG)
+
+
+**Depth of Field**
+
+Depth of field reproduces the thin-lens behavior of a real-life camera, as opposed to a pinhole camera: objects in focus are sharp while objects out of focus are blurry. Depth of field significantly increases the number of iterations required to render an image, since rays are shot not from a single pinhole but from the whole area of a circular lens. Depth of field benefits from a GPU implementation because the generated rays are always independent. It might be optimized further by launching more threads to handle rays from different points on the lens; whether that pays off depends on whether the extra threads can hide memory latency with additional computation, so it would have to be measured.
+
+The top image is before depth of field is applied. The bottom one is after.
+
+![Before](img/beforeDof.png)
+![After](img/afterDof.png)
+
+**Mesh Loading**
+
+Mesh loading makes it possible to load scenes stored in the glTF format, a compact format designed for scene transmission. I used the tinygltf library to load the file, but significant effort is still required to parse the resulting scene structures for use with my path tracer; I referenced https://github.com/syoyo/tinygltf/issues/71 for examples of parsing the format. Mesh loading significantly slows down the path tracer because a mesh contains a large number of triangles and the data structure is not optimized for a large number of geometries. The loading itself can only be done on the CPU since it involves file I/O. Intersection testing could be further optimized with a hierarchical spatial data structure such as an octree, but I ran out of time and did not implement that. I did implement bounding volume intersection culling, using a bounding sphere rather than a bounding box because the sphere test is simpler and faster. The tradeoff is that the sphere encloses more empty volume than a tight box would. I did a performance analysis to show the benefit of culling.
+
+![mesh](img/meshloading.png)
+
+![cullingAnalysis](img/cullingAnalysis.png)
+
+**First Bounce Caching**
+
+If the rays are generated deterministically, the first intersection of each ray with the scene is always the same. Instead of recalculating this first intersection at every iteration, we can cache it and reuse it to speed up path tracing. The benefit diminishes as the number of bounces per iteration grows, since only the first bounce is cached. First bounce caching also cannot be used if there is any randomness in ray generation (for example, anti-aliasing or depth of field), because the first intersection is then no longer deterministic. The graph below shows the performance analysis.
+
+![cachingAnalysis](img/cachingAnalysis.png)
+
+**Sorting Rays by Material**
+
+Warp divergence can seriously hurt GPU performance, so it is important that the rays in each warp perform similar calculations. Since different materials lead to different bouncing and shading behavior, the rays and intersections are sorted by material so that rays hitting the same material are contiguous in memory. There are different ways to sort: thrust uses merge sort when given a custom comparator, which is slower than its radix sort. To speed this up, I changed the data layout to allow radix sort by separating the material ID out of the material struct into its own key buffer, as in the sketch below.
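+
+A minimal sketch of this sort (the duplicate-key variant in the comparison that follows), assuming illustrative names and layouts for the buffers and per-ray structs rather than the code base's exact ones:
+
+```cuda
+#include <thrust/sort.h>
+#include <thrust/copy.h>
+#include <thrust/device_ptr.h>
+
+// Illustrative stand-ins for the tracer's per-ray structs (assumed layout).
+struct PathSegment { /* ray, accumulated color, pixel index, ... */ int pixelIndex; };
+struct ShadeableIntersection { float t; int materialId; /* surface normal, ... */ };
+
+// dev_matIds holds one integer material ID per active path.
+void sortByMaterial(int num_paths,
+                    int* dev_matIds, int* dev_matIdsCopy,
+                    PathSegment* dev_paths,
+                    ShadeableIntersection* dev_intersections)
+{
+    thrust::device_ptr<int> keys(dev_matIds);
+    thrust::device_ptr<int> keysCopy(dev_matIdsCopy);
+    thrust::device_ptr<PathSegment> paths(dev_paths);
+    thrust::device_ptr<ShadeableIntersection> isects(dev_intersections);
+
+    // Duplicate the keys so each value array is sorted against its own copy.
+    thrust::copy(keys, keys + num_paths, keysCopy);
+
+    // Plain int keys with the default comparator let thrust use its radix
+    // sort; the sort is stable, so both calls apply the same permutation and
+    // paths[i] still matches isects[i] afterwards.
+    thrust::sort_by_key(keys, keys + num_paths, paths);
+    thrust::sort_by_key(keysCopy, keysCopy + num_paths, isects);
+}
+```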
+I also changed the material ID of terminating rays to a positive number, since I am not sure whether thrust's radix sort supports negative numbers.
+
+I compared three different implementations of material sorting:
+1. key value sort with custom comparator (fast)
+2. key value sort by zip iterator with one material ID buffer (slow)
+3. key value sort with duplicate material ID buffer (fastest)
+
+This shows that extra operations can sometimes speed up GPU programs. The tradeoff is one extra buffer, which takes up a bit more memory.
+
+Even though sorting by material should theoretically speed up the path tracer, the effect is not very obvious, and most of the time the sorting makes the render slower. This is because reflection versus refraction is chosen randomly, so rays hitting the same material can still diverge, which makes the sorting less useful.
+
+
+## Bloopers
+
+![Blooper](img/blooper.png)
+
+This circular pattern was created accidentally because of floating point errors.
-### (TODO: Your README)
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
diff --git a/external/include/json.hpp b/external/include/json.hpp
new file mode 100644
index 00000000..c9af0bed
--- /dev/null
+++ b/external/include/json.hpp
@@ -0,0 +1,20406 @@
+/*
+    __ _____ _____ _____
+ __|  |   __|     |   | |  JSON for Modern C++
+|  |  |__   |  |  | | | |  version 3.5.0
+|_____|_____|_____|_|___|  https://github.com/nlohmann/json
+
+Licensed under the MIT License .
+SPDX-License-Identifier: MIT
+Copyright (c) 2013-2018 Niels Lohmann .
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+*/
+
+#ifndef NLOHMANN_JSON_HPP
+#define NLOHMANN_JSON_HPP
+
+#define NLOHMANN_JSON_VERSION_MAJOR 3
+#define NLOHMANN_JSON_VERSION_MINOR 5
+#define NLOHMANN_JSON_VERSION_PATCH 0
+
+#include // all_of, find, for_each
+#include // assert
+#include // and, not, or
+#include // nullptr_t, ptrdiff_t, size_t
+#include // hash, less
+#include // initializer_list
+#include // istream, ostream
+#include // random_access_iterator_tag
+#include // accumulate
+#include // string, stoi, to_string
+#include // declval, forward, move, pair, swap
+
+// #include
+#ifndef NLOHMANN_JSON_FWD_HPP
+#define NLOHMANN_JSON_FWD_HPP
+
+#include // int64_t, uint64_t
+#include // map
+#include // allocator
+#include // string
+#include // vector
+
+/*!
+@brief namespace for Niels Lohmann
+@see https://github.com/nlohmann
+@since version 1.0.0
+*/
+namespace nlohmann
+{
+/*!
+@brief default JSONSerializer template argument + +This serializer ignores the template arguments and uses ADL +([argument-dependent lookup](https://en.cppreference.com/w/cpp/language/adl)) +for serialization. +*/ +template +struct adl_serializer; + +template class ObjectType = + std::map, + template class ArrayType = std::vector, + class StringType = std::string, class BooleanType = bool, + class NumberIntegerType = std::int64_t, + class NumberUnsignedType = std::uint64_t, + class NumberFloatType = double, + template class AllocatorType = std::allocator, + template class JSONSerializer = + adl_serializer> +class basic_json; + +/*! +@brief JSON Pointer + +A JSON pointer defines a string syntax for identifying a specific value +within a JSON document. It can be used with functions `at` and +`operator[]`. Furthermore, JSON pointers are the base for JSON patches. + +@sa [RFC 6901](https://tools.ietf.org/html/rfc6901) + +@since version 2.0.0 +*/ +template +class json_pointer; + +/*! +@brief default JSON class + +This type is the default specialization of the @ref basic_json class which +uses the standard template types. + +@since version 1.0.0 +*/ +using json = basic_json<>; +} // namespace nlohmann + +#endif + +// #include + + +// This file contains all internal macro definitions +// You MUST include macro_unscope.hpp at the end of json.hpp to undef all of them + +// exclude unsupported compilers +#if !defined(JSON_SKIP_UNSUPPORTED_COMPILER_CHECK) + #if defined(__clang__) + #if (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__) < 30400 + #error "unsupported Clang version - see https://github.com/nlohmann/json#supported-compilers" + #endif + #elif defined(__GNUC__) && !(defined(__ICC) || defined(__INTEL_COMPILER)) + #if (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) < 40800 + #error "unsupported GCC version - see https://github.com/nlohmann/json#supported-compilers" + #endif + #endif +#endif + +// disable float-equal warnings on GCC/clang +#if defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) + #pragma GCC diagnostic push + #pragma GCC diagnostic ignored "-Wfloat-equal" +#endif + +// disable documentation warnings on clang +#if defined(__clang__) + #pragma GCC diagnostic push + #pragma GCC diagnostic ignored "-Wdocumentation" +#endif + +// allow for portable deprecation warnings +#if defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) + #define JSON_DEPRECATED __attribute__((deprecated)) +#elif defined(_MSC_VER) + #define JSON_DEPRECATED __declspec(deprecated) +#else + #define JSON_DEPRECATED +#endif + +// allow to disable exceptions +#if (defined(__cpp_exceptions) || defined(__EXCEPTIONS) || defined(_CPPUNWIND)) && !defined(JSON_NOEXCEPTION) + #define JSON_THROW(exception) throw exception + #define JSON_TRY try + #define JSON_CATCH(exception) catch(exception) + #define JSON_INTERNAL_CATCH(exception) catch(exception) +#else + #define JSON_THROW(exception) std::abort() + #define JSON_TRY if(true) + #define JSON_CATCH(exception) if(false) + #define JSON_INTERNAL_CATCH(exception) if(false) +#endif + +// override exception macros +#if defined(JSON_THROW_USER) + #undef JSON_THROW + #define JSON_THROW JSON_THROW_USER +#endif +#if defined(JSON_TRY_USER) + #undef JSON_TRY + #define JSON_TRY JSON_TRY_USER +#endif +#if defined(JSON_CATCH_USER) + #undef JSON_CATCH + #define JSON_CATCH JSON_CATCH_USER + #undef JSON_INTERNAL_CATCH + #define JSON_INTERNAL_CATCH JSON_CATCH_USER +#endif +#if defined(JSON_INTERNAL_CATCH_USER) + #undef 
JSON_INTERNAL_CATCH + #define JSON_INTERNAL_CATCH JSON_INTERNAL_CATCH_USER +#endif + +// manual branch prediction +#if defined(__clang__) || defined(__GNUC__) || defined(__GNUG__) + #define JSON_LIKELY(x) __builtin_expect(!!(x), 1) + #define JSON_UNLIKELY(x) __builtin_expect(!!(x), 0) +#else + #define JSON_LIKELY(x) x + #define JSON_UNLIKELY(x) x +#endif + +// C++ language standard detection +#if (defined(__cplusplus) && __cplusplus >= 201703L) || (defined(_HAS_CXX17) && _HAS_CXX17 == 1) // fix for issue #464 + #define JSON_HAS_CPP_17 + #define JSON_HAS_CPP_14 +#elif (defined(__cplusplus) && __cplusplus >= 201402L) || (defined(_HAS_CXX14) && _HAS_CXX14 == 1) + #define JSON_HAS_CPP_14 +#endif + +/*! +@brief macro to briefly define a mapping between an enum and JSON +@def NLOHMANN_JSON_SERIALIZE_ENUM +@since version 3.4.0 +*/ +#define NLOHMANN_JSON_SERIALIZE_ENUM(ENUM_TYPE, ...) \ + template \ + inline void to_json(BasicJsonType& j, const ENUM_TYPE& e) \ + { \ + static_assert(std::is_enum::value, #ENUM_TYPE " must be an enum!"); \ + static const std::pair m[] = __VA_ARGS__; \ + auto it = std::find_if(std::begin(m), std::end(m), \ + [e](const std::pair& ej_pair) -> bool \ + { \ + return ej_pair.first == e; \ + }); \ + j = ((it != std::end(m)) ? it : std::begin(m))->second; \ + } \ + template \ + inline void from_json(const BasicJsonType& j, ENUM_TYPE& e) \ + { \ + static_assert(std::is_enum::value, #ENUM_TYPE " must be an enum!"); \ + static const std::pair m[] = __VA_ARGS__; \ + auto it = std::find_if(std::begin(m), std::end(m), \ + [j](const std::pair& ej_pair) -> bool \ + { \ + return ej_pair.second == j; \ + }); \ + e = ((it != std::end(m)) ? it : std::begin(m))->first; \ + } + +// Ugly macros to avoid uglier copy-paste when specializing basic_json. They +// may be removed in the future once the class is split. + +#define NLOHMANN_BASIC_JSON_TPL_DECLARATION \ + template class ObjectType, \ + template class ArrayType, \ + class StringType, class BooleanType, class NumberIntegerType, \ + class NumberUnsignedType, class NumberFloatType, \ + template class AllocatorType, \ + template class JSONSerializer> + +#define NLOHMANN_BASIC_JSON_TPL \ + basic_json + +// #include + + +#include // not +#include // size_t +#include // conditional, enable_if, false_type, integral_constant, is_constructible, is_integral, is_same, remove_cv, remove_reference, true_type + +namespace nlohmann +{ +namespace detail +{ +// alias templates to reduce boilerplate +template +using enable_if_t = typename std::enable_if::type; + +template +using uncvref_t = typename std::remove_cv::type>::type; + +// implementation of C++14 index_sequence and affiliates +// source: https://stackoverflow.com/a/32223343 +template +struct index_sequence +{ + using type = index_sequence; + using value_type = std::size_t; + static constexpr std::size_t size() noexcept + { + return sizeof...(Ints); + } +}; + +template +struct merge_and_renumber; + +template +struct merge_and_renumber, index_sequence> + : index_sequence < I1..., (sizeof...(I1) + I2)... 
> {}; + +template +struct make_index_sequence + : merge_and_renumber < typename make_index_sequence < N / 2 >::type, + typename make_index_sequence < N - N / 2 >::type > {}; + +template<> struct make_index_sequence<0> : index_sequence<> {}; +template<> struct make_index_sequence<1> : index_sequence<0> {}; + +template +using index_sequence_for = make_index_sequence; + +// dispatch utility (taken from ranges-v3) +template struct priority_tag : priority_tag < N - 1 > {}; +template<> struct priority_tag<0> {}; + +// taken from ranges-v3 +template +struct static_const +{ + static constexpr T value{}; +}; + +template +constexpr T static_const::value; +} // namespace detail +} // namespace nlohmann + +// #include + + +#include // not +#include // numeric_limits +#include // false_type, is_constructible, is_integral, is_same, true_type +#include // declval + +// #include + +// #include + + +#include // random_access_iterator_tag + +// #include + + +namespace nlohmann +{ +namespace detail +{ +template struct make_void +{ + using type = void; +}; +template using void_t = typename make_void::type; +} // namespace detail +} // namespace nlohmann + +// #include + + +namespace nlohmann +{ +namespace detail +{ +template +struct iterator_types {}; + +template +struct iterator_types < + It, + void_t> +{ + using difference_type = typename It::difference_type; + using value_type = typename It::value_type; + using pointer = typename It::pointer; + using reference = typename It::reference; + using iterator_category = typename It::iterator_category; +}; + +// This is required as some compilers implement std::iterator_traits in a way that +// doesn't work with SFINAE. See https://github.com/nlohmann/json/issues/1341. +template +struct iterator_traits +{ +}; + +template +struct iterator_traits < T, enable_if_t < !std::is_pointer::value >> + : iterator_types +{ +}; + +template +struct iterator_traits::value>> +{ + using iterator_category = std::random_access_iterator_tag; + using value_type = T; + using difference_type = ptrdiff_t; + using pointer = T*; + using reference = T&; +}; +} +} + +// #include + +// #include + + +#include + +// #include + + +// http://en.cppreference.com/w/cpp/experimental/is_detected +namespace nlohmann +{ +namespace detail +{ +struct nonesuch +{ + nonesuch() = delete; + ~nonesuch() = delete; + nonesuch(nonesuch const&) = delete; + void operator=(nonesuch const&) = delete; +}; + +template class Op, + class... Args> +struct detector +{ + using value_t = std::false_type; + using type = Default; +}; + +template class Op, class... Args> +struct detector>, Op, Args...> +{ + using value_t = std::true_type; + using type = Op; +}; + +template