CIS565-Fall-2021 · 7DBW13 · Nov 2, 2021 · Nov 3, 2021
diff --git a/README.md b/README.md
@@ -3,26 +3,126 @@ WebGL Forward+ and Clustered Deferred Shading
 
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 5**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) **Google Chrome 222.2** on
-  Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Bowen Deng
+  * [LinkedIn](https://www.linkedin.com/in/bowen-deng-7dbw13)
+* Tested on: Windows 10, AMD Ryzen 9 5900HX with Radeon Graphics @ 3.30GHz 16GB, GeForce RTX 3070 Laptop GPU 8GB (Personal Computer)
 
-### Live Online
+## Live Online
 
-[![](img/thumb.png)](http://TODO.github.io/Project5-WebGL-Forward-Plus-and-Clustered-Deferred)
+[View live demo](https://7dbw13.github.io/WebGL-Forward-Plus-and-Clustered-Deferred/)
 
-### Demo Video/GIF
+## Demo GIF
 
-[![](img/video.png)](TODO)
+![](img/represent.gif)
 
-### (TODO: Your README)
+## Abstract
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+Forward+ and Clustered Deferred shading implemented in WebGL. Both are efficient rendering techniques that modify the classic graphics pipeline to render scenes with many lights in real time. Deferred Blinn-Phong shading and a G-buffer optimization are also implemented. A performance analysis is provided, using classic forward shading as the baseline.
 
-This assignment has a considerable amount of performance analysis compared
-to implementation work. Complete the implementation early to leave time!
+## Modern Rendering Methods
 
+### Deferred Shading
+
+In a forward shading pipeline such as our baseline, the pseudocode looks like this:
+```
+for object in scene:
+    do shading for all lights on object
+```
+A major problem is that when many objects overlap in the scene, forward shading also shades fragments that end up occluded, which is wasted work.
+
+The insight of deferred shading is to decouple determining whether an object is visible in the final image from shading it. Since processing the objects can be considerably costly, we record all the information needed for shading in geometry buffers (G-buffers) during the first pass, and simply read these buffers in the shading phase, so that each pixel is shaded only once, for its visible surface.
+```
+for object in scene:
+    record information of object in G-buffer
+
+read G-buffer and do shading for all lights
+```
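+
+As a rough illustration, here is a minimal CPU-side sketch in JavaScript (not the project's WebGL code) that counts per-light shading evaluations in both pipelines. It assumes no early depth test and unsorted geometry, so forward shading pays for every rasterized fragment; the fragment format and function names are hypothetical.
+
+```javascript
+// Each fragment is { pixel, depth }; a smaller depth is closer to the camera.
+
+// Forward: every rasterized fragment is shaded against every light,
+// even if a closer fragment later overwrites it.
+function forwardShadingCost(fragments, numLights) {
+  return fragments.length * numLights;
+}
+
+// Deferred: pass 1 keeps only the nearest fragment per pixel (the G-buffer),
+// pass 2 shades each covered pixel once per light.
+function deferredShadingCost(fragments, numLights) {
+  const gbuffer = new Map();
+  for (const f of fragments) {
+    const prev = gbuffer.get(f.pixel);
+    if (prev === undefined || f.depth < prev.depth) {
+      gbuffer.set(f.pixel, f);
+    }
+  }
+  return gbuffer.size * numLights;
+}
+
+// Three overlapping surfaces covering the same two pixels, 100 lights.
+const fragments = [
+  { pixel: 0, depth: 0.9 }, { pixel: 0, depth: 0.5 }, { pixel: 0, depth: 0.2 },
+  { pixel: 1, depth: 0.7 }, { pixel: 1, depth: 0.3 }, { pixel: 1, depth: 0.1 },
+];
+console.log(forwardShadingCost(fragments, 100));  // 600
+console.log(deferredShadingCost(fragments, 100)); // 200
+```
+
+The saving grows with depth complexity: with three layers per pixel, deferred shading does one third of the lighting work, at the cost of writing and reading the G-buffers.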
+
+The information stored in G-buffers varies from implementation to implementation. In our shader, only position, normal and albedo are recorded, which are enough for basic Lambertian or Blinn-Phong shading.
+
+| position | normal | albedo |
+| ------------------------ | ------------------------ | ----------------------- |
+| ![](img/pos.png) | ![](img/norm.png) | ![](img/col.png) |
+
+### Forward+ and Clustered Shading
+
+Another problem with the original forward pipeline is that it computes the influence of every light on every object. However, since a light's influence decreases with distance, its effective area is limited; for a point light, it is a sphere. Following the idea of a grid, we can split the camera frustum into small clusters, so that objects in a cluster only need to be shaded by the lights that overlap that cluster.
+```
+assign lights to clusters
+
+do shading for lights in corresponding clusters of objects
+```
+
+There are many ways to divide the frustum. My implementation simply uses Uniform NDC, which divides the frustum uniformly in NDC space; as a result, the clusters are uneven once transformed back to world space. The divided frustum looks like this:
+
+![](img/divide.png)
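+
+The sketch below shows, in plain JavaScript, how a point in NDC could be mapped to a flattened cluster index with Uniform NDC slicing, using the same `x + y * xSlices + z * xSlices * ySlices` ordering as `base.js`. It also inverts the standard OpenGL perspective depth mapping to show how unevenly uniform NDC z slices are spread in view space. The helper names, the 15 slices per axis, and the near/far values of 0.1 and 1000 are assumptions for illustration.
+
+```javascript
+// Map an NDC coordinate in [-1, 1] to a slice index in [0, slices),
+// clamping so that v = 1 lands in the last slice.
+function ndcToSlice(v, slices) {
+  const s = Math.floor(slices * (v + 1) / 2);
+  return Math.min(Math.max(s, 0), slices - 1);
+}
+
+// Flattened cluster index, same ordering as the renderer.
+function clusterIndexFromNdc(ndc, xSlices, ySlices, zSlices) {
+  const x = ndcToSlice(ndc[0], xSlices);
+  const y = ndcToSlice(ndc[1], ySlices);
+  const z = ndcToSlice(ndc[2], zSlices);
+  return x + y * xSlices + z * xSlices * ySlices;
+}
+
+// Inverse of the OpenGL perspective depth mapping:
+// NDC z in [-1, 1] -> positive view-space depth in [near, far].
+function ndcZToViewDepth(zNdc, near, far) {
+  return (2 * far * near) / ((far + near) - zNdc * (far - near));
+}
+
+const near = 0.1, far = 1000, zSlices = 15;
+// View-space depth of every uniform NDC z slice boundary
+const boundaries = [];
+for (let s = 0; s <= zSlices; s++) {
+  boundaries.push(ndcZToViewDepth(-1 + 2 * s / zSlices, near, far));
+}
+console.log(clusterIndexFromNdc([0, 0, 0], 15, 15, 15)); // 1687
+console.log(boundaries.map((d) => d.toFixed(2)).join(", "));
+```
+
+With these assumed values, 8 of the 15 slices lie between depth 0.1 and about 0.21, while the last slice alone spans hundreds of units.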
+
+(I tried to implement the depth division from Tiago Sousa's DOOM 2016 SIGGRAPH talk, as described in http://www.aortiz.me/2018/12/21/CG.html#comparing-algorithms, but the result was not as expected. A comment is left in my Forward+ shader.)
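+
+For reference, here is a minimal JavaScript sketch of the exponential depth slicing described in the linked primer, written to match the commented-out formula in `base.js`: the slice index is `floor(log(depth / near) * zSlices / log(far / near))`, where `depth` is the positive view-space distance. The clamping and the hypothetical inverse helper `sliceNearDepth` are added here for illustration; this is not the code that produced the results below.
+
+```javascript
+// Exponential slice index for a positive view-space depth.
+function exponentialSlice(depth, near, far, zSlices) {
+  const d = Math.max(depth, near); // points closer than near go to slice 0
+  const s = Math.floor(Math.log(d / near) * zSlices / Math.log(far / near));
+  return Math.min(Math.max(s, 0), zSlices - 1);
+}
+
+// Depth of the near boundary of slice s (inverse of the mapping above).
+// Consecutive boundaries differ by the constant ratio (far / near)^(1 / zSlices).
+function sliceNearDepth(s, near, far, zSlices) {
+  return near * Math.pow(far / near, s / zSlices);
+}
+
+const boundaries = [];
+for (let s = 0; s <= 15; s++) {
+  boundaries.push(sliceNearDepth(s, 0.1, 1000, 15).toFixed(2));
+}
+console.log(boundaries.join(", "));
+```
+
+Unlike Uniform NDC, each slice is a constant factor deeper than the previous one, so clusters keep a roughly similar shape across the whole depth range.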
+
+For the light assignment, I follow the idea credited to Janine Liu's work in https://github.com/j9liu/Project5-WebGL-Forward-Plus-and-Clustered-Deferred. Instead of transforming every cluster into world space and testing whether it intersects each sphere-shaped light, it is easier and more efficient to transform the bounding box of each light into NDC space and find which clusters it occupies. The math of the transformation is shown below.
+
+![](img/trans.png)
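+
+These steps can be sketched in plain JavaScript as follows. This is a hedged sketch, not the renderer's code: it assumes the light center is already in view space, builds an OpenGL-style column-major projection matrix by hand, and takes the min/max over all 8 projected corners of the light's bounding box. That bounds the projected box as long as every corner is in front of the camera; otherwise it falls back to the full screen. The function names and test values are hypothetical.
+
+```javascript
+// Column-major 4x4 OpenGL-style perspective matrix (same element layout as
+// gl-matrix's mat4.perspective); fovy is in radians.
+function perspective(fovy, aspect, near, far) {
+  const f = 1 / Math.tan(fovy / 2);
+  const nf = 1 / (near - far);
+  return [
+    f / aspect, 0, 0, 0,
+    0, f, 0, 0,
+    0, 0, (far + near) * nf, -1,
+    0, 0, 2 * far * near * nf, 0,
+  ];
+}
+
+// out = m * v for a column-major 4x4 matrix m and a 4-vector v.
+function transform(m, v) {
+  const out = [0, 0, 0, 0];
+  for (let r = 0; r < 4; r++) {
+    out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
+  }
+  return out;
+}
+
+// Inclusive [min, max] cluster coordinates per axis covered by a light whose
+// view-space center is `center` and whose bounding half-size is `radius`.
+// A range with min > max means the light touches no cluster on that axis.
+function lightClusterRange(center, radius, proj, slices) {
+  const lo = [Infinity, Infinity, Infinity];
+  const hi = [-Infinity, -Infinity, -Infinity];
+  let behindCamera = false;
+  for (let c = 0; c < 8; c++) {
+    const corner = [
+      center[0] + (c & 1 ? radius : -radius),
+      center[1] + (c & 2 ? radius : -radius),
+      center[2] + (c & 4 ? radius : -radius),
+      1,
+    ];
+    const clip = transform(proj, corner);
+    if (clip[3] <= 0) {
+      // Behind the camera: the perspective divide would flip signs.
+      behindCamera = true;
+      continue;
+    }
+    for (let i = 0; i < 3; i++) {
+      lo[i] = Math.min(lo[i], clip[i] / clip[3]);
+      hi[i] = Math.max(hi[i], clip[i] / clip[3]);
+    }
+  }
+  if (behindCamera) {
+    // Conservatively cover the whole screen, starting at the near plane.
+    lo[0] = lo[1] = lo[2] = -1;
+    hi[0] = hi[1] = 1;
+  }
+  // NDC [-1, 1] -> cluster coordinates, clamped to the grid.
+  return [0, 1, 2].map((i) => [
+    Math.max(0, Math.floor(slices[i] * (lo[i] + 1) / 2)),
+    Math.min(slices[i] - 1, Math.floor(slices[i] * (hi[i] + 1) / 2)),
+  ]);
+}
+
+// Camera at the origin looking down -z; light 10 units ahead, radius 1.
+const proj = perspective(Math.PI / 2, 1, 0.1, 100);
+const range = lightClusterRange([0, 0, -10], 1, proj, [15, 15, 15]);
+console.log(JSON.stringify(range)); // [[6,8],[6,8],[14,14]]
+```
+
+With a 90 degree field of view and far = 100, this light covers x and y clusters 6 to 8 but only z slice 14, which again shows how Uniform NDC crowds distant geometry into the last slices.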
+
+The idea of clustered shading can also be applied to deferred shading, resulting in our Clustered Deferred renderer.
+
+## Performance Analysis
+
+### Different Rendering Methods
+
+![](img/method.png)
+
+The first thing to notice is that Forward is outperformed by both Forward+ and Clustered Deferred for every number of lights tested, as expected. Since Forward shades every fragment with all lights in the scene, its performance drops sharply when the number of lights is very large.
+
+As shown in the figure, Forward+ performs better than Clustered Deferred with few lights, but is surpassed once the number of lights is large enough. In theory, Clustered Deferred should be the better method since it eliminates shading of occluded fragments. However, because it writes and reads G-buffers, it requires high memory bandwidth. Memory latency can be hidden to some extent, but when the computational load is low (for example, with a small number of lights), a notable fraction of the frame time may be spent waiting on memory. An optimization that shrinks the G-buffers is presented in the G-buffer Optimization section, where the performance benefit of reducing memory bandwidth can be observed.
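+
+To make the bandwidth argument concrete, here is a back-of-the-envelope estimate in JavaScript. It assumes each G-buffer is an RGBA texture with 32-bit float channels (16 bytes per texel), a hypothetical 1920x1080 canvas, and that every G-buffer texel is written once and read once per frame; overdraw in the geometry pass only adds to this, so it is a lower bound.
+
+```javascript
+// Lower-bound G-buffer traffic per frame: one write plus one read per texel.
+function gbufferBytesPerFrame(width, height, numGBuffers, bytesPerTexel = 16) {
+  const bytesPerBuffer = width * height * bytesPerTexel;
+  return 2 * numGBuffers * bytesPerBuffer;
+}
+
+const threeBuffers = gbufferBytesPerFrame(1920, 1080, 3);
+const twoBuffers = gbufferBytesPerFrame(1920, 1080, 2);
+console.log((threeBuffers / 1e6).toFixed(1), "MB"); // 199.1 MB
+console.log((twoBuffers / 1e6).toFixed(1), "MB");   // 132.7 MB
+```
+
+Going from 3 to 2 G-buffers removes one third of this traffic, which is the effect examined in the G-buffer Optimization section.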
+
+Finally, a strange phenomenon is that the performance of Forward+ sometimes drops more dramatically than that of Forward. I suspect this is because Uniform NDC cluster division produces uneven slices along the z-axis, so too many lights fall into clusters with the same z coordinate, while the cluster division itself adds overhead. The result could likely be improved with a better division scheme or tuned parameters.
+
+### Deferred Blinn-Phong Shading
+
+Blinn-Phong shading is a widely used shading model. It adds a specular term to the classic Lambertian model (diffuse + ambient).
+
+![](img/phong.png)
+```
+specular_term = pow(max(dot(H, N), 0), shininess)
+```
+
+Here `H` is the half vector, halfway between the view direction `V` and the light direction `L` (`H = normalize(L + V)`). `shininess` controls how concentrated the highlight is: smaller values give a broader, more diffuse highlight.
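+
+A minimal JavaScript sketch of the specular term for a single light, assuming `N`, `L` (direction to the light) and `V` (direction to the viewer) are unit vectors stored as `[x, y, z]` arrays. The helper names are hypothetical; in the project this computation is done in GLSL.
+
+```javascript
+// Unit-length copy of a 3-vector.
+function normalize(v) {
+  const len = Math.hypot(v[0], v[1], v[2]);
+  return [v[0] / len, v[1] / len, v[2] / len];
+}
+
+function dot(a, b) {
+  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
+}
+
+// Blinn-Phong specular term for one light.
+function blinnPhongSpecular(N, L, V, shininess) {
+  // Half vector between the light and view directions
+  const H = normalize([L[0] + V[0], L[1] + V[1], L[2] + V[2]]);
+  // Clamp before pow: pow of a negative base with a fractional exponent is
+  // NaN in JavaScript and undefined in GLSL.
+  return Math.pow(Math.max(dot(H, N), 0), shininess);
+}
+
+const N = [0, 0, 1];
+const V = [0, 0, 1];
+console.log(blinnPhongSpecular(N, normalize([1, 0, 1]), V, 8));
+console.log(blinnPhongSpecular(N, normalize([1, 0, 1]), V, 64));
+```
+
+Raising `shininess` from 8 to 64 makes the term fall off faster away from the mirror direction, giving a tighter highlight.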
+
+| Lambertian | Blinn-Phong |
+| ------------------------ | ------------------------ |
+| ![](img/lamb.png) | ![](img/phong2.png) |
+
+On the floor in the right image, a specular highlight can be observed.
+
+Blinn-Phong only adds one extra step (the specular term) per light, so its cost is small; the difference from Lambertian only becomes noticeable when the number of lights is large. Here is a comparison of frame time with 1000 lights:
+
+| Lambertian | Blinn-Phong |
+| ------------------------ | ------------------------ |
+| 59ms | 77ms |
+
+### G-buffer Optimization
+
+To reduce the size of the G-buffers, 2-component normals are used, following the octahedral mapping in this paper: https://jcgt.org/published/0003/02/01/paper.pdf. The main idea is to map the unit sphere onto an octahedron, project it down onto the z = 0 plane, and then reflect the −z hemisphere over the appropriate diagonal.
+
+![](img/oct.png)
+
+The paper also provides pseudocode. In this way, 3-component normals are encoded as 2-component codes. The reconstructed normals are shown below.
+
+| Original Normals | Reconstructed Normals |
+| ------------------------ | ------------------------ |
+| ![](img/norm.png) | ![](img/norm2.png) |
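+
+The encoding and decoding steps can be sketched in JavaScript as below, following the octahedral mapping described in the paper. This is an illustrative CPU version, not the project's shader code; `octEncode`, `octDecode` and `signNotZero` are hypothetical names, and the input normal is assumed to be unit length.
+
+```javascript
+// Sign that never returns 0, so points on an axis still fold to a definite side.
+function signNotZero(v) {
+  return v >= 0 ? 1 : -1;
+}
+
+// Unit normal [x, y, z] -> 2-component code in [-1, 1]^2.
+function octEncode(n) {
+  // Project onto the octahedron |x| + |y| + |z| = 1, then onto the z = 0 plane.
+  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
+  let x = n[0] / l1;
+  let y = n[1] / l1;
+  // Reflect the -z hemisphere over the diagonals into the outer triangles.
+  if (n[2] < 0) {
+    const fx = (1 - Math.abs(y)) * signNotZero(x);
+    const fy = (1 - Math.abs(x)) * signNotZero(y);
+    x = fx;
+    y = fy;
+  }
+  return [x, y];
+}
+
+// 2-component code -> unit normal [x, y, z].
+function octDecode(e) {
+  let x = e[0];
+  let y = e[1];
+  const z = 1 - Math.abs(x) - Math.abs(y);
+  // A negative z means the code came from the reflected -z hemisphere; undo it.
+  if (z < 0) {
+    const fx = (1 - Math.abs(y)) * signNotZero(x);
+    const fy = (1 - Math.abs(x)) * signNotZero(y);
+    x = fx;
+    y = fy;
+  }
+  const len = Math.hypot(x, y, z);
+  return [x / len, y / len, z / len];
+}
+
+console.log(octEncode([0, 0, -1])); // [ 1, 1 ]
+```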
+
+Although details such as the gaps between bricks are not recovered perfectly, the result is still acceptable overall. The payoff is that we now need only 2 G-buffers instead of 3.
+
+| Original Layout | Optimized Layout |
+| ------------------------ | ------------------------ |
+| buffer1 [pos.x, pos.y, pos.z, 1] | buffer1 [pos.x, pos.y, pos.z, code.x] |
+| buffer2 [normal.x, normal.y, normal.z, 0] | buffer2 [code.y, albedo.x, albedo.y, albedo.z] |
+| buffer3 [albedo.x, albedo.y, albedo.z, 1] | (not needed) |
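+
+In JavaScript terms, the optimized layout amounts to the packing below, where `code` is the 2-component octahedral normal code. This is an illustrative sketch with hypothetical helper names that mirrors the table; the project does the packing and unpacking in its shaders.
+
+```javascript
+// Pack position, normal code and albedo into the two RGBA G-buffers.
+function packGBuffers(pos, code, albedo) {
+  return [
+    [pos[0], pos[1], pos[2], code[0]],       // buffer1
+    [code[1], albedo[0], albedo[1], albedo[2]], // buffer2
+  ];
+}
+
+// Recover the three attributes from the two G-buffers.
+function unpackGBuffers([buffer1, buffer2]) {
+  return {
+    pos: [buffer1[0], buffer1[1], buffer1[2]],
+    code: [buffer1[3], buffer2[0]],
+    albedo: [buffer2[1], buffer2[2], buffer2[3]],
+  };
+}
+
+const texels = packGBuffers([1, 2, 3], [0.25, -0.5], [0.8, 0.6, 0.4]);
+console.log(unpackGBuffers(texels));
+```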
+
+![](img/opt.png)
+
+Surprisingly, with the optimized G-buffers, Clustered Deferred beats Forward+ even when the number of lights is small. This supports the memory-bandwidth explanation given in the previous section. Also notice that the benefit of this optimization becomes insignificant when the number of lights is very large. The likely reason is that the computational load is then high enough to make memory latency less important, while the encoding and decoding add their own cost.
 
 ### Credits
 
@@ -31,3 +131,7 @@ to implementation work. Complete the implementation early to leave time!
 * [webgl-debug](https://github.com/KhronosGroup/WebGLDeveloperTools) by Khronos Group Inc.
 * [glMatrix](https://github.com/toji/gl-matrix) by [@toji](https://github.com/toji) and contributors
 * [minimal-gltf-loader](https://github.com/shrekshao/minimal-gltf-loader) by [@shrekshao](https://github.com/shrekshao)
+* A Primer On Efficient Rendering Algorithms & Clustered Shading, http://www.aortiz.me/2018/12/21/CG.html#comparing-algorithms
+* Idea of iterating on lights from Janine Liu's work, https://github.com/j9liu/Project5-WebGL-Forward-Plus-and-Clustered-Deferred
+* Concepts and figures from CIS460, https://www.cis.upenn.edu/~cis460/21fa/index.html
+* A Survey of Efficient Representations for Independent Unit Vectors, https://jcgt.org/published/0003/02/01/paper.pdf
diff --git a/img/col.png b/img/col.png
diff --git a/img/divide.png b/img/divide.png
diff --git a/img/lamb.png b/img/lamb.png
diff --git a/img/method.png b/img/method.png
diff --git a/img/norm.png b/img/norm.png
diff --git a/img/norm2.png b/img/norm2.png
diff --git a/img/oct.png b/img/oct.png
diff --git a/img/opt.png b/img/opt.png
diff --git a/img/phong.png b/img/phong.png
diff --git a/img/phong2.png b/img/phong2.png
diff --git a/img/pos.png b/img/pos.png
diff --git a/img/represent.gif b/img/represent.gif
diff --git a/img/trans.png b/img/trans.png
diff --git a/src/init.js b/src/init.js
@@ -1,5 +1,5 @@
 // TODO: Change this to enable / disable debug mode
-export const DEBUG = true && process.env.NODE_ENV === 'development';
+export const DEBUG = false && process.env.NODE_ENV === 'development';
 
 import DAT from 'dat.gui';
 import WebGLDebug from 'webgl-debug';

diff --git a/src/main.js b/src/main.js
@@ -10,7 +10,7 @@ const FORWARD_PLUS = 'Forward+';
 const CLUSTERED = 'Clustered Deferred';
 
 const params = {
-  renderer: FORWARD_PLUS,
+  renderer: CLUSTERED,
   _renderer: null,
 };
 
@@ -59,9 +59,9 @@ function render() {
   // If you would like the wireframe to render behind and in front
   // of objects based on relative depths in the scene, comment out /
   //the gl.disable(gl.DEPTH_TEST) and gl.enable(gl.DEPTH_TEST) lines.
-  gl.disable(gl.DEPTH_TEST);
-  wireframe.render(camera);
-  gl.enable(gl.DEPTH_TEST);
+  // gl.disable(gl.DEPTH_TEST);
+  // wireframe.render(camera);
+  // gl.enable(gl.DEPTH_TEST);
 }
 
 makeRenderLoop(render)();
diff --git a/src/renderers/base.js b/src/renderers/base.js
@@ -1,3 +1,5 @@
+import { vec4 } from 'gl-matrix';
+import { Vector4 } from 'three';
 import TextureBuffer from './textureBuffer';
 
 export const MAX_LIGHTS_PER_CLUSTER = 100;
@@ -25,6 +27,61 @@ export default class BaseRenderer {
       }
     }
 
+    // Traverse each light
+    for (let i = 0; i < scene.lights.length; i++) {
+      let light = scene.lights[i];
+
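+      // Axis-aligned bounding box of the point light's sphere of influence
+      // (the 1.5 factor enlarges it to be conservative)
+      let bounding_radius = light.radius * 1.5;
+
+      // Transform all 8 box corners world -> view -> clip -> NDC and take the
+      // min/max. After the view rotation and the perspective divide, the two
+      // extreme world-space corners alone do not bound the projected box.
+      let min_point_screen = vec4.fromValues(Infinity, Infinity, Infinity, 1);
+      let max_point_screen = vec4.fromValues(-Infinity, -Infinity, -Infinity, 1);
+      let corner = vec4.create();
+      let corner_view = vec4.create();
+      let corner_clip = vec4.create();
+      let behind_camera = false;
+      for (let c = 0; c < 8; c++) {
+        vec4.set(corner,
+          light.position[0] + ((c & 1) ? bounding_radius : -bounding_radius),
+          light.position[1] + ((c & 2) ? bounding_radius : -bounding_radius),
+          light.position[2] + ((c & 4) ? bounding_radius : -bounding_radius),
+          1);
+        vec4.transformMat4(corner_view, corner, viewMatrix);
+        vec4.transformMat4(corner_clip, corner_view, camera.projectionMatrix.elements);
+        // A corner behind the camera (w <= 0) would flip sign after the divide
+        if (corner_clip[3] <= 0) {
+          behind_camera = true;
+          continue;
+        }
+        for (let j = 0; j < 3; j++) {
+          min_point_screen[j] = Math.min(min_point_screen[j], corner_clip[j] / corner_clip[3]);
+          max_point_screen[j] = Math.max(max_point_screen[j], corner_clip[j] / corner_clip[3]);
+        }
+      }
+      // If the box crosses the camera plane, conservatively cover the full
+      // screen and start from the near plane
+      if (behind_camera) {
+        min_point_screen[0] = min_point_screen[1] = min_point_screen[2] = -1;
+        max_point_screen[0] = max_point_screen[1] = 1;
+      }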
+
+      // Corresponding coords of cluster
+      let min_x = Math.floor(this._xSlices * (min_point_screen[0] + 1) / 2);
+      let max_x = Math.ceil(this._xSlices * (max_point_screen[0] + 1) / 2);
+      let min_y = Math.floor(this._ySlices * (min_point_screen[1] + 1) / 2);
+      let max_y = Math.ceil(this._ySlices * (max_point_screen[1] + 1) / 2);
+      let min_z = Math.floor(this._zSlices * (min_point_screen[2] + 1) / 2);
+      let max_z = Math.ceil(this._zSlices * (max_point_screen[2] + 1) / 2);
+
+      // Exp. view space z coord of cluster
+      // let min_z = Math.floor(Math.log(Math.max(-min_point_view[2], 0.0001) / camera.near) * this._zSlices / Math.log(camera.far / camera.near));
+      // let max_z = Math.floor(Math.log(Math.max(-max_point_view[2], 0.0001) / camera.near) * this._zSlices / Math.log(camera.far / camera.near));
+
+      // Traverse all influenced clusters
+      for (let z = Math.max(0, min_z); z < Math.min(max_z + 1, this._zSlices); z++) {
+        for (let y = Math.max(0, min_y); y < Math.min(max_y + 1, this._ySlices); y++) {
+          for (let x = Math.max(0, min_x); x < Math.min(max_x + 1, this._xSlices); x++) {
+            let k = x + y * this._xSlices + z * this._xSlices * this._ySlices;
+            if (this._clusterTexture.buffer[this._clusterTexture.bufferIndex(k, 0)] < MAX_LIGHTS_PER_CLUSTER) {
+              // Add number of lights
+              this._clusterTexture.buffer[this._clusterTexture.bufferIndex(k, 0)]++;
+
+              // Record light id
+              let num_light = this._clusterTexture.buffer[this._clusterTexture.bufferIndex(k, 0)];
+              this._clusterTexture.buffer[this._clusterTexture.bufferIndex(k, Math.floor(num_light / 4)) + num_light % 4] = i;
+            }
+          }
+        }
+      }
+    }
+
     this._clusterTexture.update();
   }
 }
diff --git a/src/renderers/clusteredDeferred.js b/src/renderers/clusteredDeferred.js
@@ -2,14 +2,15 @@ import { gl, WEBGL_draw_buffers, canvas } from '../init';
 import { mat4, vec4 } from 'gl-matrix';
 import { loadShaderProgram, renderFullscreenQuad } from '../utils';
 import { NUM_LIGHTS } from '../scene';
+import { MAX_LIGHTS_PER_CLUSTER } from './base';
 import toTextureVert from '../shaders/deferredToTexture.vert.glsl';
 import toTextureFrag from '../shaders/deferredToTexture.frag.glsl';
 import QuadVertSource from '../shaders/quad.vert.glsl';
 import fsSource from '../shaders/deferred.frag.glsl.js';
 import TextureBuffer from './textureBuffer';
 import BaseRenderer from './base';
 
-export const NUM_GBUFFERS = 4;
+export const NUM_GBUFFERS = 2;
 
 export default class ClusteredDeferredRenderer extends BaseRenderer {
   constructor(xSlices, ySlices, zSlices) {
@@ -29,7 +30,8 @@ export default class ClusteredDeferredRenderer extends BaseRenderer {
       numLights: NUM_LIGHTS,
       numGBuffers: NUM_GBUFFERS,
     }), {
-      uniforms: ['u_gbuffers[0]', 'u_gbuffers[1]', 'u_gbuffers[2]', 'u_gbuffers[3]'],
+      uniforms: ['u_gbuffers[0]', 'u_gbuffers[1]', 'u_gbuffers[2]', 'u_gbuffers[3]', 
+                'u_lightbuffer', 'u_clusterbuffer', 'u_slices_x', 'u_slices_y', 'u_slices_z', 'u_view_proj_mat', 'u_max_light', 'u_cam_pos'],
       attribs: ['a_uv'],
     });
 
@@ -154,9 +156,28 @@ export default class ClusteredDeferredRenderer extends BaseRenderer {
     gl.useProgram(this._progShade.glShaderProgram);
 
     // TODO: Bind any other shader inputs
+    // Set the light texture as a uniform input to the shader
+    gl.activeTexture(gl.TEXTURE0);
+    gl.bindTexture(gl.TEXTURE_2D, this._lightTexture.glTexture);
+    gl.uniform1i(this._progShade.u_lightbuffer, 0);
+
+    // Set the cluster texture as a uniform input to the shader
+    gl.activeTexture(gl.TEXTURE1);
+    gl.bindTexture(gl.TEXTURE_2D, this._clusterTexture.glTexture);
+    gl.uniform1i(this._progShade.u_clusterbuffer, 1);
+
+    gl.uniform1i(this._progShade.u_slices_x, this._xSlices);
+    gl.uniform1i(this._progShade.u_slices_y, this._ySlices);
+    gl.uniform1i(this._progShade.u_slices_z, this._zSlices);
+
+    gl.uniformMatrix4fv(this._progShade.u_view_proj_mat, false, this._viewProjectionMatrix);
+
+    gl.uniform1i(this._progShade.u_max_light, MAX_LIGHTS_PER_CLUSTER);
+
+    gl.uniform3f(this._progShade.u_cam_pos, camera.position.x, camera.position.y, camera.position.z);
 
     // Bind g-buffers
-    const firstGBufferBinding = 0; // You may have to change this if you use other texture slots
+    const firstGBufferBinding = 2; // Texture units 0 and 1 hold the light and cluster textures bound above
     for (let i = 0; i < NUM_GBUFFERS; i++) {
       gl.activeTexture(gl[`TEXTURE${i + firstGBufferBinding}`]);
       gl.bindTexture(gl.TEXTURE_2D, this._gbuffers[i]);

diff --git a/src/renderers/forwardPlus.js b/src/renderers/forwardPlus.js
@@ -2,6 +2,7 @@ import { gl } from '../init';
 import { mat4, vec4, vec3 } from 'gl-matrix';
 import { loadShaderProgram } from '../utils';
 import { NUM_LIGHTS } from '../scene';
+import { MAX_LIGHTS_PER_CLUSTER } from './base';
 import vsSource from '../shaders/forwardPlus.vert.glsl';
 import fsSource from '../shaders/forwardPlus.frag.glsl.js';
 import TextureBuffer from './textureBuffer';
@@ -17,7 +18,8 @@ export default class ForwardPlusRenderer extends BaseRenderer {
     this._shaderProgram = loadShaderProgram(vsSource, fsSource({
       numLights: NUM_LIGHTS,
     }), {
-      uniforms: ['u_viewProjectionMatrix', 'u_colmap', 'u_normap', 'u_lightbuffer', 'u_clusterbuffer'],
+      uniforms: ['u_viewProjectionMatrix', 'u_colmap', 'u_normap', 'u_lightbuffer', 'u_clusterbuffer', 
+                'u_slices_x', 'u_slices_y', 'u_slices_z', 'u_cam_near', 'u_cam_far', 'u_view_mat', 'u_proj_mat', 'u_max_light'],
       attribs: ['a_position', 'a_normal', 'a_uv'],
     });
 
@@ -76,6 +78,17 @@ export default class ForwardPlusRenderer extends BaseRenderer {
     gl.uniform1i(this._shaderProgram.u_clusterbuffer, 3);
 
     // TODO: Bind any other shader inputs
+    gl.uniform1i(this._shaderProgram.u_slices_x, this._xSlices);
+    gl.uniform1i(this._shaderProgram.u_slices_y, this._ySlices);
+    gl.uniform1i(this._shaderProgram.u_slices_z, this._zSlices);
+
+    gl.uniform1f(this._shaderProgram.u_cam_near, camera.near);
+    gl.uniform1f(this._shaderProgram.u_cam_far, camera.far);
+
+    gl.uniformMatrix4fv(this._shaderProgram.u_view_mat, false, this._viewMatrix);
+    gl.uniformMatrix4fv(this._shaderProgram.u_proj_mat, false, this._projectionMatrix);
+
+    gl.uniform1i(this._shaderProgram.u_max_light, MAX_LIGHTS_PER_CLUSTER);
 
     // Draw the scene. This function takes the shader program so that the model's textures can be bound to the right inputs
     scene.draw(this._shaderProgram);