Implementing an argmax in drjit #1562

hsunekichi · 2025-04-10T16:45:51Z

Hi, I am trying to implement a BSDF that stores several externally evaluated samples and returns the closest one to the input directions.

This is the eval code in numpy, which searches for the most similar sample and gathers the associated color.

def eval(self, ctx, si, wo, active):
      # The incident direction comes from the surface interaction
      wi = np.array(si.wi)
      wo = np.array(wo)

      cosi = np.abs(np.dot(wi, self.cartesian_wis))
      coso = np.abs(np.dot(wo, self.cartesian_wos))
      crossi = np.abs(np.dot(wi, self.cartesian_wos))
      crosso = np.abs(np.dot(wo, self.cartesian_wis))

      # Also check cross similarity because of Helmholtz reciprocity
      similarities = np.maximum(cosi + coso, crossi + crosso)
      idx = int(np.argmax(similarities))

      return mi.Color3f(self.data[:, idx]) * mi.Frame3f.cos_theta(wo)

However, this only works in scalar mode and is therefore extremely slow. I am trying to implement it with drjit, but I don't know how to implement the final argmax reduction. This is my drjit code:

def eval(self, ctx, si, wo, active):
        wi = si.wi  # Incident direction from the surface interaction
        active &= ...... # Check the input is correct

        cosi   = dr.abs_dot(wi, self.cartesian_wis)
        coso   = dr.abs_dot(wo, self.cartesian_wos)
        crossi = dr.abs_dot(wi, self.cartesian_wos)
        crosso = dr.abs_dot(wo, self.cartesian_wis)        

        similarities = dr.maximum(cosi + coso, crossi + crosso)

        # Perform the argmax operation
        max_val = dr.max(similarities, axis=0)
        max_mask = similarities == max_val
        filtered = self.data_indices & max_mask
        idx = dr.sum(filtered, axis=0)
        
        result = dr.gather(Array3f, source=self.data_flat, index=idx)
        return mi.Color3f(result) * mi.Frame3f.cos_theta(wo)

There is no built-in argmax, so I have tried to implement it by combining several reductions. However, since the variables are symbolic, the reductions throw errors because they cannot be evaluated.

I would be grateful if someone could help me, thanks in advance!
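
The max, mask, and sum combination described above can be sketched on plain Python lists to make the intended logic explicit. This is a minimal illustration that assumes nothing about Dr.Jit, and the function names are hypothetical. It also shows one caveat of summing the masked indices: if two samples tie for the maximum, the sum is no longer a valid index, whereas taking the minimum of the masked indices still is.

```python
def argmax_by_mask_and_sum(values):
    """Argmax as: max reduction, equality mask, sum of masked indices."""
    max_val = max(values)
    mask = [v == max_val for v in values]
    # Keep the index where the mask is set, 0 elsewhere, then add them up.
    filtered = [i if m else 0 for i, m in enumerate(mask)]
    return sum(filtered)


def argmax_by_mask_and_min(values):
    """Same idea, but a min reduction keeps the result valid under ties."""
    max_val = max(values)
    sentinel = len(values)  # larger than any valid index
    return min(i if v == max_val else sentinel for i, v in enumerate(values))


print(argmax_by_mask_and_sum([0.1, 0.9, 0.4]))  # unique maximum at index 1
print(argmax_by_mask_and_sum([0.1, 0.9, 0.9]))  # tie: 1 + 2, out of range
print(argmax_by_mask_and_min([0.1, 0.9, 0.9]))  # tie: first maximum
```

Either variant still needs two reductions whose results are read back, so it does not by itself remove the need to evaluate the arrays.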

Answered by njroussel

njroussel · 2025-04-11T11:38:58Z
Collaborator

Hi @hsunekichi

You might have seen this discussion in the Dr.Jit repository: mitsuba-renderer/drjit#375
You'll also want to have a look at this, slightly more general, page: https://drjit.readthedocs.io/en/stable/eval.html

Fundamentally, you'll never be able to perform a symbolic horizontal reduction and read its result within the same kernel.
Note that most of our horizontal reductions do accept mode='symbolic', but they produce a side effect, which means that trying to access the result will trigger an evaluation.

The reason lies in the execution model: not all threads are alive at the same time. At any point in time, assuming you're running a very wide kernel, only a subset of threads is running; once they are done, the next subset runs, and so on until all threads have executed. A reduction inherently requires some level of synchronization: all threads must come together to produce a result before any of them can continue its work. This is not compatible with the execution model.
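
As a toy model of the execution order described above (not how Dr.Jit actually schedules work; the function name and the wave size are made up for illustration), the following plain-Python sketch processes values in waves and records the best maximum known after each wave. Threads in an early wave can only ever see a partial result, because the true maximum exists only after the last wave has finished.

```python
def wavefront_max(values, wave_size):
    """Process `values` in waves and track the maximum known after each one."""
    running = float("-inf")
    history = []
    for start in range(0, len(values), wave_size):
        wave = values[start:start + wave_size]
        # Only the threads of this wave are "alive" right now.
        running = max(running, max(wave))
        history.append(running)
    return running, history


final, history = wavefront_max([0.2, 0.7, 0.1, 0.9, 0.4, 0.3], wave_size=2)
print(final)    # maximum over all waves
print(history)  # what each wave could have observed when it finished
```

The first wave finishes while the running maximum is still 0.7, so it could not have used the final value of 0.9 without waiting for the later waves.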

4 replies

wjakob Apr 11, 2025
Maintainer

Let me throw in one thing just in case it's useful: one can implement argmax atomically by packing the value (e.g. a float; non-negative IEEE 754 floats are ordered the same way as their bit patterns) and the index into a 64-bit number. You could then do a regular dr.max(..., axis=..) and extract both the index and the value from the packed 64-bit representation. Take a look at dr.reinterpret_array for the Float<->Int conversion. Of course, this won't help if you need access to the result of the argmax within a symbolic region; that is not allowed, as Nicolas pointed out.
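
The packing idea can be sketched in plain Python, with the standard struct module standing in for dr.reinterpret_array and the built-in max standing in for dr.max. This is only a scalar illustration under stated assumptions: values are 32-bit floats with the sign bit clear (true for the similarities in this thread, which are sums of absolute values), NaNs are not handled, and the helper names are hypothetical.

```python
import struct


def float_bits(x):
    """Bit pattern of x stored as a 32-bit float, as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_float(b):
    """Inverse of float_bits: reinterpret 32 bits as a float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def pack(value, index):
    """Put the float bits in the upper 32 bits and the index in the lower 32."""
    bits = float_bits(value)
    # Assumption: sign bit clear, so integer order matches float order.
    assert bits < 0x80000000 and 0 <= index < 2**32
    return (bits << 32) | index


def unpack(key):
    """Recover (value, index) from a packed 64-bit key."""
    return bits_float(key >> 32), key & 0xFFFFFFFF


def packed_argmax(values):
    """A single max reduction over packed keys yields both value and index."""
    return unpack(max(pack(v, i) for i, v in enumerate(values)))


value, index = packed_argmax([0.25, 1.5, 0.75, 1.25])
print(value, index)  # 1.5 1
```

With the value in the upper 32 bits, ties on the value are broken in favor of the larger index; storing 0xFFFFFFFF - index instead would prefer the smaller one, which matches np.argmax.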

hsunekichi Apr 15, 2025
Author

Thanks, then I will stick to third-party reductions.

Is there a way to extract the value in C++, something like wi = wi.numpy()?
That way I at least won't have the penalty of the Python bindings.

wjakob Apr 15, 2025
Maintainer

I'm not sure how third-party reductions will help. Won't you then have the same issue of not being able to put them into a symbolic region?

hsunekichi Apr 15, 2025
Author

Evaluating the samples and just working with numpy works well enough. However, that forces me to use scalar mode, which as far as I understand is extremely slow due to the Python bindings. That's why I am now trying to implement it in C++.

I suppose that drjit could also be used to implement it in evaluated mode, but that would still be forced to work in scalar mode and thus incur the same performance penalty.
