
Add checkbounds for gather and support empty source array #51

Open
wants to merge 5 commits into master

Conversation

yuehhua
Member

@yuehhua yuehhua commented Jun 3, 2022

@yuehhua yuehhua requested a review from mcabbott June 3, 2022 15:15
@yuehhua
Member Author

yuehhua commented Jun 3, 2022

@mcabbott would you please review this?

@yuehhua yuehhua changed the title Add checkbounds for gather Add checkbounds for gather and support empty source array Jun 3, 2022
@yuehhua yuehhua closed this Jun 4, 2022
@yuehhua yuehhua reopened this Jun 4, 2022
Co-authored-by: Peter <[email protected]>

Update src/gather.jl

Co-authored-by: Peter <[email protected]>

Update src/gather.jl

Co-authored-by: Peter <[email protected]>

Update src/gather.jl

Co-authored-by: Peter <[email protected]>
Member

@chengchingwen chengchingwen left a comment


I don't think I have the permission, so you would need someone else to review

@yuehhua yuehhua requested a review from CarloLucibello June 6, 2022 15:27
@yuehhua
Member Author

yuehhua commented Jun 6, 2022

@CarloLucibello would you please review this?

@CarloLucibello
Member

We can make the bounds checking much faster. In the simplest case with integer indexes:

a = axes(src, ndims(src))
checkindex(Bool, a, collect(extrema(idx)))

@yuehhua
Member Author

yuehhua commented Jun 7, 2022

I just tried it. It seems that extrema causes a scalar indexing issue for CUDA, but it works with maximum and minimum called separately.

@chengchingwen
Member

I think there won't be much difference in speed since they are all calling the mapreduce kernel

@CarloLucibello
Member

We can check just the maximum (we don't really support things like offset arrays in any case).
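For reference, a minimal sketch of that maximum-only check (assuming 1-based indexing, as noted; illustration only, not this PR's code):

a = axes(src, ndims(src))
in_bnd = checkindex(Bool, a, maximum(idx))  # one reduction kernel; only the upper bound is checked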

@yuehhua
Member Author

yuehhua commented Jun 7, 2022

I think @chengchingwen is right. They all go through the same GPU kernel, so reducing over an array costs about the same time as computing a single value. Since extrema needs both maximum and minimum, it calls the GPU kernel twice, once for each.

using CUDA
using BenchmarkTools

T = Float32
CT = CuArray{Float32}

src = CT([3, 4, 5, 6, 7])
idx = cu([1 2 3 4;
            4 2 1 3;
            3 5 5 3])

function checkbounds_src(src, dims::Union{Int, Val}, ::Type{<:Any})
    return i -> checkbounds(Bool, src, ntuple(x -> Colon(), dims)..., i...)
end

function checkbounds_src(src, dims::Union{Int, Val}, ::Type{<:CartesianIndex})
    return i -> checkbounds(Bool, src, ntuple(x -> Colon(), dims)..., i)
end

function checkbounds1(src, idx, dims)
    return map(checkbounds_src(src, Val(dims), eltype(idx)), idx)
end

function checkbounds2(src, idx, dims)
    a = axes(src, ndims(src))
    return checkindex(Bool, a, minimum(idx):maximum(idx))
end

checkbounds1(src, idx, 1)
checkbounds2(src, idx, 1)

julia> @benchmark CUDA.@sync checkbounds1($src, $idx, 1)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  10.406 μs … 73.799 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.080 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.074 μs ±  3.770 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▆▅▆▅▄▃▂▁                                                  ▂
  █████████████▇▇▇▇▆▇▆▅▄▄▄▄▄▃▄▄▂▃▅▄▄▂▄▄▆▅▄▅▅▄▅▅▄▄▄▅▂▄▄▅▅▄▅▄▃▃ █
  10.4 μs      Histogram: log(frequency) by time      32.7 μs <

 Memory estimate: 3.12 KiB, allocs estimate: 56.

julia> @benchmark CUDA.@sync checkbounds2($src, $idx, 1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  31.644 μs … 39.208 ms  ┊ GC (min … max): 0.00% … 44.45%
 Time  (median):     33.708 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   40.350 μs ± 391.981 μs  ┊ GC (mean ± σ):  4.32% ±  0.44%

  ▅██▇▆▄▂▁          ▁▁▁▁                                       ▂
  █████████████▇█▇▇▇█████▇▇▇▇▇▇▆▅▆▅▆▅▇▆▆▆▆▇▆▆▆▆▅▅▅▄▄▅▅▄▄▅▄▅▅▅▄ █
  31.6 μs       Histogram: log(frequency) by time      79.8 μs <

 Memory estimate: 4.88 KiB, allocs estimate: 88.

The original approach (checkbounds1) is faster than the proposed approach (checkbounds2).

@YichengDWu

Any updates?

@yuehhua
Member Author

yuehhua commented Jun 14, 2022

Waiting for review.

@CarloLucibello
Member

@yuehhua can you benchmark this PR vs master for a few input sizes? Just to make sure that bounds checking doesn't take longer than the real computation.

@yuehhua
Member Author

yuehhua commented Jun 15, 2022

Benchmark code:

using NNlib, NNlibCUDA
using CUDA
using BenchmarkTools

T = Float32
CT = CuArray{Float32}

src = CT([3, 4, 5, 6, 7]);
idx = cu([1 2 3 4;
          4 2 1 3;
          3 5 5 3]);

NNlib.gather(src, idx);

This PR:

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  28.514 μs …  29.643 ms  ┊ GC (min … max): 0.00% … 35.08%
 Time  (median):     33.589 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.676 μs ± 296.188 μs  ┊ GC (mean ± σ):  2.76% ±  0.35%

         ▂▇█▇▅▃▁                                                
  ▂▄▆▆▆▇▇███████▇▇▆▆▆▅▄▄▄▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  28.5 μs         Histogram: frequency by time         56.4 μs <

 Memory estimate: 6.05 KiB, allocs estimate: 109.

master branch:

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.472 μs … 55.821 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.803 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.026 μs ±  1.099 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █                                                          
  ▅██▆██▄▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  6.47 μs        Histogram: frequency by time        11.5 μs <

 Memory estimate: 544 bytes, allocs estimate: 12.

It is around 5 times slower than the master branch.

@CarloLucibello
Member

Mhmh. That is a very small size though; can you check with 10x or 100x bigger arrays?

@yuehhua
Member Author

yuehhua commented Jun 15, 2022

PR:

julia> src = CUDA.rand(50);

julia> idx = rand(1:50, 30, 4) |> cu;

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  30.024 μs …  34.845 ms  ┊ GC (min … max): 0.00% … 37.69%
 Time  (median):     34.888 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   40.244 μs ± 348.197 μs  ┊ GC (mean ± σ):  3.26% ±  0.38%

    ▃▆█▆▇▅▃▁                                                    
  ▁▆████████▇▆▅▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  30 μs           Histogram: frequency by time         72.4 μs <

 Memory estimate: 6.05 KiB, allocs estimate: 109.

master branch:

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.811 μs … 104.485 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.106 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.516 μs ±   1.748 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▅▆▆▄▄▃▂▁                                                  ▂
  ████████████▇██▇▇▇█████▇▇▇▅▅▅▅▆▅▄▅▄▄▄▄▄▅▅▄▄▅▅▆▅▅▆▅▅▄▆▄▄▄▄▄▂ █
  6.81 μs      Histogram: log(frequency) by time      14.7 μs <

 Memory estimate: 544 bytes, allocs estimate: 12.

@yuehhua
Member Author

yuehhua commented Jun 15, 2022

It seems to spend too much time in GC. Maybe the closure is causing this.

@YichengDWu

Is map using CUDA kernels?

src/gather.jl Outdated

# check bounds
in_bnd = map(checkbounds_src(src, Val(dims), eltype(idx)), idx)
if !all(in_bnd)

@YichengDWu YichengDWu Jun 15, 2022


This line accounts for a large portion of the slowdown.

@YichengDWu

I have done some tests; the closure here is fine. Using Base.Fix2 won't help.

@yuehhua
Member Author

yuehhua commented Jun 16, 2022

The closure issue is resolved; benchmark results below:

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  22.388 μs … 797.144 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     30.675 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   30.996 μs ±  10.354 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▂▅█▄                                            
  ▁▂▂▁▁▁▁▆▇██▇██████▇▄▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  22.4 μs         Histogram: frequency by time         55.6 μs <

 Memory estimate: 3.08 KiB, allocs estimate: 53.

@MilkshakeForReal map does use a CUDA kernel. However, the map CUDA kernel itself costs a large portion of the time. I turned off the map CUDA kernel and got:

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.588 μs … 95.860 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.985 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.414 μs ±  1.796 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▇▇▅▅▄▃▂▂▂▂▂▁▁                                            ▂
  ██████████████████▇▇▇▇▆▆▇▆▇▆▇▇▇▇▇▇▆▆▆▆▆▇▆▆▄▅▆▆▅▅▅▆▅▃▅▃▅▄▆▅ █
  6.59 μs      Histogram: log(frequency) by time     14.6 μs <

 Memory estimate: 544 bytes, allocs estimate: 12.

It is still 4 times slower. So we could come up with a more efficient CUDA kernel, find another efficient way to check bounds, or even skip bounds checking.

@YichengDWu

For the last version, commenting out if !all(in_bnd) reduces the time from 68 μs to 25 μs on my machine. And it's 15 μs without bounds checking. So I concluded that map was fine but reduce was slower.

@yuehhua
Member Author

yuehhua commented Jun 16, 2022

Dropped @inbounds in the CUDA kernel to use CUDA's own bounds check.

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.654 μs … 111.630 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.214 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.549 μs ±   1.843 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▇█▅▆▅▄▂▂▂▂▂▂▁▂▁▁                                         ▂
  ███████████████████████▇▇▇▆▆▆▆▇▆▅▅▆▅▆▄▅▅▅▄▅▄▅▆▄▄▃▅▅▅▅▃▅▅▅▄▄ █
  6.65 μs      Histogram: log(frequency) by time      14.5 μs <

 Memory estimate: 528 bytes, allocs estimate: 12.
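Roughly, the idea is that without @inbounds an out-of-range index traps inside the kernel itself. A simplified, hypothetical kernel body (not the actual NNlibCUDA gather_kernel!):

function naive_gather_kernel!(dst, src, idx)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(idx)
        dst[i] = src[idx[i]]  # no @inbounds: CUDA.jl's own bounds check fires on a bad index
    end
    return nothing
end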

@yuehhua
Member Author

yuehhua commented Jun 16, 2022

@MilkshakeForReal For the last version, all(in_bnd) is merged into mapreduce, which reduces the time from 37.676 μs to 30.996 μs. Just a small improvement.
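In other words, writing inbounds_f as shorthand for the predicate built by checkbounds_src(src, Val(dims), eltype(idx)), the two variants are roughly:

in_bnd = all(map(inbounds_f, idx))      # map kernel + reduce kernel, plus a temporary Bool array
in_bnd = mapreduce(inbounds_f, &, idx)  # one fused kernel, no temporary array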

@YichengDWu

MRE:

function NNlib.gather!(dst::AnyCuArray, src::AnyCuArray, idx::AnyCuArray)
    # check dims
    dims = gather_check_dims(src, dst, idx)
    dims_size = size(src)[1:dims]
    max_dims_idx = prod(dims_size)
    max_idx = max_dims_idx * length(idx)

    # check bounds
    idx_bounds = size(src, ndims(src))#[dims+1:end]
    in_bnd = map(i -> i <= idx_bounds, idx)

    isempty(src) && return dst

    # cuda kernel
    args = dst, src, idx, max_idx, max_dims_idx, dims_size
    kernel = @cuda launch=false gather_kernel!(args...)
    config = launch_configuration(kernel.fun; max_threads=256)
    threads = min(max_idx, config.threads)
    blocks = cld(max_idx, threads)
    kernel(args...; threads=threads, blocks=blocks)
    return dst
end
julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  18.900 μs … 2.720 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     21.700 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   26.564 μs ± 71.784 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▆██▇▆▅▄▃▃▃▂▂▁▁▁▁▁▁▂▁▁▁                                     ▂
  ███████████████████████████▇▇██▇▆▇▇▇▇▆▆▇▇▆▆▆▆▅▅▆▄▄▄▄▅▃▅▂▃▄▃ █
  18.9 μs      Histogram: log(frequency) by time      60.7 μs <

 Memory estimate: 1.33 KiB, allocs estimate: 31.

With !all(in_bnd):

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  55.200 μs … 2.510 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     69.300 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   79.856 μs ± 102.221 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂█▇▃▂▁
  ▁▃██████▆▅▆▄▅▇▇▆▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  55.2 μs         Histogram: frequency by time          174 μs <

 Memory estimate: 3.72 KiB, allocs estimate: 72.

Without any bounds checking:

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  12.600 μs …  2.407 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     15.700 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.899 μs ± 48.924 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▅▆██▆▄▃▄▃▂▁ ▁▁     ▁▂▃▃▃▄▃▂▂▂▂▂▂▂                          ▂
  █████████████████▇█▇██████████████▇█▇▆▆▅▆▅▇▇▅▆▆▆▅▅▅▅▆██▆▆▇▅ █
  12.6 μs      Histogram: log(frequency) by time      51.5 μs <

 Memory estimate: 544 bytes, allocs estimate: 12.

@YichengDWu

YichengDWu commented Jun 16, 2022

@MilkshakeForReal For the last version, all(in_bnd) is merged into mapreduce, which reduces the time from 37.676 μs to 30.996 μs. Just a small improvement.

I believe you can achieve almost identical speedup by writing

in_bnd = mapreduce(checkbounds_src(src, Val(dims), eltype(idx)), &, idx)

in the last version. The speedup does not come mainly from removing the closure.

@YichengDWu

YichengDWu commented Jun 16, 2022

This is my implementation based on yours. It seems to work fine. I still don't know why we need to dispatch on dims.

function _checkbounds_indices(i::Tuple, idx_bounds::Tuple)
    return Base.checkbounds_indices(Bool, idx_bounds, i) 
end

function _checkbounds_indices(i::CartesianIndex, idx_bounds::Tuple)
    return Base.checkbounds_indices(Bool, idx_bounds, Tuple(i))
end

function _checkbounds_indices(i::Int, idx_bounds::Tuple)
    return Base.checkbounds_indices(Bool, idx_bounds, (i,))
end

function NNlib.gather!(dst::AnyCuArray, src::AnyCuArray, idx::AnyCuArray)
    # check dims
    dims = gather_check_dims(src, dst, idx)
    dims_size = size(src)[1:dims]
    max_dims_idx = prod(dims_size)
    max_idx = max_dims_idx * length(idx)

    # check bounds
    idx_bounds = axes(src)[dims+1:end]
    in_bnd = mapreduce(Base.Fix2(_checkbounds_indices,idx_bounds), &, idx)

    if !in_bnd
        #whatever is here, we don't need to care about the speed when something is wrong.
    end

    isempty(src) && return dst

    # cuda kernel
    args = dst, src, idx, max_idx, max_dims_idx, dims_size
    kernel = @cuda launch=false gather_kernel!(args...)
    config = launch_configuration(kernel.fun; max_threads=256)
    threads = min(max_idx, config.threads)
    blocks = cld(max_idx, threads)
    kernel(args...; threads=threads, blocks=blocks)
    return dst
end

@YichengDWu

YichengDWu commented Jun 16, 2022

@MilkshakeForReal For the last version, all(in_bnd) is merged into mapreduce, which reduces the time from 37.676 μs to 30.996 μs. Just a small improvement.

The improvement from removing all(in_bnd) is of the same magnitude as the improvement of the latest version over the previous one.

@yuehhua
Member Author

yuehhua commented Jun 16, 2022

The improvement from removing all(in_bnd) is of the same magnitude as the improvement of the latest version over the previous one.

# check bounds
idx_bounds = size(src, ndims(src))#[dims+1:end]
in_bnd = map(i -> i <= idx_bounds, idx)

We still have to raise a bounds-check error to users; otherwise it is meaningless to do the bounds check at all. There may be other ways to replace all(in_bnd), but you still have to reduce the in_bnd result from map to a single value.

@YichengDWu

It's just to demonstrate that we don't really need to avoid the closure in this case. I just tested the latest version and the performance was almost identical to mine, with or without the closure; both use mapreduce. Throwing the error message can be done in a slower fashion, since if something is wrong we just want it to stop and don't care much about the speed. We can even use map again inside if !in_bnd.
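A rough sketch of that slow error path, reusing _checkbounds_indices and idx_bounds from the snippet above (illustration only, not this PR's code):

in_bnd = mapreduce(Base.Fix2(_checkbounds_indices, idx_bounds), &, idx)
if !in_bnd
    # only reached when something is wrong, so speed no longer matters here
    flags = Array(map(Base.Fix2(_checkbounds_indices, idx_bounds), idx))
    error("gather!: index out of bounds at position ", findfirst(!, flags))
end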

@chengchingwen
Member

@MilkshakeForReal Avoiding the closure is for reducing GC time and memory allocation.

@yuehhua
Member Author

yuehhua commented Jun 16, 2022

Not much improvement for the latest proposal.

julia> @benchmark CUDA.@sync NNlib.gather($src, $idx)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  24.986 μs … 618.544 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     32.902 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.122 μs ±   7.792 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▁    ▁▂▄██▃                                          
  ▂▃▂▂▁▁▃▇▇██▇▇███████▆▅▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  25 μs           Histogram: frequency by time         54.6 μs <

 Memory estimate: 3.81 KiB, allocs estimate: 73.

@YichengDWu

Do you mean my proposal? There isn't any improvement. Just similar performance.

@yuehhua
Member Author

yuehhua commented Jun 16, 2022

@MilkshakeForReal We still need a reduce to summarize a CuArray{Bool} into a single Boolean value. ! cannot be applied to a CuArray{Bool}, and the if statement cannot work on an array.

julia> in_bnd = rand(Bool, 5) |> cu;

julia> if !in_bnd
           println(123)
       end
ERROR: MethodError: no method matching !(::CuArray{Bool, 1, CUDA.Mem.DeviceBuffer})
Closest candidates are:
  !(::Function) at /opt/julia-1.7.3/share/julia/base/operators.jl:1117
  !(::Bool) at /opt/julia-1.7.3/share/julia/base/bool.jl:35
  !(::Missing) at /opt/julia-1.7.3/share/julia/base/missing.jl:101
Stacktrace:
 [1] top-level scope
   @ REPL[12]:1
 [2] top-level scope
   @ ~/.julia/packages/CUDA/GGwVa/src/initialization.jl:52
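For completeness, reducing first does make the branch work, since all on a CuArray{Bool} runs a GPU reduction and returns a plain Bool:

if !all(in_bnd)
    println(123)
end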

@YichengDWu

YichengDWu commented Jun 16, 2022

Where do we have this issue? In my code it's already reduced. The closure is also avoided if we don't want it.

@YichengDWu

memory allocation

I don't know much about the GC time; I'm just speaking from my test results.

@YichengDWu

Could we isolate the support for empty arrays and leave bounds checking to further discussion?

Development

Successfully merging this pull request may close these issues.

add bounds checking in gather