Error when precompiling in Windows #198

Open

yolhan83 opened this issue Oct 27, 2024 · 5 comments

@yolhan83
Hello,

I realize it might be too early for Windows support, but I didn't see an existing issue on this.
In case it hasn't been tested yet, I just wanted to point out that I encountered the following error on Windows during precompilation. It works fine in WSL, though.

Version:

Julia Version 1.11.1
Commit 8f5b7ca12a (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700H
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 20 default, 0 interactive, 10 GC (on 20 virtual cores)

Error:

Precompiling Reactant...
Info Given Reactant was explicitly requested, output will be shown live
ERROR: LoadError: UndefVarError: `libReactantExtra` not defined in `Reactant_jll`
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base .\Base.jl:42
 [2] top-level scope
   @ C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\mlir\MLIR.jl:8
 [3] include(mod::Module, _path::String)
   @ Base .\Base.jl:557
 [4] include(x::String)
   @ Reactant C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\Reactant.jl:1
 [5] top-level scope
   @ C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\Reactant.jl:82
 [6] include
   @ .\Base.jl:557 [inlined]
 [7] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base .\loading.jl:2790
 [8] top-level scope
   @ stdin:5
in expression starting at C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\mlir\MLIR.jl:1
in expression starting at C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\Reactant.jl:1
in expression starting at stdin:5
  ✗ Reactant
  0 dependencies successfully precompiled in 4 seconds. 79 already precompiled.

ERROR: The following 1 direct dependency failed to precompile:

Reactant

Failed to precompile Reactant [3c362404-f566-11ee-1572-e11a4b42c853] to "C:\\Users\\yolha\\.julia\\compiled\\v1.11\\Reactant\\jl_E4CF.tmp".
ERROR: LoadError: UndefVarError: `libReactantExtra` not defined in `Reactant_jll`
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base .\Base.jl:42
 [2] top-level scope
   @ C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\mlir\MLIR.jl:8
 [3] include(mod::Module, _path::String)
   @ Base .\Base.jl:557
 [4] include(x::String)
   @ Reactant C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\Reactant.jl:1
 [5] top-level scope
   @ C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\Reactant.jl:82
 [6] include
   @ .\Base.jl:557 [inlined]
 [7] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base .\loading.jl:2790
 [8] top-level scope
   @ stdin:5
in expression starting at C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\mlir\MLIR.jl:1
in expression starting at C:\Users\yolha\.julia\packages\Reactant\rRa4g\src\Reactant.jl:1
@mofeing (Collaborator)

mofeing commented Oct 27, 2024

ReactantExtra, the C API that wraps the XLA C++ API and includes Enzyme-JAX, doesn't support Windows yet. Actually, I fear that supporting Windows will be a headache...

Would you mind running it in Windows Subsystem for Linux? It might work there. I'm curious, but I don't have a Windows machine.
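
For reference, a quick way to check whether the binary artifact exists for your platform (a sketch, assuming Reactant_jll follows the standard JLLWrappers layout; this is what the UndefVarError above boils down to):

julia> using Reactant_jll

julia> Reactant_jll.is_available()  # standard JLLWrappers query; false when no artifact exists for the host platform
false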

@yolhan83 (Author)

Yes, as I said, it worked (precompilation at least) under WSL Ubuntu. I will set up my GPU on it and try things out, no problem.

@mofeing (Collaborator)

mofeing commented Oct 27, 2024

Oops, I didn't read that part. Nice to know.

@yolhan83 (Author)

yolhan83 commented Oct 27, 2024

Just tested on CPU; it works fine on 1.10 under WSL.

Version:

Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700H
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 20 virtual cores)

Environment:

  [7da242da] Enzyme v0.13.12
  [b2108857] Lux v1.2.0
  [3c362404] Reactant v0.2.3

Code:

julia> using Lux,Reactant,Enzyme,Random;

julia> const dev = xla_device()
(::XLADevice) (generic function with 3 methods)

julia> model = Lux.Chain(Dense(3,10,relu),Dense(10,10,relu),Dense(10,1));

julia> ps,st = Lux.setup(Random.default_rng(1),model) |> dev;

julia> x = rand(Float32,3,1000) |> dev;

julia> y = sum(x,dims=1) |> dev;

julia> stlayer = Lux.StatefulLuxLayer{true}(model,nothing,st);

julia> loss(stlayer,ps,x,y) = sum(abs2,stlayer(x,ps).-y)/length(y);

julia> loss(stlayer,ps,x,y)
0.6765756f0

julia> dps = deepcopy(ps);

julia> grad = Enzyme.autodiff(Reverse,loss,Active,Const(stlayer),Duplicated(ps,dps),Const(x),Const(y));

julia> dps[1][1]
10×3 ConcreteRArray{Float32, 2}:
  0.905959   1.53576   -1.35888
  1.73563    1.80716    1.42986
 -0.273622  -0.401162  -1.86655
 -0.073245  -1.31546    0.438985
 -1.04398    1.96327    0.349859
 -0.225534  -1.01021    0.564287
 -1.0768    -1.55186   -1.36067
 -0.203418  -0.791012   1.53536
  0.935806   1.02484    0.214392
 -1.35451   -1.598      0.427091

There are still issues on 1.11, though, but those don't come from Reactant.

For GPU, it actually looks insanely good.

Environment:

  [6e4b80f9] BenchmarkTools v1.5.0
  [052768ef] CUDA v5.5.2
  [7da242da] Enzyme v0.13.12
  [b2108857] Lux v1.2.0
  [3c362404] Reactant v0.2.3

Code:

julia> using CUDA,Reactant,BenchmarkTools

julia> x= rand(100000000);

julia> xc = cu(x);

julia> Reactant.set_default_backend("gpu");

julia> const dev = xla_device();

julia> xc2 = dev(x);

julia> f(x) = sum(x);

julia> f_comp = @compile f(xc2);

julia> @btime f($x);
  41.367 ms (0 allocations: 0 bytes)

julia> @btime begin
           f($xc)
           CUDA.synchronize()
           end
  1.789 ms (96 allocations: 2.89 KiB)

julia> @btime begin
           Reactant.synchronize($f_comp($xc2))
           end
  29.175 μs (2 allocations: 48 bytes)

CUDA version:

CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 552.12.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+550.73.1

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.10.5
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 4060 Laptop GPU (sm_89, 872.383 MiB / 7.996 GiB available)

I can't make Enzyme work on GPU with Reactant, but that may be a skill issue on my part. The common error I see is:

grad_comp = Enzyme.autodiff(Reverse,loss_comp,Const(stlayer),Duplicated(ps,dps),Const(x),Const(y));
ERROR:
No augmented forward pass found for XLAExecute
 at context:   call void @XLAExecute(i64 %102, i32 noundef 8, [8 x i64]* nocapture noundef nonnull readonly %inpa.i.i, [8 x i8]* nocapture noundef nonnull readonly %dona.i.i, i32 noundef 1, [1 x i64]* nocapture noundef nonnull writeonly %outa.i.i, i8* nocapture noundef nonnull writeonly %futa.i.i, [1 x i64]* nocapture noundef nonnull writeonly %futpa.i.i) #14, !dbg !58

where the only difference from the CPU code was:

loss_comp = @compile loss(stlayer,ps,x,y)

Update: I just saw that the gradient itself should be compiled, so a small change to the code makes everything work fine (at least on WSL Ubuntu):

julia> using Lux,Reactant,Enzyme,Random

julia> Reactant.set_default_backend("gpu");

julia> model = Lux.Chain(Dense(3,10,relu),Dense(10,10,relu),Dense(10,1));

julia> const dev = xla_device();

julia> x = rand(Float32,3,1000) |> dev;

julia> f(x) = sum(x,dims=1);

julia> f_comp = @compile f(x);

julia> y = f_comp(x)
1×1000 ConcreteRArray{Float32, 2}:
 1.91339  0.977359  1.40866  1.14727  1.35244  0.854568  …  2.66806  1.4641  1.9488  2.03938  1.75204  1.36932

julia> ps,st = Lux.setup(Random.default_rng(1),model) |> dev
((layer_1 = (weight = Float32[-1.1879814 0.5789728 0.78146553; -1.0979857 -0.7196951 -0.960644; … ; 0.38421702 -0.82885265 -1.7875335; -1.7717319 1.8281577 -0.5482931], bias = Float32[-0.55165297, -0.025069488, 0.12070516, -0.33363134, -0.27165163, -0.5426975, -0.02777502, -0.5143611, 0.06754502, 0.41636044]), layer_2 = (weight = Float32[-0.4120962 0.88357955 … 0.8667873 -0.60699934; 0.39998135 0.99203914 … -0.859783 0.14729756; … ; 0.57311285 1.0405946 … -0.9094574 0.4193144; -0.64737666 0.13689981 … -0.57182777 -0.76190686], bias = Float32[-0.19788045, -0.26863387, -0.03341088, -0.19565816, -0.19184135, -0.042680115, 0.18373081, 0.30809134, -0.2994386, 0.15902404]), layer_3 = (weight = Float32[0.033292364 0.3406409 … 0.20983447 -0.5079821], bias = Float32[-0.06395315])), (layer_1 = NamedTuple(), layer_2 = NamedTuple(), layer_3 = NamedTuple()))

julia> loss(stlayer,ps,x,y) = sum(abs2,stlayer(x,ps).-y)/length(y);

julia> stlayer = Lux.StatefulLuxLayer{true}(model,nothing,st);

julia> function gradloss(stlayer,ps,x,y)
       dps = Enzyme.make_zero(ps)
       _,res = Enzyme.autodiff(ReverseWithPrimal,loss,Active,Const(stlayer),Duplicated(ps,dps),Const(x),Const(y))
       return res,dps
       end
gradloss (generic function with 1 method)

julia> gradloss_comp = @compile gradloss(stlayer,ps,x,y)
Reactant.Compiler.Thunk{Symbol("##gradloss_reactant#471")}()

julia> gradloss_comp(stlayer,ps,x,y)
(fill(1.6686882f0), (layer_1 = (weight = Float32[-0.019157264 -0.052835472 -0.07052663; 0.0 0.0 0.0; … ; 0.008739114 0.001753656 0.00081436656; -0.28920138 -0.41064578 -0.3309787], bias = Float32[-0.08458641, 0.0, -0.49688935, 0.120346546, -0.1256084, 0.0, 0.0013638609, -0.47442067, 0.011482064, -0.60481024]), layer_2 = (weight = Float32[0.0 0.0 … -0.000109314075 -0.0012933938; 0.0 0.0 … 0.0 0.0; … ; -0.00888882 0.0 … 0.0 -0.08582606; 0.0001844522 0.0 … 0.0 7.595214f-5], bias = Float32[-0.04766386, 0.0, -0.0004994411, 0.0, -1.05122, 0.2795517, 0.26252118, -0.030470902, -0.10657983, 0.0051385043]), layer_3 = (weight = Float32[-0.56777835 0.0 … -0.072760016 -0.0005584934], bias = Float32[-2.335299])))

@mofeing (Collaborator)

mofeing commented Oct 27, 2024

Yay, great!

We run Enzyme.autodiff in a different way: we run Enzyme through the MLIR rather than the LLVM IR (i.e. we use Enzyme as an MLIR dialect plus a pass), so we need to overlay the method with one of our own. That's why you need to call Enzyme.autodiff inside the compiled function and not outside it.
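
In other words, the gradient has to be traced together with the function it differentiates. A minimal sketch of the two patterns, with hypothetical simple_loss, ps, and x mirroring the code above (and assuming the ConcreteRArray constructor shown in the earlier outputs):

using Reactant, Enzyme

simple_loss(ps, x) = sum(abs2, ps .* x)

ps = Reactant.ConcreteRArray(rand(Float32, 10))  # hypothetical data placed on the XLA device
x  = Reactant.ConcreteRArray(rand(Float32, 10))

# Fails: the compiled thunk is an opaque call into XLA (the XLAExecute in the
# error above), which Enzyme running on the Julia side cannot differentiate.
# loss_comp = @compile simple_loss(ps, x)
# Enzyme.autodiff(Reverse, loss_comp, Active, Duplicated(ps, Enzyme.make_zero(ps)), Const(x))

# Works: call Enzyme.autodiff inside the function that gets compiled, so the
# overlaid method handles the differentiation while the whole function is traced.
function gradloss(ps, x)
    dps = Enzyme.make_zero(ps)
    _, res = Enzyme.autodiff(ReverseWithPrimal, simple_loss, Active, Duplicated(ps, dps), Const(x))
    return res, dps
end

gradloss_comp = @compile gradloss(ps, x)
gradloss_comp(ps, x)  # returns (primal loss, gradient w.r.t. ps)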
