Regarding to the problem related to ninja... #33

Open
frostinassiky opened this issue Jul 18, 2019 · 6 comments

Comments

@frostinassiky

Dear guys,

I also ran into issues with ninja. Here is my understanding:

  • This project compiles its C++/CUDA extension just-in-time (JIT), which requires the ninja build system.

  • Solution 1: install ninja. There are two ways:

    • apt-get install ninja-build. The CUDA version on the system must match the one used in the conda environment.
    • conda install ninja or pip install ninja: these did not work for me.
  • Solution 2 (the one I am using): avoid ninja by building the extension "ahead of time" (AOT) instead.

    • Create a new file setup.py under models/hrnet/sync_bn/inplace_abn
    • Build and install the module with python setup.py install
    • Modify models/hrnet/sync_bn/inplace_abn/functions.py to import the built module as _backend
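Solution 1 depends on the system CUDA version matching the one PyTorch was built with. A minimal sketch for checking that, assuming nvcc is on your PATH (PyTorch's extension builder may instead locate CUDA through CUDA_HOME or /usr/local/cuda, so adjust if your setup differs). The helper names are hypothetical; only `torch.version.cuda` comes from PyTorch, and the comparison looks at major.minor only.

```python
import re
import shutil
import subprocess
from typing import Optional


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'major.minor' from `nvcc --version` output, e.g. 'release 10.1,'."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def system_cuda_release() -> Optional[str]:
    """Run the nvcc found on PATH; return None if nvcc is not installed."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return nvcc_release(result.stdout)


def cuda_matches(system_release: Optional[str], torch_cuda: Optional[str]) -> bool:
    """True only if both versions are known and agree on major.minor."""
    if system_release is None or torch_cuda is None:
        return False
    return system_release.split(".")[:2] == torch_cuda.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # e.g. "10.1"; None for CPU-only builds
    except ImportError:
        torch_cuda = None
    system = system_cuda_release()
    print("nvcc release:", system, "| torch built with CUDA:", torch_cuda)
    print("match:", cuda_matches(system, torch_cuda))
```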
@sunke123
Member

@frostinassiky
Thanks for your help!

@zyxu1996

@frostinassiky
Could you please give me a more detailed description of the 2nd solution? I don't understand what it means. Thank you!

@frostinassiky
Author

frostinassiky commented Aug 3, 2019

@xu13521090631
This is my setup file.

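# setup.py, placed in models/hrnet/sync_bn/inplace_abn
# Build and install ahead of time with: python setup.py install
# Source paths are relative, so run it from this directory.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="inplace_abn_cpp_backend",
    ext_modules=[
        CUDAExtension(
            # Module name that functions.py imports
            name="inplace_abn_cpp_backend",
            sources=[
                "src/inplace_abn.cpp",
                "src/inplace_abn_cpu.cpp",
                "src/inplace_abn_cuda.cu",
            ],
            # -O3 for the C++ compiler; nvcc needs extended lambda support
            extra_compile_args={
                "cxx": ["-O3"],
                "nvcc": ["--expt-extended-lambda"],
            },
        )
    ],
    # BuildExtension adds the compiler flags PyTorch extensions need
    cmdclass={"build_ext": BuildExtension},
)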

In functions.py, I define _backend by importing the new module:
import inplace_abn_cpp_backend as _backend
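For functions.py, one way to keep both paths working is to try the ahead-of-time module first and fall back to a JIT build only if it is missing. This is a sketch: `load_backend` and its parameters are hypothetical names, and the JIT fallback is something you supply yourself (for example a call to `torch.utils.cpp_extension.load`, which is the step that needs ninja).

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


def load_backend(aot_module: str = "inplace_abn_cpp_backend",
                 jit_loader: Optional[Callable[[], ModuleType]] = None) -> ModuleType:
    """Return the compiled backend, preferring the ahead-of-time build.

    aot_module: the name given to CUDAExtension in setup.py.
    jit_loader: optional zero-argument callable that builds the extension JIT;
                it is only called when the AOT module cannot be imported.
    """
    try:
        # Present after `python setup.py install`; no ninja needed.
        return importlib.import_module(aot_module)
    except ImportError:
        if jit_loader is None:
            raise  # no fallback given: surface the original ImportError
        return jit_loader()


# Hypothetical use inside functions.py (the JIT arguments are placeholders):
# from torch.utils.cpp_extension import load
# _backend = load_backend(jit_loader=lambda: load(name="inplace_abn", sources=[...]))
```

Trying the import first means machines with the AOT build never touch ninja, while machines without it behave as before.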

As a third solution: if you are not using multiple GPUs, PyTorch's own batch normalization layer works well. Also, there is now a new branch, pytorch-v1.1.

@zyxu1996

zyxu1996 commented Aug 5, 2019

@frostinassiky
Thank you for your reply. I still have some trouble: I have done all these steps, but I still get the errors below. I guess the versions of CUDA, PyTorch, and ninja don't match. Could you please tell me which versions of these packages you use? Thank you!
!! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 4.9 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/tools/train.py", line 248, in <module>
    main()
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/tools/train.py", line 83, in main
    logger.info(get_model_summary(model.cuda(), dump_input.cuda()))
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/lib/utils/modelsummary.py", line 90, in get_model_summary
    model(*input_tensors)
  File "/home/omnisky/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/xzy/new_hrnet/HRNet-Semantic-Segmentation/lib/models/seg_hrnet.py", line 408, in forward
    x = self.conv1(x)
  File "/home/omnisky/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/omnisky/.local/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663
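Since the question is which versions are in use, a small standard-library script can collect them in one go. A sketch, assuming ninja, gcc and c++ are on PATH (c++ is included because that is the compiler the ABI warning above refers to); `command_version` and `collect_versions` are hypothetical helper names, and torch is optional so the script still runs if the import fails.

```python
import platform
import shutil
import subprocess
from typing import Dict, List, Optional


def command_version(cmd: List[str]) -> Optional[str]:
    """Return the first output line of `cmd`, or None if the tool is unavailable."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


def collect_versions() -> Dict[str, Optional[str]]:
    """Gather the versions that matter when building the inplace_abn extension."""
    report = {
        "python": platform.python_version(),
        "ninja": command_version(["ninja", "--version"]),
        "gcc": command_version(["gcc", "--version"]),
        "c++": command_version(["c++", "--version"]),
    }  # type: Dict[str, Optional[str]]
    try:
        import torch
        report["torch"] = torch.__version__
        # CUDA version PyTorch was compiled against (None for CPU-only builds)
        report["torch_cuda"] = torch.version.cuda
        report["cuda_available"] = str(torch.cuda.is_available())
    except ImportError:
        report["torch"] = None
    return report


if __name__ == "__main__":
    for name, version in sorted(collect_versions().items()):
        print("{:15} {}".format(name, version))
```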

@Fiordarancio

@frostinassiky
Thank you for your helpful workaround! I ran into problems with ninja too and am still quite stuck. I borrowed your setup.py, but when I run it I get the following error:

building 'inplace_abn_cpp_backend' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/TH -I/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.6m -I/home/ilaria/workspace/hrnenv/include/python3.6m -c src/inplace_abn.cpp -o build/temp.linux-x86_64-3.6/src/inplace_abn.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=inplace_abn_cpp_backend -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
src/inplace_abn.cpp: In function ‘void pybind11_init_inplace_abn_cpp_backend(pybind11::module&)’:
src/inplace_abn.cpp:70:69: error: no matching function for call to ‘pybind11::module::def(const char [9], <unresolved overloaded function type>, const char [36])’
   m.def("backward", &backward, "Second part of backward computation");
                                                                     ^
In file included from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/utils/pybind.h:6:0,
                 from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/python.h:12,
                 from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/extension.h:6,
                 from /home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:6,
                 from src/inplace_abn.cpp:1:
/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/pybind11/pybind11.h:810:13: note: candidate: template<class Func, class ... Extra> pybind11::module& pybind11::module::def(const char*, Func&&, const Extra& ...)
     module &def(const char *name_, Func &&f, const Extra& ... extra) {
             ^~~
/home/ilaria/workspace/hrnenv/lib/python3.6/site-packages/torch/include/pybind11/pybind11.h:810:13: note:   template argument deduction/substitution failed:
src/inplace_abn.cpp:70:69: note:   couldn't deduce template parameter ‘Func’
   m.def("backward", &backward, "Second part of backward computation");
                                                                     ^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Actually, this is the same complaint I get in the traceback when ninja is installed (pretty much the same as in this previous issue). I honestly cannot figure out how to solve the problem, since I haven't found a decent solution in related topics.

My environment:

  • torch 1.4.0
  • CUDA 10.1
  • ninja 1.9.0
  • gcc/g++ 7.4.0

I also tried downgraded versions of torch and CUDA, but that did not work. Any help would be appreciated!

@kaizen0890

I found that this error comes from a version mismatch, specifically between the torch and ninja versions.
I solved it by finding the ninja version that matches my torch version.
My environment is as follows:

  • Ubuntu: 16.04
  • GCC: 6.5.0
  • Python: 3.5.2
  • Torch: 0.4.1
  • ninja: 1.8.2

Hopefully this helps you guys!
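To compare your own setup against the environment reported above, a sketch like the following flags which major.minor versions differ. The reference values are the ones from this comment: a single data point, not a verified compatibility table, so a difference is only a hint and not proof of incompatibility. `REPORTED_WORKING`, `major_minor` and `mismatches` are hypothetical names, and the inputs are expected to be plain version strings such as "7.4.0".

```python
from typing import Dict, List, Optional, Tuple

# Environment kaizen0890 reported as working (on Ubuntu 16.04). One data
# point from this thread, not an official compatibility matrix.
REPORTED_WORKING = {
    "gcc": "6.5.0",
    "python": "3.5.2",
    "torch": "0.4.1",
    "ninja": "1.8.2",
}


def major_minor(version: str) -> Tuple[int, int]:
    """'1.8.2' -> (1, 8); a local suffix such as '+cu101' is ignored."""
    numbers = []
    for piece in (version.split("+")[0].split(".") + ["0", "0"])[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    return numbers[0], numbers[1]


def mismatches(installed: Dict[str, Optional[str]],
               reference: Dict[str, str] = REPORTED_WORKING) -> List[str]:
    """List the packages whose major.minor differs from the reference."""
    problems = []
    for name in sorted(reference):
        have = installed.get(name)
        if have is None:
            problems.append("{}: not provided (reference {})".format(name, reference[name]))
        elif major_minor(have) != major_minor(reference[name]):
            problems.append("{}: {} (reference {})".format(name, have, reference[name]))
    return problems


if __name__ == "__main__":
    # The torch/ninja/gcc versions reported earlier in this thread
    for line in mismatches({"torch": "1.4.0", "ninja": "1.9.0", "gcc": "7.4.0"}):
        print(line)
```

Run on the torch 1.4.0 / ninja 1.9.0 / gcc 7.4.0 environment from earlier in the thread, all three differ from the reference, and python is reported as not provided.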
