https://github.com/oneflow-inc/oneflow-insiders
Repo to collect issues of OneFlow early adopters
- Host: GitHub
- URL: https://github.com/oneflow-inc/oneflow-insiders
- Owner: Oneflow-Inc
- Created: 2020-07-23T12:18:53.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-07-23T12:38:43.000Z (almost 6 years ago)
- Last Synced: 2025-01-01T18:27:43.907Z (over 1 year ago)
- Size: 3.91 KB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# OneFlow Insiders
Repo to collect issues of OneFlow early adopters
# Troubleshooting
- `CUDNN_STATUS_NOT_INITIALIZED`
  - You might see an error message like this:
    ```
    F0723 19:05:56.194067 40970 cuda_util.cpp:82] Check failed: error == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED
    ```
  - Please upgrade to NVIDIA Linux x86_64 driver version 440.33 or newer.
  - For more information, please refer to the [CUDA compatibility docs](https://docs.nvidia.com/deploy/cuda-compatibility/index.html).
- Failed to compile `.cu` files
  - Please refer to the [CUDA System Requirements](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements). Make sure your Linux distribution and the libraries shipped with it meet the requirements.
  - If you are using tools like conda, make sure the libraries you install don't shadow the ones that come with your Linux distribution or with a package manager like apt-get.
  - Please build OneFlow with a newer version of CMake. You can download version 3.14 from here: [https://github.com/Kitware/CMake/releases/download/v3.14.0/cmake-3.14.0-Linux-x86_64.tar.gz](https://github.com/Kitware/CMake/releases/download/v3.14.0/cmake-3.14.0-Linux-x86_64.tar.gz)
- How do I know which compilers and flags are used to compile OneFlow?
  - Run `make clean && make VERBOSE=1` to print the exact compile commands, including compiler paths and flags.
- How do I compile OneFlow with RDMA support?
  - Add the CMake flag `-DBUILD_RDMA=ON` when configuring OneFlow.
- SWIG not found
  - Usually you can install it with a package manager such as apt-get (for example, `sudo apt-get install swig`). You can also build it from source; refer to the [SWIG official release](http://www.swig.org/download.html).
- Which version of g++ is CMake using to build OneFlow?
  - You should find a line like this in the CMake output:
    ```
    -- CMAKE_CXX_COMPILER_VERSION: [YOUR G++ VERSION NUMBER]
    ```
- Failed to compile NCCL
  - Try using fewer parallel jobs when compiling OneFlow's third-party dependencies. For instance, use
    ```bash
    cmake -DTHIRD_PARTY=ON .. && make
    ```
    instead of
    ```bash
    cmake -DTHIRD_PARTY=ON .. && make -j$(nproc)
    ```
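- `"CUDA_VERSION" "VERSION_GREATER_EQUAL" "10.0"`
  - Your CMake is too old to understand the `VERSION_GREATER_EQUAL` operator, which was added in CMake 3.7. Please use a newer version of CMake.
  - Make sure the newer `cmake` is the one found on your `PATH` (check with `which cmake`).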
- CUBLAS not found
  - This usually happens when using CUDA 10.1 or newer.
  - You should see a CMake error message like this:
    ```
    cuda lib not found: /usr/local/miniconda3/envs/dl/lib/libcublas_static.a or
    /usr/local/cuda/lib64/libcublas_static.a
    ```
  - Make sure `libcublas_static.a` is in one of the two directories.
- When running OneFlow in gdb, there is no debug information for code locations.
  - Add the CMake flag `-DCMAKE_BUILD_TYPE=RELWITHDEBINFO` or `-DCMAKE_BUILD_TYPE=DEBUG` and recompile.
- `libof_ccobj.a: File truncated`
  - You might see an error message like this:
    ```
    /usr/bin/ar: libof_ccobj.a: File truncated
    make[2]: *** [libof_ccobj.a] Error 1
    make[2]: *** Deleting file `libof_ccobj.a'
    make[1]: *** [CMakeFiles/of_ccobj.dir/all] Error 2
    make: *** [all] Error 2
    ```
  - Upgrade your GNU Binutils; version 2.33.1 is recommended. If you are using conda, you can install it by running `conda install -c conda-forge binutils`.
- Compilation fails because C++17 is enabled
  - In some cases, the environment variable `CXXFLAGS` is not empty and contains `--std c++17`.
  - Check whether it is empty by running `echo $CXXFLAGS`, and clear it with `unset CXXFLAGS`.
- CMake outputs the error `No CMAKE_ASM_NASM_COMPILER could be found.`
  - Install `nasm`. For instance, run `sudo yum install nasm` on CentOS, or `sudo apt-get install nasm` on Ubuntu.
- `No module named 'google.protobuf'`
  - You might see an error message like this:
    ```
    Scanning dependencies of target generate_api
    ...
    from google.protobuf import descriptor as _descriptor
    ModuleNotFoundError: No module named 'google.protobuf'
    CMakeFiles/generate_api.dir/build.make:57: recipe for target 'CMakeFiles/generate_api' failed
    make[2]: *** [CMakeFiles/generate_api] Error 1
    ```
  - Install the development dependencies by running:
    ```
    pip3 install -r dev-requirements.txt
    ```
- gdb prints the warning `ptrace: Operation not permitted.` and the gdb command `bt` prints no backtrace
  - You might get this warning when debugging OneFlow with gdb inside a Docker container. Try adding these flags to the `docker run` command that launches your container (along with your usual options and image):
    ```
    docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined
    ```
  - Please refer to https://stackoverflow.com/questions/19215177/how-to-solve-ptrace-operation-not-permitted-when-trying-to-attach-gdb-to-a-pro
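
For the `CUDNN_STATUS_NOT_INITIALIZED` entry above, here is a minimal sketch of checking the installed driver against 440.33 before rebuilding. It assumes `nvidia-smi` is on `PATH` and that GNU coreutils `sort -V` is available; the helper names `version_ge` and `check_nvidia_driver` are hypothetical and not part of OneFlow.

```bash
#!/usr/bin/env bash
# Sketch: check the NVIDIA driver against the 440.33 minimum from the
# CUDNN_STATUS_NOT_INITIALIZED entry. Assumes nvidia-smi and GNU sort -V.

# version_ge A B: succeed if dotted version A >= version B.
version_ge() {
  # sort -V puts the smaller version first; if that is B, then A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_nvidia_driver() {
  local required="440.33" installed
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
  # One line per GPU; all GPUs share the same driver, so keep the first.
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$installed" "$required"; then
    echo "driver $installed is new enough (>= $required)"
  else
    echo "driver $installed is older than $required; please upgrade" >&2
    return 1
  fi
}
```

Save it to a file (for example a hypothetical `check_driver.sh`), then run `source check_driver.sh && check_nvidia_driver`.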
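
For the conda note under "Failed to compile `.cu` files", one rough way to spot shadowing is to ask which build tools resolve into the active conda environment. This is a sketch assuming conda exports `CONDA_PREFIX` for the activated environment; `conda_shadowed_tools` is a hypothetical helper, and it only inspects executables on `PATH`, not headers or libraries.

```bash
#!/usr/bin/env bash
# Sketch: report build tools that resolve to the active conda environment
# instead of the system installation. Assumes conda sets CONDA_PREFIX.
conda_shadowed_tools() {
  local prefix="${CONDA_PREFIX:-}" tool path
  [ -n "$prefix" ] || return 0   # no conda environment is active
  for tool in "$@"; do
    # Skip tools that are not installed at all.
    path="$(command -v "$tool" 2>/dev/null)" || continue
    case "$path" in
      "$prefix"/*) echo "$tool -> $path" ;;   # found inside the conda env
    esac
  done
}
```

For example, `conda_shadowed_tools gcc g++ ld ar nvcc cmake` prints one line per tool that comes from conda; an empty result means none of them do.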
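
Both the `.cu` entry and the `CUDA_VERSION` entry above come down to which `cmake` actually runs. A minimal sketch, assuming GNU `sort -V`, that reports the `cmake` found first on `PATH` and compares it with 3.14 (the version linked above); `version_ge`, `parse_cmake_version` and `check_cmake` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: report the cmake found on PATH and compare it with 3.14.
# Assumes GNU sort -V for version comparison.

# version_ge A B: succeed if dotted version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read `cmake --version` output on stdin; the first line looks like
# "cmake version 3.14.0", so the third field is the version.
parse_cmake_version() {
  awk 'NR == 1 { print $3 }'
}

check_cmake() {
  local required="${1:-3.14}" bin version
  if ! bin="$(command -v cmake)"; then
    echo "cmake not found on PATH" >&2
    return 1
  fi
  version="$(cmake --version | parse_cmake_version)"
  echo "using $bin (version $version)"
  if ! version_ge "$version" "$required"; then
    echo "cmake $version is older than $required" >&2
    return 1
  fi
}
```

If `check_cmake` still reports an old version after you downloaded 3.14, put the extracted `bin` directory first, e.g. `export PATH=/path/to/cmake-3.14.0-Linux-x86_64/bin:$PATH`, adjusting the path to wherever you unpacked the archive.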
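
For the NCCL entry above, a middle ground between plain `make` and `make -j$(nproc)` is to cap the number of parallel jobs. The sketch below halves the core count; that ratio is an arbitrary assumption, not a OneFlow recommendation, and `reduced_jobs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: choose a reduced number of parallel make jobs.
# Halving the core count is an arbitrary choice; tune it for your machine.
reduced_jobs() {
  local cores="${1:-$(nproc)}"
  local jobs=$(( cores / 2 ))
  [ "$jobs" -ge 1 ] || jobs=1   # always run at least one job
  echo "$jobs"
}
```

Usage: `cmake -DTHIRD_PARTY=ON .. && make -j"$(reduced_jobs)"`. If the build still fails, fall back to plain `make` as suggested above.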
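
For the "CUBLAS not found" entry above, a minimal sketch that checks the candidate directories for `libcublas_static.a`. The directory arguments mirror the example error message and are assumptions about your setup; `find_cublas_static` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print the first directory (from the arguments) that contains
# libcublas_static.a; fail if none of them does.
find_cublas_static() {
  local dir
  for dir in "$@"; do
    if [ -f "$dir/libcublas_static.a" ]; then
      echo "$dir/libcublas_static.a"
      return 0
    fi
  done
  return 1
}
```

For example, `find_cublas_static "$CONDA_PREFIX/lib" /usr/local/cuda/lib64 || find / -name libcublas_static.a 2>/dev/null` checks the two expected locations first and otherwise searches the filesystem so you can see where the library actually ended up.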
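
For the `libof_ccobj.a: File truncated` entry above, here is a sketch that compares the installed GNU `ar` (part of Binutils) with the recommended 2.33.1. It assumes GNU `sort -V`, and that the version is the last field on the first line of `ar --version` (for example `GNU ar (GNU Binutils) 2.33.1` or `GNU ar version 2.27-44.base.el7`); the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check GNU Binutils (via ar) against the recommended 2.33.1.
# Assumes GNU sort -V for version comparison.

# version_ge A B: succeed if dotted version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Take the last field of the first line and drop distro suffixes such as
# "-44.base.el7".
parse_ar_version() {
  awk 'NR == 1 { print $NF }' | sed 's/-.*//'
}

check_binutils() {
  local required="2.33.1" version
  if ! command -v ar >/dev/null 2>&1; then
    echo "ar not found; is GNU Binutils installed?" >&2
    return 1
  fi
  version="$(ar --version | parse_ar_version)"
  if version_ge "$version" "$required"; then
    echo "binutils $version is new enough"
  else
    echo "binutils $version is older than $required; consider upgrading" >&2
    return 1
  fi
}
```

Run `check_binutils` in the same shell (and conda environment, if any) that you use to build OneFlow, since that is the `ar` the build will pick up.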
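
For the C++17 entry above, a small sketch that detects a C++17 standard flag in `CXXFLAGS` before you clear it. The patterns (`-std=c++17`, `--std=c++17`, `--std c++17`, `-std=gnu++17`) are an assumption about how the flag is usually written and do not cover every spelling; `cxxflags_sets_cxx17` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: succeed if the given flags (default: $CXXFLAGS) request C++17.
cxxflags_sets_cxx17() {
  local flags="${1-$CXXFLAGS}"
  case " $flags " in
    *"std=c++17"*|*"std c++17"*|*"std=gnu++17"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `cxxflags_sets_cxx17 && echo "CXXFLAGS=$CXXFLAGS" && unset CXXFLAGS`, run in the shell you build from, so the `unset` affects that shell.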