https://github.com/idiap/buslr
BuSLR: Build System for Speech and Language Research
- Host: GitHub
- URL: https://github.com/idiap/buslr
- Owner: idiap
- License: bsd-3-clause
- Created: 2019-09-23T07:44:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-12-04T09:09:57.000Z (over 5 years ago)
- Last Synced: 2025-08-04T07:57:11.873Z (11 months ago)
- Language: CMake
- Size: 108 KB
- Stars: 5
- Watchers: 7
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: COPYING
# Build System for Learning Research
## Overview
BuSLR knows about various packages that are useful for machine learning (originally speech and language processing), and how to go get them and build them. BuSLR supports two build
systems:
- One is built around [cmake](https://cmake.org)'s
[ExternalProject](https://cmake.org/cmake/help/latest/module/ExternalProject.html)
package. It uses make dependencies to handle package dependencies.
- The other is built around [conda](http://conda.io). It functions as a
repository for conda's `meta.yaml` and `build.sh` metadata.
Not all packages are supported in both systems.
BuSLR sits alongside the wider [conda](http://conda.io) infrastructure and the
Linux distribution packaging systems, so there is a balance to strike. In
general, if something is normally in a Linux distribution (e.g.,
[sox](http://sox.sourceforge.net)) then there's no point handling it here. If
it's in `conda` then the same argument applies, but more subjectively.
[pytorch](https://pytorch.org) is better in conda,
[kaldi](http://kaldi-asr.org) perhaps not.
Also, with conda, bear in mind that [this](https://xkcd.com/1987/) is not a
joke; the thing marked "another PIP?" does exist.
## To use cmake
Clone the repo and do
```
cd buslr/local
cp Configure.example configure.sh # Edit if necessary
./configure.sh
make
```
The package is built in `local` and installed to `local` unless the appropriate
line in `configure.sh` is changed. You can set:
```
export PATH=/local/bin:$PATH
```
to access the builds, or do `source /local/etc/buslrvars.sh` to set other appropriate variables too.

Set the `INHIBIT` line in `configure.sh` to inhibit building of packages for which you might have a system version (typically `cuda` or `mkl`).
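
For the `PATH` setting above, it is usually safer to prepend the install directory than to assign it outright, so that system commands stay reachable. A minimal sketch, assuming a hypothetical `BUSLR_PREFIX` variable that points at the install directory (this variable is not part of BuSLR):

```shell
# Hypothetical install prefix; substitute the directory your build installs to.
BUSLR_PREFIX="${BUSLR_PREFIX:-$HOME/buslr/local}"

# Prepend the bin directory only if it is not already on PATH,
# keeping the existing entries so system commands remain reachable.
case ":$PATH:" in
  *":$BUSLR_PREFIX/bin:"*) ;;
  *) PATH="$BUSLR_PREFIX/bin:$PATH" ;;
esac
export PATH
```

The `case` guard makes the snippet safe to source more than once without growing `PATH`.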
## To use conda
Clone the repo and do
```
cd buslr
conda build src/
```
As long as the `conda-bld` directory is on your channel list (it is indexed and
functions as a local channel), you can do this:
```
conda install
conda build purge
```
Many of the packages were initialised with this command
```
conda-skeleton pypi
```
It allows conda versions of PIP packages to be built, thus avoiding the problem
with multiple PIPs and conda being unaware of PIP.
## Some individual package instructions
* [HTS](src/hts/README) requires the HTK sources to be downloaded manually.
* SRILM also requires a manual download.
* Some packages (festival, kaldi, SRILM) don't really support a `make install`.
See the in-place build section below.
## Guidelines for creating new packages
### Packages
There is a directory for each package. Typically there are only
`CMakeLists.txt` and `meta.yaml` files, but there can also be patched or whole
files to be copied into the tree. In the case of HTS and SRILM, the manually
downloaded files are placed there too.
### Patches
Following the man page for `patch`, patches can be generated by copying the
original file to `/.org`, modifying the file, then running
```
diff -Naur / /
```
This is typically run relative to a directory called
`package/package-prefix/src/package`. At patch time, cmake will cd to that
directory. The patch can be applied using
```
PATCH_COMMAND patch -p0 < ${CMAKE_CURRENT_SOURCE_DIR}/patch.txt
```
in the `CMakeLists.txt` file. A precedent for this is the [sctk](sctk)
package, which patches the installation directory of a deep makefile.
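
The single-file workflow can be tried end to end in a scratch directory. This is a sketch only: the `src/Makefile` file and its contents are made up for illustration and do not come from any BuSLR package.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical source tree with one file to patch.
mkdir -p src
printf 'PREFIX = /usr/local\n' > src/Makefile

# Keep a pristine copy with a .org suffix, then edit the file.
cp src/Makefile src/Makefile.org
printf 'PREFIX = /opt/buslr\n' > src/Makefile

# diff exits with status 1 when the files differ, so guard it under set -e.
diff -Naur src/Makefile.org src/Makefile > patch.txt || [ $? -eq 1 ]

# Restore the original and apply the patch; -p0 keeps the path as written.
mv src/Makefile.org src/Makefile
patch -p0 < patch.txt
```

Because the patch is generated and applied from the same directory, no path components need stripping, which is why `-p0` is enough here.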
If there are multiple patched files, it's better to run `diff` on a copy of the
whole directory. In this case, `diff` prepends a directory name to each path,
so we need `patch -p1`.
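
A sketch of the whole-directory case, again with made-up names (`pkg`, `pkg.new` and the `Makefile` contents are hypothetical):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical package tree with two files that both need changes.
mkdir -p pkg/lib
printf 'CC = cc\n' > pkg/Makefile
printf 'CC = cc\n' > pkg/lib/Makefile

# Copy the whole tree, then edit files in the copy.
cp -r pkg pkg.new
printf 'CC = gcc\n' > pkg.new/Makefile
printf 'CC = gcc\n' > pkg.new/lib/Makefile

# The recursive diff prefixes every path with pkg/ or pkg.new/.
diff -Naur pkg pkg.new > patch.txt || [ $? -eq 1 ]

# Inside the source directory, -p1 strips that leading component.
cd pkg
patch -p1 < ../patch.txt
```

The leading `pkg/` or `pkg.new/` component only exists on the machine where the diff was made, so stripping it with `-p1` lets the patch apply from inside the source directory.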
If the package is git based then git can generate the patch using `git diff`.
The output looks like the directory case, so use `patch -p1`. However,
patching git checkouts causes problems on updates; see `irstlm`.
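
The git variant can be sketched the same way. The repository and file below are hypothetical, and the example assumes git's default `a/` and `b/` path prefixes are in effect:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical git checkout with one tracked file.
git init -q pkg
cd pkg
printf 'CC = cc\n' > Makefile
git add Makefile

# Edit the working copy; git diff compares it against the staged version
# and writes paths with a/ and b/ prefixes.
printf 'CC = gcc\n' > Makefile
git diff > ../patch.txt

# Discard the edit, then re-apply it from the patch, stripping one component.
git checkout -q -- Makefile
patch -p1 < ../patch.txt
```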
Where a package doesn't even have a build system, a `cmake` file can be copied
directly into the tree. This approach is taken in `sph2pipe`.
### Installing using CMake
Some packages don't have an install step. The native CMake install can work
well in these cases. CMake's `install()` command actually writes things to a
file called `cmake_install.cmake`. The trick is to use this file as the
`INSTALL_COMMAND` for these cases. The simplest precedent is `libresample`.
So, define this:
```
set(CMAKE_INSTALL_SCRIPT ${CMAKE_CURRENT_BINARY_DIR}/cmake_install.cmake)
```
add this command
```
INSTALL_COMMAND ${CMAKE_COMMAND} -P ${CMAKE_INSTALL_SCRIPT}
```
and specify the files using `install(FILES <file>... DESTINATION <dir>)`.
### In-place builds
Some packages, notably `kaldi` and the `festvox` family, don't really support being installed. For these, we set `SOURCE_DIR` to something at top level (rather than buried in the `src` tree) and set `INSTALL_COMMAND true` to suppress installation. `true` here is the Unix command that does nothing and exits with status 0 (success); empty strings don't survive the `BuSLR_Add` wrapper.
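
The exit status that makes `true` usable as a do-nothing install step can be checked directly in a shell. The variable names below are illustrative only:

```shell
# `true` does nothing and reports success (exit status 0).
true
status_true=$?

# `false` does nothing and reports failure (a non-zero exit status).
false || status_false=$?

echo "true: $status_true, false: $status_false"
```

A build tool treats a zero exit status as a successful step, so a command that always succeeds without doing anything works as a placeholder.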