https://github.com/johnynek/bazel-deps

Generate bazel dependencies for maven artifacts

# bazel-deps

Generate [bazel](https://bazel.build/) dependencies transitively for maven artifacts, with scala
support.

## Quickstart

This repo can be cloned and built locally, or you can download pre-built binaries for MacOS and Linux from the releases page. Automatic releases are generated for every commit against master.
The releases also include a bash script that lets you easily download and run bazel-deps on mac/linux with a default configuration.

A typical flow looks like this:
1) Download the bash script paired with the release; it has the expected per-platform sha256 hashes embedded in it.
2) Place it in your repo and run `chmod +x update_dependencies.sh`, maybe in a scripts folder if you wish.
3) Copy the `dependencies.yaml` from this repo, or write your own.
4) Run the script; it should produce some files in `3rdparty`.
5) Add to your workspace:
```python
load("//3rdparty:workspace.bzl", "maven_dependencies")
maven_dependencies()
load("//3rdparty:target_file.bzl", "build_external_workspace")
build_external_workspace(name = "third_party")
```
6) You can now add dependencies to your `BUILD` files as described [below](#alternate-outputs-external-repo),
e.g. `deps = ["@third_party//3rdparty/jvm/org/scalaj:scala_http"]`.
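
As a rough illustration of how a maven coordinate relates to the labels used above, here is a minimal Python sketch. The helper `bazel_label` is hypothetical and not part of bazel-deps; the normalization rule (dots in the group id become path segments, dashes and dots in the artifact id become underscores) is an assumption inferred from the examples in this README, and it ignores Scala version suffixes.

```python
def bazel_label(group_id, artifact_id, third_party_dir="3rdparty/jvm", repo=""):
    """Build a label such as //3rdparty/jvm/com/google/guava:guava.

    Hypothetical helper: the real generator may normalize names differently.
    """
    # The group id becomes a directory path below the third-party directory.
    package = "/".join(
        part for part in (third_party_dir, group_id.replace(".", "/")) if part
    )
    # Assumption: dashes and dots in the artifact id become underscores.
    name = artifact_id.replace("-", "_").replace(".", "_")
    prefix = "@" + repo if repo else ""
    return prefix + "//" + package + ":" + name


print(bazel_label("com.google.guava", "guava"))
print(bazel_label("com.google.auto.value", "auto-value", repo="third_party"))
```

Passing `third_party_dir=""` mirrors the `thirdPartyDirectory: ""` setting described later for external repos, where the `3rdparty/jvm` prefix is dropped.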

## Usage

First, list all of your maven artifact dependencies in a [Dependencies](#dependencies) file.

Next, run parseproject on your project yaml file. This will create a tree of BUILD files that
match the maven group id, and the artifact id will be a label in a BUILD file. You should not
edit these by hand, and instead have a separate directory for any exceptions that you manage
along with [Replacements](#replacements). For instance, this project is set up with:

```bash
./bazel run //:parse -- generate -r `pwd` -s 3rdparty/workspace.bzl -d dependencies.yaml
```

We give three arguments: the path to the root of our bazel repo (`-r`), the path to the file we will
include in our workspace (`-s`), and the path to the dependencies file (`-d`). You can also run with `--help`.

Then you should add
```
load("//3rdparty:workspace.bzl", "maven_dependencies")

maven_dependencies()
```
to your workspace to load the maven dependencies.

For example, if your project is located at `MY_PROJ_DIR`, your dependencies file is at
`MY_PROJ_DIR/dependencies.yaml`, and your checkout of bazel-deps is at `BAZEL_DEPS`, to generate the
dependencies you need to do the following:

```bash
cd $BAZEL_DEPS
./bazel run //:parse generate -- --repo-root "$MY_PROJ_DIR" --sha-file 3rdparty/workspace.bzl --deps dependencies.yaml
```

The final result in `MY_PROJ_DIR` will look like this

```
MY_PROJ_DIR
├── 3rdparty <-- everything under here is generated by running gen_maven_deps.sh
│ ├── workspace.bzl <-- load() this from main WORKSPACE
│ └── jvm/ <-- generated BUILD files in this directory.
├── BUILD
├── WORKSPACE
└── dependencies.yaml <-- your project's dependencies are declared in here.
```

Whenever you update the dependencies declared in dependencies.yaml you will need to regenerate the
contents of the `3rdparty` directory by re-running `$BAZEL_DEPS/gen_maven_deps.sh generate`.

### CI integration
In CI, you will often want to make sure that the bazel-deps configuration file and the generated files
or directories are in sync. Run `generate` with `--check-only`: it checks that each file matches
bit-for-bit what would be generated, but writes nothing. If anything does not match, you get a non-zero
return value and a list of the mismatches logged to error.
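
A CI step could wrap that check in a small script. The following is a minimal sketch, not something shipped with bazel-deps: the function names and the script layout are hypothetical, and it assumes `--check-only` can be combined with the same `--repo-root`, `--sha-file` and `--deps` flags shown above.

```python
import subprocess
import sys


def check_command(repo_root, deps_file="dependencies.yaml",
                  sha_file="3rdparty/workspace.bzl", bazel="./bazel"):
    """Build the argument list for a check-only run of `generate`."""
    return [
        bazel, "run", "//:parse", "--", "generate",
        "--repo-root", repo_root,
        "--sha-file", sha_file,
        "--deps", deps_file,
        "--check-only",
    ]


def run_check(repo_root, bazel_deps_dir=".", runner=subprocess.run):
    """Run the check from the bazel-deps checkout and return its exit code."""
    result = runner(check_command(repo_root), cwd=bazel_deps_dir)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_deps.py <project root> <bazel-deps checkout>
    sys.exit(run_check(sys.argv[1], sys.argv[2]))
```

Because the exit code is passed through unchanged, the CI job fails whenever the generated files are out of date.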

### Alternate outputs, external repo
Bazel-deps can also prepare the outputs as an external repo rather than a file tree. With this, one would refer to targets as
`@third_party//foo:bar` rather than `//3rdparty/jvm/foo/bar`. This is useful if you do not want to check generated code in to your
repo. It can also avoid broken transitive dependencies when multiple repos depend upon one another and all use bazel-deps.
For example, suppose there are two repos `A` and `B`, where `B` depends on `A`:
- `A` has `Foo 1.0`, which depends on `Jackson27`
- `B` has `Foo 2.0`, which depends on `circe`

With the checked-in version, both will compile from source against the local copy of `Foo`, but transitively,
in the repo `B`, `Jackson27` will be on the classpath rather than `circe`.

To use this option you would execute bazel-deps like:
```bash
cd $BAZEL_DEPS
./bazel run //:parse generate -- --repo-root "$MY_PROJ_DIR" --sha-file 3rdparty/workspace.bzl --deps dependencies.yaml --target-file 3rdparty/target_file.bzl --disable-3rdparty-in-repo
```

In your `dependencies.yaml` file you will likely want:
`thirdPartyDirectory: ""`
to avoid prefixing the remote repo path with `3rdparty/jvm`.

And finally to load it from your `WORKSPACE` you would use:
```
load("//3rdparty:target_file.bzl", "build_external_workspace")

build_external_workspace(name = "third_party")
```

### Customized integration
If you want to fully control how you create your third party dependencies, you can use bazel-deps simply
to normalize all the jars into a single canonical version for each artifact and present a json lock file
which has the hashes and dependencies of each artifact. To do this you would do:
```
./bazel run //:parse -- generate -r `pwd` -d dependencies.yaml --resolved-output lock.json
```
The schema of the lock file should be rather obvious and it has all the information you would need.

## Assumptions and usage
This tool will generate one canonical version for every jar in the transitive dependencies of
the root dependencies declared. There are currently three conflict resolution modes (which
apply globally):

- fail: if more than one version is found transitively, fail.
- fixed: for all artifacts explicitly added, use that version, otherwise fail if any other artifact has multiple versions.
- highest: for all artifacts explicitly added, use that version, otherwise take the highest version.

In any case, we add a comment for any duplicates found in the workspace loading file.
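
The three modes can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the tool's implementation: `resolve` and `version_key` are hypothetical names, and the numeric version ordering is a deliberate simplification (real Maven version ordering has more rules).

```python
def version_key(version):
    """Assumption: naive numeric ordering, non-numeric parts count as 0."""
    parts = version.replace("-", ".").split(".")
    return tuple(int(p) if p.isdigit() else 0 for p in parts)


def resolve(policy, explicit, found):
    """Pick one version for an artifact.

    policy:   "fail", "fixed" or "highest"
    explicit: version declared in the dependencies file, or None
    found:    non-empty set of all versions requested for the artifact
    """
    if policy not in ("fail", "fixed", "highest"):
        raise ValueError("unknown policy: " + policy)
    if not found:
        raise ValueError("no versions found")
    if policy == "fail":
        if len(found) > 1:
            raise ValueError("multiple versions: " + ", ".join(sorted(found)))
        return next(iter(found))
    if explicit is not None:
        # fixed and highest both honor an explicitly declared version.
        return explicit
    if policy == "fixed":
        if len(found) > 1:
            raise ValueError("multiple versions: " + ", ".join(sorted(found)))
        return next(iter(found))
    return max(found, key=version_key)
```

For example, `resolve("highest", None, {"1.2.0", "1.10.0"})` picks `"1.10.0"`, while the same call with `"fixed"` raises because no version was declared explicitly.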

To declare dependencies, add items to the `dependencies` key in your yaml file. The format
should be yaml or json. It should have [`dependencies`](#dependencies) and it may have [`replacements`](#replacements)
and [`options`](#options). Important: only dependencies explicitly named have public visibility,
transitive dependencies not listed in the dependencies file have visibility limited to the third
party directory.
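
The visibility rule can be pictured with a short sketch. The function below is hypothetical, and the restricted form (`__subpackages__` under the third party directory) is an assumption about how "limited to the third party directory" might be expressed in Bazel; only `//visibility:public` appears in the generated output shown in this README.

```python
def visibility(coordinate, declared, strict=True, third_party_dir="3rdparty/jvm"):
    """Return the visibility list for a generated target.

    coordinate: "group:artifact" of the target being generated
    declared:   set of coordinates listed explicitly in the dependencies file
    strict:     mirrors the strictVisibility option (on by default)
    """
    if not strict or coordinate in declared:
        return ["//visibility:public"]
    # Assumption: transitive-only targets are visible below the third party dir.
    return ["//" + third_party_dir + ":__subpackages__"]
```

With `declared = {"com.google.guava:guava"}`, a target that is only pulled in transitively gets the restricted list, so depending on it from your own code requires adding it to the dependencies file first.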

### Dependencies

Dependencies are a map from maven group id to artifact id, with some metadata, such as:
```yaml
dependencies:
  com.google.guava:
    guava:
      version: "18.0"
      lang: java
```
Language is always required and may be one of `java, scala, scala/unmangled`. To control the scala
version, see the [Options section](#options). A common case is a project with many modules. For instance, in
the [scalding project](https://github.com/twitter/scalding) there are many modules: `-core, -date,
-args, -db, -avro` to name a few. To reduce duplication you can do:

```yaml
dependencies:
  com.twitter:
    scalding:
      version: 0.16.0
      lang: scala
      modules: [core, date, args, db, avro]
```
The `version` field is optional. If it is absent, it means this jar is expected to be found by
transitive dependencies, and it is available to be used outside of the thirdparty directory, but the
exact version used can be selected according to the version resolution rules. It is an error to have
an unversioned dependency that is not a transitive dependency of another versioned dependency.
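
To make the `modules` shorthand concrete, here is a minimal sketch of the expansion it implies. The helper `expand_modules` is hypothetical; the `prefix-module` naming follows the `com.twitter:scalding-core` example used in this README, and any Scala version suffix on the published jar is left out.

```python
def expand_modules(group_id, artifact_prefix, modules=None):
    """Expand a `modules` list into group:artifact coordinates."""
    if not modules:
        # No modules declared: the entry names a single artifact.
        return [group_id + ":" + artifact_prefix]
    return [group_id + ":" + artifact_prefix + "-" + m for m in modules]


for coordinate in expand_modules("com.twitter", "scalding", ["core", "date", "args"]):
    print(coordinate)
```

Each expanded coordinate shares the `version` and `lang` of the entry it came from.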

A target may optionally add `exports` and `exclude` lists to a dependency. `exports` should be just the group and
artifact (such as: `com.twitter:scalding-core` in the above), and they should be listed in the dependencies. `exclude`
list should also be only the group and artifact.

It's possible to add the `generateNeverlink` option to a dependency, which makes the generator emit this dependency twice:
1. With the normalized name as usual.
2. With the name `${normalized}_neverlink` and `neverlink` set to true.

This option should be used only for java dependencies; it is ignored for any other lang.

Each group id can only appear once, so you should collocate dependencies by group. WARNING the parsing library
we are using does not fail on duplicate keys, it just takes the last one, so watch out. It would be good
to fix that, but writing a new yaml parser is out of scope.
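
If you keep the dependencies file in JSON rather than YAML, one way to guard against silently dropped duplicates is to lint the file before running bazel-deps. The sketch below is an assumption-laden workaround, not a feature of the tool: the function names are hypothetical and it only covers the JSON form of the file.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails instead of keeping the last duplicate."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: " + repr(key))
        seen[key] = value
    return seen


def load_dependencies_json(text):
    """Parse a JSON dependencies file, raising on any repeated key."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

The hook is called for every JSON object, so a repeated group id or a repeated artifact id inside a group is reported at whatever depth it occurs.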

#### Packaging and Classifiers

Depending on artifacts with classifiers is straightforward: just add the packaging and classifier as part of the
artifact id:

```yaml
dependencies:
  net.sf.json-lib:
    json-lib:jar:jdk15: # artifact:packaging:classifier
      lang: java
      version: "2.4"
```

**Note**: Currently, only `jar` packaging is supported for dependencies. More work is needed on the `bazel-deps` backend
to ensure that non-jar dependencies are written as `data` attributes, instead of regular jar dependencies.
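
The `artifact:packaging:classifier` key format can be illustrated with a small parser. This is a sketch with hypothetical names, and the `jar` default for a bare artifact id is an assumption based on Maven's usual default packaging.

```python
from typing import NamedTuple, Optional


class ArtifactKey(NamedTuple):
    artifact: str
    packaging: str
    classifier: Optional[str]


def parse_artifact_key(key):
    """Split 'artifact[:packaging[:classifier]]' into its parts."""
    parts = key.split(":")
    if len(parts) == 1:
        # Assumption: a bare artifact id means jar packaging, no classifier.
        return ArtifactKey(parts[0], "jar", None)
    if len(parts) == 2:
        return ArtifactKey(parts[0], parts[1], None)
    if len(parts) == 3:
        return ArtifactKey(parts[0], parts[1], parts[2])
    raise ValueError("too many ':' separated parts in " + repr(key))


print(parse_artifact_key("json-lib:jar:jdk15"))
```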

Excluding artifacts with packaging or classifiers is similar to including dependencies. Non-jar packaging _is_ supported
for `exclude`.

```yaml
  com.amazonaws:
    DynamoDBLocal:
      lang: java
      version: "1.11.86"
      exclude:
        - "com.almworks.sqlite4java:sqlite4java-win32-x86:dll"
        - "com.almworks.sqlite4java:sqlite4java-win32-x64:dll"
        - "com.almworks.sqlite4java:libsqlite4java-osx:dylib"
        - "com.almworks.sqlite4java:libsqlite4java-linux-i386:so"
        - "com.almworks.sqlite4java:libsqlite4java-linux-amd64:so"
```

#### Annotation Processors (`processorClasses`)

A target may also optionally add `processorClasses` to a dependency. This is for [annotation processors](https://docs.oracle.com/javase/8/docs/api/javax/annotation/processing/Processor.html).
`bazel-deps` will generate a `java_library` and a `java_plugin` for each annotation processor defined. For example, we can define Google's auto-value annotation processor via:
```yaml
dependencies:
  com.google.auto.value:
    auto-value:
      version: "1.5"
      lang: java
      processorClasses: ["com.google.auto.value.processor.AutoValueProcessor"]
```
This will yield the following:
```python
java_library(
    name = "auto_value",
    exported_plugins = [
        ":auto_value_plugin",
    ],
    visibility = [
        "//visibility:public",
    ],
    exports = [
        "//external:jar/com/google/auto/value/auto_value",
    ],
)

java_plugin(
    name = "auto_value_plugin",
    processor_class = "com.google.auto.value.processor.AutoValueProcessor",
    deps = [
        "//external:jar/com/google/auto/value/auto_value",
    ],
)
```
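If there is only a single `processorClasses` entry defined, the `java_plugin` rule is named after the library with a `_plugin` suffix
(`auto_value_plugin` above). If there are multiple `processorClasses` defined, each rule gets the same `_plugin_` stem followed by a suffix identifying its processor class.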

### Options
There are a number of ways to customize the generated build files. These are controlled
by the `options` dictionary at the root-level of the dependencies file. This is a list of
all of the supported options.

* buildHeader: usually you will want to configure your scala support here:
```yaml
buildHeader:
  - load("@io_bazel_rules_scala//scala:scala_import.bzl", "scala_import")
```
* languages: an array of languages to be supported either Java or a specific version of Scala, e.g. `[ "java", "scala:2.12.8" ]`.
* thirdPartyDirectory: path to where we write the BUILD files for thirdparty. The default is `3rdparty/jvm`. If you choose the Google default of `third_party` you will need to configure the `licenses` option as well.
* versionConflictPolicy: `fixed`, `fail` or `highest`
* transitivity: `runtime_deps` or `exports`
* resolvers: the maven servers to use. Each resolver is defined by three keys, an "id", a "type", and a "url".
```yaml
resolvers:
  - id: "mavencentral"
    type: "default"
    url: https://repo.maven.apache.org/maven2/
  - id: "myserver"
    type: "default"
    url: https://my.private.maven.server.com/mvn/
```
* resolverCache: (with `resolverType: aether`) where bazel-deps should cache resolved packages. `local` (`target/local-repo` in the repository root)
or `bazel_output_base` (`bazel-deps/local-repo` inside the repository's Bazel output base -- from `bazel info
output_base`). Coursier ignores this option and uses `~/.cache/coursier`.
* namePrefix: a string added to the generated workspace names, to avoid conflicts. The external repository names and
binding targets of each dependency are prefixed.
* strictVisibility: enabled by default. When enabled, a target must be explicitly declared in the
`dependencies.yaml` file or it will not be visible to the rest of the workspace. If it is set to `false`, all targets
will be generated with `public` visibility.
* licenses: a set of strings added as a `licenses` rule to each generated bazel target. Required by
bazel if your build targets are under `third_party/`. See the [licenses](https://docs.bazel.build/versions/master/be/functions.html#licenses) function in Bazel.
* resolverType: `aether` or `coursier`. Note that `aether` is slower and seems to silently miss some dependencies for
reasons we don't yet understand.
* buildFileName: filename of the generated build files

In the default case, with no options given, we use:
- allow java and scala `2.11`
- `3rdparty/jvm` as the `thirdPartyDirectory`.
- `highest` versionConflictPolicy
- `exports` transitivity
- use maven central as the resolver
- `local` resolverCache
- empty namePrefix (`""`)
- `coursier` resolverType
- `BUILD` as build file name
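
The defaults above can be written down as a plain mapping that user options override. This is a sketch for reasoning about the configuration, not code from bazel-deps: `effective_options` is a hypothetical helper, the `"scala:2.11"` string is an assumption about how the default Scala version would be spelled, and the resolver entry reuses the maven central example from the `resolvers` option.

```python
DEFAULT_OPTIONS = {
    "languages": ["java", "scala:2.11"],  # assumption: exact string form
    "thirdPartyDirectory": "3rdparty/jvm",
    "versionConflictPolicy": "highest",
    "transitivity": "exports",
    "resolvers": [
        {
            "id": "mavencentral",
            "type": "default",
            "url": "https://repo.maven.apache.org/maven2/",
        }
    ],
    "resolverCache": "local",
    "namePrefix": "",
    "strictVisibility": True,
    "resolverType": "coursier",
    "buildFileName": "BUILD",
}

# Options listed in this README that have no default in the list above.
OPTIONS_WITHOUT_DEFAULT = {"buildHeader", "licenses"}


def effective_options(user_options=None):
    """Overlay the `options` block of a dependencies file on the defaults."""
    user_options = dict(user_options or {})
    unknown = set(user_options) - set(DEFAULT_OPTIONS) - OPTIONS_WITHOUT_DEFAULT
    if unknown:
        raise ValueError("unknown options: " + ", ".join(sorted(unknown)))
    merged = dict(DEFAULT_OPTIONS)
    merged.update(user_options)
    return merged
```

Calling `effective_options({"versionConflictPolicy": "fixed"})` changes only that one key and leaves the other defaults in place.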

### Replacements
Some maven jars should not be used and instead are replaced by internal targets. Here are
some examples of this:

1. A subproject in the repo is published as a maven artifact (`A`). Others (`B`) depend on this artifact (`B -> A`) and in turn we depend on those (we have added `B` to our dependencies file). We don't want to pull `A` from a maven repo, since we build it internally, so we replace that artifact with an internal target.
2. We get some scala artifacts directly from the sdk. So, if a jar says it needs `org.scala-lang:scala-library` we already have that (and a few other jars) declared, and we don't want to risk having two potentially incompatible versions.
3. A small external project has both a bazel build and a maven release. We prefer to use the bazel build so we can easily pull more recent versions by bumping up a gitsha rather than waiting for a jar to be published.

The replacements work on the level of artifacts. An artifact is replaced one-for-one with a local
bazel target. For instance:
```yaml
replacements:
  org.scala-lang:
    scala-library:
      lang: scala/unmangled # scala-library is not mangled like sbt does with other jars
      target: "@io_bazel_rules_scala_scala_library"
    scala-reflect:
      lang: scala/unmangled
      target: "@io_bazel_rules_scala_scala_reflect"
```

In this way, we redirect maven deps to those providers.

Note, we stop walking the graph when we see a replaced node, so the replacement target is now
responsible for building correctly, and correctly exporting any dependencies that need to be
on the compile classpath.
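
The effect of stopping at a replaced node can be sketched as a breadth-first walk. This illustrates the behavior described above under simplifying assumptions (an in-memory edge map, string coordinates); the function name and data shapes are hypothetical and not taken from the bazel-deps source.

```python
from collections import deque


def walk(roots, edges, replacements):
    """Walk the dependency graph, stopping at replaced artifacts.

    roots:        coordinates declared in the dependencies file
    edges:        mapping coordinate -> list of direct dependencies
    replacements: mapping coordinate -> local bazel target
    Returns (artifacts to fetch from maven, replaced artifacts with targets).
    """
    to_fetch = []
    replaced = {}
    seen = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if node in replacements:
            # Replaced node: record the local target and do not descend.
            replaced[node] = replacements[node]
            continue
        to_fetch.append(node)
        queue.extend(edges.get(node, ()))
    return to_fetch, replaced
```

With `edges = {"B": ["A"], "A": ["C"]}` and `A` replaced, only `B` is fetched and `C` is never visited, which is why the replacement target must export whatever `C` would have provided.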

## Code
This code was originally forked from [pgr0ss/bazel-deps](https://github.com/pgr0ss/bazel-deps)

This code was inspired by the [aether examples](https://github.com/eclipse/aether-demo/blob/322fa556494335faaf3ad3b7dbe8f89aaaf6222d/aether-demo-snippets/src/main/java/org/eclipse/aether/examples/GetDependencyTree.java) for walking maven dependencies.