https://github.com/jdx/mise-java
A JVM Data Crawler
- Host: GitHub
- URL: https://github.com/jdx/mise-java
- Owner: jdx
- License: mit
- Created: 2025-02-24T14:54:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-03-11T22:43:26.000Z (3 months ago)
- Last Synced: 2026-03-12T03:41:25.043Z (3 months ago)
- Topics: cli, crawler, jvm
- Language: Rust
- Homepage:
- Size: 134 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Roast: A JVM Data Crawler


Roast is a data crawler that collects and stores information about JVM distributions from various vendors. The project
is heavily based on the [Java Metadata](https://github.com/joschi/java-metadata) project.
Supported distributions:
* [Alibaba Dragonwell](https://cn.aliyun.com/product/dragonwell)
* [Amazon Corretto](https://aws.amazon.com/corretto/)
* [Azul Zulu](https://www.azul.com/downloads/)
* [BellSoft Liberica](https://bell-sw.com/pages/downloads)
* [BellSoft Liberica Native Image Kit](https://bell-sw.com/pages/downloads/native-image-kit/)
* [Eclipse Temurin](https://adoptium.net/)
* [GraalVM Community Edition](https://www.graalvm.org/)
* [IBM Semeru](https://developer.ibm.com/languages/java/semeru-runtimes/)
* [JetBrains Runtime](https://github.com/JetBrains/JetBrainsRuntime/)
* [Mandrel](https://github.com/graalvm/mandrel)
* [Microsoft OpenJDK](https://www.microsoft.com/openjdk)
* [OpenJDK](https://jdk.java.net/)
* [Oracle JDK](https://www.oracle.com/java/)
* [Oracle GraalVM](https://www.graalvm.org/)
* [Red Hat](https://developers.redhat.com/products/openjdk/)
* [SAP SapMachine](https://sap.github.io/SapMachine/)
* [Tencent Kona JDK](https://www.tencentcloud.com/document/product/845/48051)
* [Trava OpenJDK](https://github.com/TravaOpenJDK/)
## Schema
The API schema can be found at [mise-java.jdx.dev](https://mise-java.jdx.dev).
## Build & Run
### Create and initialize the database
#### Local Docker PostgreSQL
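These commands assume a running PostgreSQL container named `postgres` with a database user `postgres`. Note that they drop and recreate the `roast` database, so any existing data in it is lost.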
```bash
# Drop any existing database (IF EXISTS avoids an error on the first run), then recreate it
docker exec -i -u postgres postgres psql -d postgres -c "DROP DATABASE IF EXISTS roast;"
docker exec -i -u postgres postgres psql -d postgres -c "CREATE DATABASE roast;"
# Create the `roast` user and load the schema
docker exec -i -u postgres postgres psql -d roast -c "CREATE USER roast WITH PASSWORD 'roast';"
docker exec -i -u postgres postgres psql -d roast < ./sql/schema.sql
```
## Run
### Environment variables
Roast reads its database connection and other settings from the configuration file `config.toml`.
The following environment variables override the corresponding values in `config.toml`.
| Variable name | Description |
| -------------------------- | -------------------------------------------- |
| `ROAST_DATABASE_POOL_SIZE` | Size of the PostgreSQL connection pool |
| `ROAST_DATABASE_URL` | PostgreSQL connection string |
| `ROAST_DATABASE_SSL_MODE` | SSL mode for PostgreSQL connection |
| `ROAST_DATABASE_SSL_CA` | CA certificate for PostgreSQL connection |
| `ROAST_DATABASE_SSL_CERT` | Client certificate for PostgreSQL connection |
| `ROAST_DATABASE_SSL_KEY` | Client key for PostgreSQL connection |
| `ROAST_EXPORT_PATH` | Export path for the data |
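
As a minimal sketch, the overrides can be exported in the shell before invoking Roast. The values are assumptions for illustration: the connection URL reuses the `roast` user and database from the Docker setup above and assumes PostgreSQL is reachable on `localhost:5432`, the pool size is arbitrary, and the exact URL format Roast accepts should be checked against `config.toml`.

```bash
# Assumed values: adjust host, port, and credentials to your setup.
export ROAST_DATABASE_URL="postgres://roast:roast@localhost:5432/roast"
export ROAST_DATABASE_POOL_SIZE=10   # arbitrary example value
export ROAST_EXPORT_PATH="data/"

# Show the overrides Roast will see in its environment.
env | grep '^ROAST_' | sort
```

Variables exported this way apply to every later `cargo run` in the same shell session, whereas the `env \ ...` form used in the examples below scopes them to a single command.
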
The following environment variables configure logging and threading.
| Variable name | Description |
| ------------------- | --------------------------------------------------------------------- |
| `RAYON_NUM_THREADS` | Number of threads used by Rayon's thread pool |
| `RUST_LOG` | Log configuration (see https://docs.rs/env_logger/latest/env_logger/) |
### Fetch data from all vendors
```bash
env \
RAYON_NUM_THREADS=50 \
RUST_LOG=roast=INFO \
cargo run -- fetch 2>&1 | tee -a error.log
```
### Export data by release type
```bash
env \
RAYON_NUM_THREADS=50 \
RUST_LOG=roast=INFO \
ROAST_EXPORT_PATH=data/releasetype/ \
cargo run -- export release-type 2>&1 | tee -a error.log
```
### Export data by vendor
```bash
env \
RAYON_NUM_THREADS=50 \
RUST_LOG=roast=INFO \
ROAST_EXPORT_PATH=data/vendor/ \
cargo run -- export vendor 2>&1 | tee -a error.log
```
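
The fetch and export commands above can be chained into a single refresh step. The following is a minimal sketch and not part of the repository: the `run` and `refresh` helpers and the `DRY_RUN` switch are hypothetical names introduced here, while the `cargo run` invocations and export paths are copied from the examples above. Logging to `error.log` via `tee` is left out for brevity.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Keep values already set in the environment; otherwise use the README defaults.
export RAYON_NUM_THREADS="${RAYON_NUM_THREADS:-50}"
export RUST_LOG="${RUST_LOG:-roast=INFO}"

# Hypothetical wrapper: fetch first, then run both exports.
# The && chain stops at the first failing step, so a failed fetch
# does not trigger an export of stale data.
refresh() {
  run cargo run -- fetch &&
    run env ROAST_EXPORT_PATH=data/releasetype/ cargo run -- export release-type &&
    run env ROAST_EXPORT_PATH=data/vendor/ cargo run -- export vendor
}
```

Calling `DRY_RUN=1 refresh` prints the three commands without executing them; calling `refresh` on its own runs them in order.
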
## Disclaimer
This project is in no way affiliated with any of the companies or projects offering and distributing the actual JREs and JDKs.
All respective copyrights and trademarks are theirs.