An open API service indexing awesome lists of open source software.

https://github.com/majewsky/rust-jmdict

Rust library for access to the JMdict
https://github.com/majewsky/rust-jmdict

Last synced: about 2 months ago
JSON representation

Rust library for access to the JMdict

Awesome Lists containing this project

README

        

# WARNING: Licensing on database files

The database files compiled into the crate are licensed from the Electronic Dictionary Research and Development Group
under Creative Commons licenses. Applications linking this crate directly oder indirectly must display appropriate
copyright notices. Please refer to the [EDRDG's license statement](https://www.edrdg.org/edrdg/licence.html) for details.

# rust-jmdict

![GitHub Actions Badge](https://github.com/majewsky/rust-jmdict/actions/workflows/test.yml/badge.svg)

The `jmdict` crate contains the data from the [JMDict file](https://www.edrdg.org/jmdict/j_jmdict.html), a comprehensive
multilingual dictionary of the Japanese language. The original JMDict file, included in this repository (and hence, in
releases of this crate) comes as XML. Instead of stuffing the XML in the binary directly, this crate parses the XML at
compile-time and generates an optimized representation for inclusion in the final binary.

In short, this crate does:

* parse the XML structure of the JMdict database file,
* provide an API to access its entries, and
* provide compile-time flags (via Cargo features) to select the amount of information included in the binary.

This crate does NOT:

* provide fast lookup into the database. You get a list of entries and then you can build your own indexing on top as
required by your application.

For specific examples, please check out the [documentation on docs.rs](https://docs.rs/jmdict/).

## Building

When packaging to crates.io, we cannot include the actual payload data (`data/entrypack.json`) because crates.io has a
limit of 10 MiB per crate. (Technically, we could ship the data by depending on a series of data crates each slightly
under 10 MiB, but I intend to be a good citizen and not abuse the shared infrastructure of crates.io needlessly.)

Hence the default strategy is to pull the entrypack (a preprocessed form of the JMdict contents) at build time from
a server under the crate owner's control, currently . Each released crate version will
have the most recent entrypack (as of the time of publication) hardcoded into its code, along with a SHA-256 checksum to
ensure data integrity.

If downloading the entrypack at build time is not possible (e.g. because the build machine does not have internet
access, or because `curl` is not installed on the build machine), download the entrypack beforehand and put its path in
the `RUST_JMDICT_ENTRYPACK` environment variable when running `cargo build`.

For development purposes, when building from the repository, `data/entrypack.json` will be used instead. If this is not
desired, set the value of the `RUST_JMDICT_ENTRYPACK` to `default` to force the normal download behavior.

## Contributing

If you plan to open issues or write code, please have a look at [CONTRIBUTING.md](CONTRIBUTING.md).