https://github.com/1computer1/sengenbango

# 千言万語

Parallel JP-EN corpora search combining multiple corpora. See [here](./SOURCES.md) for credits for data sources.

## Instructions

1. Parse the data into CSV files. See [here](data/README.md) for more instructions.

2. Configure [the compose file](./compose.yml) if needed.

3. Run `docker compose up db` to set up the database first, if it hasn't been set up already. See [here](database/README.md) for information about the database.
   1. Once the database is up, run `docker compose exec -it db psql -U postgres` to get to the postgres console.
   2. Run `call copy_data();` to copy the data. This can be done every time the data is updated; existing data is cleared first.
   3. If not all sources should be copied, supply an array of sources, e.g. `call copy_data(array['basics']);`.
   4. If new sources were added, run `docker compose exec db psql -U postgres -f docker-entrypoint-initdb.d/01-init.sql` to recreate `copy_data`.

4. Run `docker compose up` for everything else.
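
The database steps can also be run non-interactively. The following is a minimal sketch, not part of this repository: the names `run`, `reload_data` and `DRY_RUN` are hypothetical, and the `-d`, `-T` and `-c` flags replace the interactive console session described above. It assumes the `db` service has finished initializing by the time `copy_data` is called, which may not hold on the very first start.

```sh
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: start the database in the background, then reload all sources.
reload_data() {
    # -d detaches so the script can continue once the container has started.
    run docker compose up -d db &&
    # -T disables the pseudo-TTY; -c passes the statement instead of opening a console.
    run docker compose exec -T db psql -U postgres -c 'call copy_data();'
}
```

After sourcing the file, `reload_data` runs both commands, stopping if the first one fails. With `DRY_RUN=1` set, the commands are only printed (without their original quoting), which is a way to check them before touching the database.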