https://github.com/1computer1/sengenbango
Parallel JP-EN corpora search combining multiple corpora
- Host: GitHub
- URL: https://github.com/1computer1/sengenbango
- Owner: 1Computer1
- License: agpl-3.0
- Created: 2023-09-20T12:57:12.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-13T22:29:52.000Z (over 1 year ago)
- Last Synced: 2025-02-17T21:14:07.561Z (3 months ago)
- Language: C
- Homepage: https://sgbg.onecomp.one
- Size: 438 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 千言万語
Parallel JP-EN corpora search combining multiple corpora. See [here](./SOURCES.md) for credits for data sources.
## Instructions
1. Parse the data into CSV files. See [here](data/README.md) for more instructions.
2. Configure [the compose file](./compose.yml) if needed.
3. Run `docker compose up db` to set up the database first, if that hasn't been done already. See [here](database/README.md) for information about the database.
   1. Once the database is up, run `docker compose exec -it db psql -U postgres` to get to the postgres console.
   2. Run `call copy_data();` to copy the data. This can be done every time there's an update to the data, clearing existing data first.
   3. If not all sources should be copied, supply an array of sources e.g. `call copy(array['basics']);`.
   4. If new sources were added, run `docker compose exec db psql -U postgres -f docker-entrypoint-initdb.d/01-init.sql` to recreate `copy_data`.
4. Run `docker compose up` for everything else.
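
The data reload steps above can also be run without opening the interactive postgres console. The following is a minimal sketch, not part of the repository: it assumes the `db` service is already up and initialized, and the names `DRY_RUN`, `run` and `reload_data` are hypothetical helpers introduced for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: reload the data without the interactive psql console.
# Assumes the `db` service from compose.yml is already up and initialized.
# DRY_RUN, run and reload_data are hypothetical names, not part of the repo.
set -eu

DRY_RUN="${DRY_RUN:-1}" # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reload_data() {
    # Pass --recreate after adding new sources so copy_data is redefined first.
    if [ "${1:-}" = "--recreate" ]; then
        run docker compose exec -T db psql -U postgres \
            -f docker-entrypoint-initdb.d/01-init.sql
    fi
    # -T disables TTY allocation, which suits scripted use.
    run docker compose exec -T db psql -U postgres -c 'call copy_data();'
}

reload_data "$@"
```

Saved under a name of your choice, running it with `DRY_RUN=0` executes the commands instead of printing them, and adding `--recreate` re-runs the init script before copying.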