https://github.com/akalongman/geo-words
Georgian (ka_GE) word list
Georgian (ka_GE) word list
- Host: GitHub
- URL: https://github.com/akalongman/geo-words
- Owner: akalongman
- License: mit
- Created: 2017-09-23T21:45:13.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-01-25T17:57:50.000Z (almost 2 years ago)
- Last Synced: 2024-10-12T10:44:22.418Z (28 days ago)
- Topics: dictionary, georgia, georgian, ka-ge, spell, spell-checker, words
- Language: PHP
- Size: 2.89 MB
- Stars: 18
- Watchers: 2
- Forks: 3
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-georgia - akalongman/geo-words - Georgian words database (txt, dic, sql) + CLI web crawler on PHP (Categories)
README
# Georgian (ka_GE) word list
Download in:
[DIC](https://github.com/akalongman/geo-words/raw/master/dictionary/dic/ka_GE.dic) |
[TXT](https://github.com/akalongman/geo-words/raw/master/dictionary/txt/ka_GE.txt) |
[SQL](https://github.com/akalongman/geo-words/raw/master/dictionary/sql/ka_GE.sql)

## Data sources
- Kevin Scannell (http://crubadan.org/languages/ka, CC-BY 4.0)
- National Parliamentary Library of Georgia (http://www.nplg.gov.ge/gwdict/index.php)
- Other Georgian eBooks/websites ([Crawler](#crawler))

## Crawler
The crawler is written in PHP and uses MySQL as its database. The code is in the `crawler` folder.
Before running the script, configure the database and run the migrations.
First, rename the file `.env.example` to `.env` and specify the database credentials.
Install the Composer dependencies:

    composer install

Then run the migrations:

    composer migrate
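
The setup steps above can be collected into a small script. This is only a sketch: `setup_crawler` is a hypothetical helper name, it assumes that `.env.example` and `composer.json` both live in the `crawler` folder, and it copies `.env.example` instead of renaming it so that the template stays available.

```shell
# Hypothetical helper, not part of the repository.
# Usage: setup_crawler [path-to-crawler-folder]
# The function body runs in a subshell, so the `cd` does not leak to the caller.
setup_crawler() (
    cd "${1:-crawler}" || exit 1

    if [ ! -f .env ]; then
        # Copy rather than rename (a deliberate deviation from the README),
        # then stop so the database credentials can be filled in by hand.
        cp .env.example .env || exit 1
        echo "Created .env; add your database credentials, then run again." >&2
        exit 2
    fi

    # Both commands come from the README; migrate runs only if install succeeds.
    composer install && composer migrate
)
```

The first run stops with exit status 2 after creating `.env`; once the credentials are filled in, a second run installs the dependencies and applies the migrations.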
## Usage
### Crawl links with `internal` profile
This command crawls URLs only inside the specified domain and ignores external URLs:

    php cmd crawl --project-name="My Project" --profile=internal "http://www.nplg.gov.ge/gwdict/index.php"
### Crawl links with `all` profile
This command crawls all links:

    php cmd crawl --project-name="My Project" --profile=all "http://www.nplg.gov.ge/gwdict/index.php"
### Crawl links with `domain` profile
This command crawls links on any domain that ends with the `--domain` value:

    php cmd crawl --project-name="My Project" --profile=domain --domain=.ge "http://www.nplg.gov.ge/gwdict/index.php"

Only links whose domain ends with the `.ge` suffix are crawled.
### Crawl links with `subset` profile
This command crawls only URLs that start with the `--subset` value:

    php cmd crawl --project-name="My Project" --profile=subset --subset="http://www.nplg.gov.ge/gwdict/index.php?a=list&d=1" "http://www.nplg.gov.ge/gwdict/index.php?a=list&d=1"

Only links whose URL starts with the `http://www.nplg.gov.ge/gwdict/index.php?a=list&d=1` prefix are crawled.
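
The four profiles differ only in which discovered links are followed. The sketch below restates that rule in shell. It is not taken from the crawler's source: `url_host` and `should_crawl` are hypothetical names, and treating `internal` as an exact host match (rather than also accepting subdomains) is an assumption.

```shell
# Print the host part of a URL such as scheme://user@host:port/path?query.
# Simplified on purpose: IPv6 literals are not handled.
url_host() {
    local rest="${1#*://}"        # drop the scheme, if any
    rest="${rest%%[/?#]*}"        # drop path, query and fragment
    rest="${rest##*@}"            # drop userinfo, if any
    printf '%s\n' "${rest%%:*}"   # drop the port, if any
}

# should_crawl PROFILE START_URL CANDIDATE_URL [VALUE]
# VALUE is the --domain suffix or the --subset prefix, depending on PROFILE.
# Returns 0 to follow the link, 1 to skip it, 2 for an unknown profile.
should_crawl() {
    local profile="$1" start="$2" url="$3" value="${4:-}"
    case "$profile" in
        all)
            return 0 ;;
        internal)
            # Assumption: "inside the specified domain" means the same host.
            [ "$(url_host "$url")" = "$(url_host "$start")" ] ;;
        domain)
            [ -n "$value" ] || return 1
            case "$(url_host "$url")" in
                *"$value") return 0 ;;   # host ends with the suffix
                *) return 1 ;;
            esac ;;
        subset)
            [ -n "$value" ] || return 1
            case "$url" in
                "$value"*) return 0 ;;   # URL starts with the prefix
                *) return 1 ;;
            esac ;;
        *)
            echo "unknown profile: $profile" >&2
            return 2 ;;
    esac
}
```

For example, `should_crawl domain "http://www.nplg.gov.ge/gwdict/index.php" "https://example.ge/page" .ge` succeeds, while the same call with `https://example.com/page` fails.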
### Continue project
You can continue a stopped project with the command:

    php cmd crawl --project-id={id}

Show all possible options: `php cmd help crawl`
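
For long crawls it can be convenient to resume automatically after an interruption. The wrapper below is a sketch under a loud assumption: that `php cmd crawl` exits with a non-zero status when it stops early, which the README does not state. `resume_project` and `RETRY_DELAY` are hypothetical names; only `--project-id` comes from the documentation above.

```shell
# Hypothetical helper, not part of the repository.
# Usage: resume_project PROJECT_ID [MAX_ATTEMPTS]
# Run it from the folder that contains the `cmd` script.
resume_project() {
    local id="$1" attempts="${2:-3}" delay="${RETRY_DELAY:-5}" n=1
    while [ "$n" -le "$attempts" ]; do
        # Assumption: exit status 0 means the crawl finished.
        if php cmd crawl --project-id="$id"; then
            return 0
        fi
        echo "Attempt $n of $attempts for project $id did not finish." >&2
        # Wait before retrying, but not after the final attempt.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

If that exit-status assumption does not hold for the crawler, the loop will either stop after the first run or retry needlessly, so check the behaviour of `php cmd crawl` before relying on it.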
## TODO
- Fix wrong entries and add more words
- Add tests
- Add notification sending on complete

## License
Please see the [LICENSE](LICENSE.md) included in this repository for a full copy of the MIT license,
which this project is licensed under.