Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zseder/hundict
bilingual dictionary extractor from parallel corpora
- Host: GitHub
- URL: https://github.com/zseder/hundict
- Owner: zseder
- Created: 2012-05-10T11:49:42.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2014-07-03T10:58:13.000Z (over 10 years ago)
- Last Synced: 2024-08-04T04:07:06.524Z (4 months ago)
- Language: Python
- Homepage:
- Size: 303 KB
- Stars: 21
- Watchers: 5
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README
Awesome Lists containing this project
- low-resource-languages - hundict - bilingual dictionary extractor from parallel corpora. (Software / Utilities)
README
hundict is an experimental Python project that creates bilingual dictionaries
from parallel corpora.
Features (planned or done):
- easy to use (see hundict -h)
- fast (Python fast, of course, not C fast)
- unigram pairs
  - A - B
- n-gram to n-gram extraction, not only unigram to unigram
  - ABC - DE
- multiple-choice pairs
  - (A or B) - C
- stopword removal
- printing of the remaining corpora
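The unigram-pair and stopword-removal features can be illustrated with a common co-occurrence approach to dictionary extraction. The following is a minimal sketch, not hundict's own algorithm: it assumes a sentence-aligned corpus given as (source, target) string pairs, whitespace tokenization, and the Dice coefficient as the association score. The function `extract_unigram_pairs`, its thresholds, and the stopword sets are hypothetical, and n-gram and multiple-choice pairs are not covered.

```python
from collections import Counter
from itertools import product

# Hypothetical stopword sets for illustration only (English source, Hungarian target).
STOPWORDS_SRC = {"the", "a", "of"}
STOPWORDS_TGT = {"a", "az", "egy"}


def extract_unigram_pairs(bitext, min_count=2, min_dice=0.5,
                          src_stop=STOPWORDS_SRC, tgt_stop=STOPWORDS_TGT):
    """Score word pairs from a sentence-aligned corpus with the Dice coefficient.

    bitext: iterable of (source_sentence, target_sentence) strings.
    Returns (src_word, tgt_word, dice) tuples, best-scoring first.
    """
    src_freq = Counter()   # number of sentences containing each source word
    tgt_freq = Counter()   # number of sentences containing each target word
    pair_freq = Counter()  # number of sentence pairs where both words co-occur
    for src, tgt in bitext:
        # Lowercase, split on whitespace, drop stopwords, and count each word
        # at most once per sentence (hence the sets).
        src_words = {w for w in src.lower().split() if w not in src_stop}
        tgt_words = {w for w in tgt.lower().split() if w not in tgt_stop}
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))

    pairs = []
    for (s, t), count in pair_freq.items():
        if count < min_count:
            continue  # too rare to trust
        dice = 2 * count / (src_freq[s] + tgt_freq[t])
        if dice >= min_dice:
            pairs.append((s, t, dice))
    # Highest score first; ties broken alphabetically for stable output.
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs


corpus = [
    ("the dog sleeps", "a kutya alszik"),
    ("the dog eats", "a kutya eszik"),
    ("the cat sleeps", "a macska alszik"),
]
for s, t, score in extract_unigram_pairs(corpus):
    print(f"{s} - {t}\t{score:.2f}")
# dog - kutya	1.00
# sleeps - alszik	1.00
```

Counting each word once per sentence keeps the Dice score between 0 and 1; raising `min_count` trades recall for precision, which matters most on small corpora where chance co-occurrences (here, "dog" with "alszik") are common.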