https://github.com/chakki-works/chakin
Simple downloader for pre-trained word vectors
datasets machine-learning natural-language-processing word-embeddings word-vectors
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/chakki-works/chakin
- Owner: chakki-works
- License: mit
- Created: 2017-05-19T03:40:25.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-06-21T21:11:46.000Z (almost 2 years ago)
- Last Synced: 2024-02-21T02:22:39.775Z (4 months ago)
- Topics: datasets, machine-learning, natural-language-processing, word-embeddings, word-vectors
- Language: Python
- Homepage: https://medium.com/chakki/simple-downloader-for-public-word-embeddings-fdbd3ce7ba5b
- Size: 172 KB
- Stars: 332
- Watchers: 18
- Forks: 49
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-embedding-models - chakin
README
# chakin
**chakin** is a downloader for pre-trained word vectors. It [supports many vectors](#supported-vectors) and lets you download them without tedious manual work.
-----------------
# Installation
To install chakin, run:

```shell
$ pip install chakin
```

# Usage
You can download pre-trained word vectors as follows:

```shell
$ python
```

```python
>>> import chakin
>>> chakin.search(lang='English')
                   Name  Dimension                     Corpus  VocabularySize
 2         fastText(en)        300                  Wikipedia            2.5M
11         GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)            400K
12        GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)            400K
13        GloVe.6B.200d        200  Wikipedia+Gigaword 5 (6B)            400K
14        GloVe.6B.300d        300  Wikipedia+Gigaword 5 (6B)            400K
15       GloVe.42B.300d        300          Common Crawl(42B)            1.9M
16      GloVe.840B.300d        300         Common Crawl(840B)            2.2M
17    GloVe.Twitter.25d         25               Twitter(27B)            1.2M
18    GloVe.Twitter.50d         50               Twitter(27B)            1.2M
19   GloVe.Twitter.100d        100               Twitter(27B)            1.2M
20   GloVe.Twitter.200d        200               Twitter(27B)            1.2M
21  word2vec.GoogleNews        300          Google News(100B)            3.0M
>>> chakin.download(number=2, save_dir='./') # select fastText(en)
Test: 100% || | Time: 0:00:02 60.7 MiB/s
'./wiki.en.vec'
```

# Supported vectors
So far, chakin supports the following word vectors:

| Name | Dimension | Corpus | VocabularySize | Method | Language |
|---------------------|-----------|---------------------------|----------------|----------|------------|
| fastText(ar) | 300 | Wikipedia | 610K | fastText | Arabic |
| fastText(de) | 300 | Wikipedia | 2.3M | fastText | German |
| fastText(en) | 300 | Wikipedia | 2.5M | fastText | English |
| fastText(es) | 300 | Wikipedia | 985K | fastText | Spanish |
| fastText(fr) | 300 | Wikipedia | 1.2M | fastText | French |
| fastText(it) | 300 | Wikipedia | 871K | fastText | Italian |
| fastText(ja) | 300 | Wikipedia | 580K | fastText | Japanese |
| fastText(ko) | 300 | Wikipedia | 880K | fastText | Korean |
| fastText(pt) | 300 | Wikipedia | 592K | fastText | Portuguese |
| fastText(ru) | 300 | Wikipedia | 1.9M | fastText | Russian |
| fastText(zh) | 300 | Wikipedia | 330K | fastText | Chinese |
| GloVe.6B.50d | 50 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.6B.100d | 100 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.6B.200d | 200 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.6B.300d | 300 | Wikipedia+Gigaword 5 (6B) | 400K | GloVe | English |
| GloVe.42B.300d | 300 | Common Crawl(42B) | 1.9M | GloVe | English |
| GloVe.840B.300d | 300 | Common Crawl(840B) | 2.2M | GloVe | English |
| GloVe.Twitter.25d | 25 | Twitter(27B) | 1.2M | GloVe | English |
| GloVe.Twitter.50d | 50 | Twitter(27B) | 1.2M | GloVe | English |
| GloVe.Twitter.100d | 100 | Twitter(27B) | 1.2M | GloVe | English |
| GloVe.Twitter.200d | 200 | Twitter(27B) | 1.2M | GloVe | English |
| word2vec.GoogleNews | 300 | Google News(100B) | 3.0M | word2vec | English |
| word2vec.Wiki-NEologd.50d | 50 | Wikipedia | 335K | word2vec + NEologd | Japanese |
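
Once a file such as `./wiki.en.vec` from the usage example has been downloaded, it still needs to be read into memory. The following is a minimal sketch of one way to do that, not part of chakin itself. It assumes the plain-text layout used by fastText `.vec` files (a `count dimension` header line, then one `word v1 v2 ...` entry per line) and by GloVe `.txt` files (the same entries without a header). The function name `load_text_vectors` and its `limit` parameter are hypothetical, introduced only for illustration.

```python
import io
from typing import Dict, List, TextIO


def load_text_vectors(handle: TextIO, limit: int = 0) -> Dict[str, List[float]]:
    """Read 'word v1 v2 ...' lines into a dict mapping each word to its vector.

    Assumes space-separated text and words that contain no spaces.
    A limit of 0 means "read every entry".
    """
    vectors: Dict[str, List[float]] = {}
    for index, line in enumerate(handle):
        parts = line.rstrip().split(" ")
        # fastText .vec files begin with a "count dimension" header; skip it.
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
        if limit and len(vectors) >= limit:
            break
    return vectors


# Tiny in-memory stand-in for a downloaded file such as './wiki.en.vec'.
sample = io.StringIO("2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")
embeddings = load_text_vectors(sample)
print(sorted(embeddings))
print(len(embeddings["king"]))
```

For a real download, pass an open file instead of the sample, for example `open('./wiki.en.vec', encoding='utf-8')`, and consider setting `limit` because the full files hold hundreds of thousands of entries or more. Archives may need to be unpacked first, and vectors distributed in a binary format need a different reader.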