{"id":19544348,"url":"https://github.com/chakki-works/chakin","last_synced_at":"2025-04-04T22:06:43.371Z","repository":{"id":21109845,"uuid":"91762603","full_name":"chakki-works/chakin","owner":"chakki-works","description":"Simple downloader for pre-trained word vectors","archived":false,"fork":false,"pushed_at":"2022-06-21T21:11:46.000Z","size":176,"stargazers_count":333,"open_issues_count":9,"forks_count":48,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-03-28T21:06:00.502Z","etag":null,"topics":["datasets","machine-learning","natural-language-processing","word-embeddings","word-vectors"],"latest_commit_sha":null,"homepage":"https://medium.com/chakki/simple-downloader-for-public-word-embeddings-fdbd3ce7ba5b","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chakki-works.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-19T03:40:25.000Z","updated_at":"2025-01-19T13:16:05.000Z","dependencies_parsed_at":"2022-09-15T22:31:16.069Z","dependency_job_id":null,"html_url":"https://github.com/chakki-works/chakin","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chakki-works%2Fchakin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chakki-works%2Fchakin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chakki-works%2Fchakin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chakki-works%2Fchakin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ch
akki-works","download_url":"https://codeload.github.com/chakki-works/chakin/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247256112,"owners_count":20909240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datasets","machine-learning","natural-language-processing","word-embeddings","word-vectors"],"created_at":"2024-11-11T03:27:50.718Z","updated_at":"2025-04-04T22:06:43.346Z","avatar_url":"https://github.com/chakki-works.png","language":"Python","funding_links":[],"categories":["Datasets"],"sub_categories":["Pre-Trained Word Vectors"],"readme":"# chakin\n**chakin** is a downloader for pre-trained word vectors. It [supports many vectors](#supported-vectors).\n\nThis library lets you download pre-trained word vectors without having to find and fetch them yourself.\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/chakki-works/chakin/blob/master/docs/top.jpg?raw=true\"\u003e\u003cbr\u003e\n\u003c/div\u003e\n\n-----------------\n\n\u003c!--\nWord vectors are very important for many natural language processing tasks such as document classification, \nnamed entity recognition, question answering and so on. \nIn such tasks, you can use the pre-trained word vectors that many people have published.\nBut finding and downloading them yourself is troublesome. 
\n\n--\u003e\n\n\n# Installation\nTo install chakin, run:\n\n```shell\n$ pip install chakin\n```\n\n# Usage\nYou can download pre-trained word vectors as follows:\n\n```shell\n$ python\n```\n\n```python\n\u003e\u003e\u003e import chakin\n\u003e\u003e\u003e chakin.search(lang='English')\n                   Name  Dimension                     Corpus VocabularySize  \n2          fastText(en)        300                  Wikipedia           2.5M   \n11         GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)           400K   \n12        GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)           400K   \n13        GloVe.6B.200d        200  Wikipedia+Gigaword 5 (6B)           400K   \n14        GloVe.6B.300d        300  Wikipedia+Gigaword 5 (6B)           400K   \n15       GloVe.42B.300d        300          Common Crawl(42B)           1.9M   \n16      GloVe.840B.300d        300         Common Crawl(840B)           2.2M   \n17    GloVe.Twitter.25d         25               Twitter(27B)           1.2M   \n18    GloVe.Twitter.50d         50               Twitter(27B)           1.2M   \n19   GloVe.Twitter.100d        100               Twitter(27B)           1.2M   \n20   GloVe.Twitter.200d        200               Twitter(27B)           1.2M   \n21  word2vec.GoogleNews        300          Google News(100B)           3.0M \n\n\u003e\u003e\u003e chakin.download(number=2, save_dir='./') # select fastText(en)\nTest: 100% ||               | Time: 0:00:02  60.7 MiB/s\n'./wiki.en.vec'\n```\n\n# Supported vectors\nSo far, chakin supports the following word vectors:\n\n| Name                | Dimension | Corpus                    | VocabularySize | Method   | Language   |\n|---------------------|-----------|---------------------------|----------------|----------|------------|\n| fastText(ar)        | 300       | Wikipedia                 | 610K           | fastText | Arabic     |\n| fastText(de)        | 300       | Wikipedia                 | 2.3M           | fastText | German     
|\n| fastText(en)        | 300       | Wikipedia                 | 2.5M           | fastText | English    |\n| fastText(es)        | 300       | Wikipedia                 | 985K           | fastText | Spanish    |\n| fastText(fr)        | 300       | Wikipedia                 | 1.2M           | fastText | French     |\n| fastText(it)        | 300       | Wikipedia                 | 871K           | fastText | Italian    |\n| fastText(ja)        | 300       | Wikipedia                 | 580K           | fastText | Japanese   |\n| fastText(ko)        | 300       | Wikipedia                 | 880K           | fastText | Korean     |\n| fastText(pt)        | 300       | Wikipedia                 | 592K           | fastText | Portuguese |\n| fastText(ru)        | 300       | Wikipedia                 | 1.9M           | fastText | Russian    |\n| fastText(zh)        | 300       | Wikipedia                 | 330K           | fastText | Chinese    |\n| GloVe.6B.50d        | 50        | Wikipedia+Gigaword 5 (6B) | 400K           | GloVe    | English    |\n| GloVe.6B.100d       | 100       | Wikipedia+Gigaword 5 (6B) | 400K           | GloVe    | English    |\n| GloVe.6B.200d       | 200       | Wikipedia+Gigaword 5 (6B) | 400K           | GloVe    | English    |\n| GloVe.6B.300d       | 300       | Wikipedia+Gigaword 5 (6B) | 400K           | GloVe    | English    |\n| GloVe.42B.300d      | 300       | Common Crawl(42B)         | 1.9M           | GloVe    | English    |\n| GloVe.840B.300d     | 300       | Common Crawl(840B)        | 2.2M           | GloVe    | English    |\n| GloVe.Twitter.25d   | 25        | Twitter(27B)              | 1.2M           | GloVe    | English    |\n| GloVe.Twitter.50d   | 50        | Twitter(27B)              | 1.2M           | GloVe    | English    |\n| GloVe.Twitter.100d  | 100       | Twitter(27B)              | 1.2M           | GloVe    | English    |\n| GloVe.Twitter.200d  | 200       | Twitter(27B)              | 1.2M           | GloVe   
 | English    |\n| word2vec.GoogleNews | 300       | Google News(100B)         | 3.0M           | word2vec | English    |\n| word2vec.Wiki-NEologd.50d | 50  | Wikipedia                 | 335K           | word2vec + NEologd | Japanese |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchakki-works%2Fchakin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchakki-works%2Fchakin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchakki-works%2Fchakin/lists"}