{"id":22927698,"url":"https://github.com/kpym/frequencydictionaries","last_synced_at":"2026-01-11T01:35:55.029Z","repository":{"id":62173945,"uuid":"154812231","full_name":"kpym/FrequencyDictionaries","owner":"kpym","description":"Frequency dictionaries - one word per line simple text files","archived":false,"fork":false,"pushed_at":"2024-11-17T17:40:18.000Z","size":27251,"stargazers_count":41,"open_issues_count":0,"forks_count":8,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-07T10:15:49.866Z","etag":null,"topics":["dictionary","frequency-dictionary","plaintext"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kpym.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-26T09:42:39.000Z","updated_at":"2025-01-31T18:31:27.000Z","dependencies_parsed_at":"2022-10-27T20:01:02.034Z","dependency_job_id":null,"html_url":"https://github.com/kpym/FrequencyDictionaries","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpym%2FFrequencyDictionaries","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpym%2FFrequencyDictionaries/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpym%2FFrequencyDictionaries/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpym%2FFrequencyDictionaries/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kpym","download_url":"https://codeload.github.com/kpym/FrequencyDictionaries/tar.gz/refs/h
eads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246667180,"owners_count":20814678,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dictionary","frequency-dictionary","plaintext"],"created_at":"2024-12-14T09:15:44.234Z","updated_at":"2026-01-11T01:35:54.985Z","avatar_url":"https://github.com/kpym.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# [FrequencyDictionaries](https://github.com/kpym/FrequencyDictionaries)\r\n\r\nThis repository contains frequency dictionaries in the form of text files, with one word per line.\r\n\r\nThe repository is organized into two folders:\r\n- `freq_dicts_dirty`: Contains dictionaries with words that may not appear in a \"standard\" dictionary.\r\n- `freq_dicts_clean`: Contains dictionaries that have been cleaned and supplemented to include only words found in a \"standard\" dictionary.\r\n\r\n## `freq_dicts_dirty`\r\n\r\nThe files in this folder were derived from the [LuminosoInsight/wordfreq](https://github.com/LuminosoInsight/wordfreq) project. These dictionaries were converted into `.txt` files with one word per line, ordered by frequency (most frequent words come first). Only words longer than two characters were retained.\r\n\r\nThe conversion process involved:\r\n1. Using the [jakm/msgpack-cli](https://github.com/jakm/msgpack-cli) tool to convert `.msgpack` files to `.json` format.\r\n2. 
Transforming the `.json` files into `.txt` files with one word per line using `sed` and `grep`.\r\n\r\n## `freq_dicts_clean`\r\n\r\nThe files in this folder were created by cleaning the dictionaries in the `freq_dicts_dirty` folder. This process involved removing words not found in the corresponding dictionaries from [titoBouzout/Dictionaries](https://github.com/titoBouzout/Dictionaries).\r\n\r\n### File Naming Conventions\r\n- Files named `short_xx.txt` retain their original names.\r\n- Files originally named `long_xx.txt` have been renamed to `medium_xx.txt`.\r\n- New `long_xx.txt` files are created from `medium_xx.txt` (or `short_xx.txt` when applicable). These are supplemented by appending, in alphabetical order, all words present in the \"standard\" dictionary but absent from the \"frequency\" dictionary.\r\n\r\n## Licensing\r\n\r\nThis repository is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for details.\r\n\r\n### Attribution and Data Licensing\r\n\r\nThis repository is based on two primary sources:\r\n\r\n1. The [`rspeer/wordfreq`](https://github.com/rspeer/wordfreq) project by Robyn Speer.\r\n2. 
Dictionaries from the [`titoBouzout/Dictionaries`](https://github.com/titoBouzout/Dictionaries) repository, originally derived from the OpenOffice dictionary list.\r\n\r\n#### Wordfreq\r\n- Robyn Speer must be credited as specified in [NOTICE.md](NOTICE.md).\r\n- For a detailed list of data sources and their licenses, see the original `rspeer/wordfreq` [`NOTICE.md`](https://github.com/rspeer/wordfreq/blob/master/NOTICE.md).\r\n- Data from `rspeer/wordfreq` is redistributed under terms compatible with their original licenses, including the Creative Commons Attribution-ShareAlike 4.0 license.\r\n\r\n#### Dictionaries\r\n- The dictionaries included in this repository are derived from the OpenOffice dictionary list, as referenced in [`titoBouzout/Dictionaries`](https://github.com/titoBouzout/Dictionaries).\r\n- While no formal license is provided in the source, credits to the original contributors are acknowledged in the respective `LANG.txt` files in the `titoBouzout/Dictionaries` repository.\r\n- For more details about the dictionaries' origins and attribution requirements, see [NOTICE.md](NOTICE.md).\r\n\r\n### Summary of Licensing\r\n\r\nThe combined content of this repository complies with the terms of the Apache License 2.0 and respects the attribution requirements of the original sources. See [NOTICE.md](NOTICE.md) for further details.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkpym%2Ffrequencydictionaries","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkpym%2Ffrequencydictionaries","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkpym%2Ffrequencydictionaries/lists"}