Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/stooone/chinese-wordlist-extractor
- Host: GitHub
- URL: https://github.com/stooone/chinese-wordlist-extractor
- Owner: stooone
- License: cc0-1.0
- Created: 2018-02-26T14:34:37.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-10-29T12:37:38.000Z (about 4 years ago)
- Last Synced: 2024-08-02T05:05:49.978Z (4 months ago)
- Language: Python
- Size: 3.33 MB
- Stars: 2
- Watchers: 2
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-hackchinese - chinese-wordlist-extractor
README
# chinese-wordlist-extractor
A script that builds a word frequency list (CSV) from a text.
The script reads **input.txt** and writes **output.csv** with the following columns: **Chinese word**, **frequency**, **pinyin**, **definition**.
Use the frequency list to study the most common words in a text first, so that the text itself becomes easier to understand.
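As a rough illustration of working with the result, the sketch below picks the most frequent entries out of the generated CSV. It assumes the file is comma-delimited UTF-8 with the four columns in the order listed above. The helper name `top_words` and the sample rows are made up for this example and are not part of the script.

```python
import csv
import io


def top_words(csv_file, n=10):
    """Return the n most frequent rows as (word, frequency, pinyin, definition).

    Assumes the column order described above; rows that do not fit
    (blank lines, a possible header row) are skipped.
    """
    rows = []
    for row in csv.reader(csv_file):
        if len(row) < 4:
            continue
        try:
            frequency = int(row[1])
        except ValueError:
            # Not a data row, e.g. a header line.
            continue
        rows.append((row[0], frequency, row[2], row[3]))
    # Highest frequency first; ties keep their order from the file.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]


# Made-up sample rows standing in for a real output.csv.
sample = io.StringIO(
    "我,85,wo3,I; me\n"
    "的,120,de5,possessive particle\n"
    "中国,40,Zhong1 guo2,China\n"
)
for word, frequency, pinyin, definition in top_words(sample, n=2):
    print(word, frequency, pinyin, definition)
```

To run it on a real result, pass `open("output.csv", encoding="utf-8", newline="")` instead of the in-memory sample.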
## Requirements
```
pip install tqdm
```
## Tips
* If the input contains garbage (HTML, text in other languages), it doesn't matter.
* Mirror a website you want to understand, concatenate all of its HTML files into input.txt, and you can learn the frequent words for that particular website.
* Run it on a book you want to read, or on the SRT subtitle file of a movie.
## Credits
This script uses the CC-CEDICT dictionary from https://cc-cedict.org/wiki/, which is licensed under the Creative Commons Attribution-Share Alike 3.0 License.
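
For readers curious where the pinyin and definition columns can come from, here is a minimal sketch of parsing one line of the CC-CEDICT text format (`TRADITIONAL SIMPLIFIED [pinyin] /gloss 1/gloss 2/`, with `#` marking comment lines). It does not show how this script loads the dictionary; the function name `parse_cedict_line` and the returned dictionary keys are assumptions made for illustration.

```python
import re

# One entry per line: TRADITIONAL SIMPLIFIED [pin1 yin1] /gloss 1/gloss 2/
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_cedict_line(line):
    """Parse one CC-CEDICT line; return None for comments and malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        # Glosses are separated by slashes inside the outer pair.
        "definitions": glosses.split("/"),
    }


# Illustrative line written in the CC-CEDICT format.
entry = parse_cedict_line("中國 中国 [Zhong1 guo2] /China/")
print(entry["simplified"], entry["pinyin"], "; ".join(entry["definitions"]))
```

Keeping the glosses as a list leaves it to the caller to decide how to join them into a single CSV cell.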