Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kinow/nihongo-scraper
Scraper to download Japanese news, quizzes, and other resources for use offline. Data is used for personal study only, and NLP is applied to isolate Kanji for reading cards, for example.
- Host: GitHub
- URL: https://github.com/kinow/nihongo-scraper
- Owner: kinow
- License: mit
- Created: 2017-07-18T12:42:26.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-07-18T12:52:28.000Z (over 7 years ago)
- Last Synced: 2024-12-06T15:22:05.426Z (2 months ago)
- Topics: education, japanese, language, nihongo, nlp, python, scraper, scrapy
- Language: Python
- Size: 1.95 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Nihongo Scraper
Scraper to download Japanese news, quizzes, and other resources for use offline.
Data is used for personal study only, and NLP is applied to isolate Kanji for
reading cards, for example.

* nihongo-spider simply scrapes a known site with quizzes and saves the response as JSON/CSV.
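The README says NLP is used to isolate Kanji for reading cards, but the code that does it is not shown. As a rough stand-in (a plain Unicode-range filter, not NLP), the sketch below pulls the distinct Kanji out of scraped text and writes them as CSV rows, one card per character. The names `extract_kanji` and `kanji_cards_csv` are hypothetical, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) plus the iteration mark 々 are recognised.

```python
import csv
import io
import re

# Hypothetical pattern: basic CJK Unified Ideographs block plus the iteration
# mark 々 (U+3005). Rarer Kanji in the extension blocks are not matched.
KANJI_RE = re.compile(r"[\u4e00-\u9fff\u3005]")


def extract_kanji(text: str) -> list:
    """Return each distinct Kanji in `text`, in order of first appearance."""
    # dict.fromkeys keeps insertion order, so duplicates are dropped in place.
    return list(dict.fromkeys(KANJI_RE.findall(text)))


def kanji_cards_csv(texts: list) -> str:
    """Build CSV text with one row per distinct Kanji and the first text it came from."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["kanji", "source"])
    done = set()
    for text in texts:
        for kanji in extract_kanji(text):
            if kanji not in done:
                done.add(kanji)
                writer.writerow([kanji, text])
    return out.getvalue()


if __name__ == "__main__":
    print(extract_kanji("日本語の勉強は毎日です。"))
    # → ['日', '本', '語', '勉', '強', '毎']
```

Kana and punctuation fall outside the pattern, so only the Kanji survive; a real NLP pipeline (for example a morphological analyser) would additionally give readings and word boundaries, which this filter cannot.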
The URLs used are hidden to prevent a flood of requests to the sites, and to stop bots from following
links from GitHub.

## Build
```
git clone https://github.com/kinow/nihongo-scraper.git
cd nihongo-scraper
pip install -r requirements
```

## Execute nihongo-spider
```
cat > .env <<EOF
/context/path/
EOF
scrapy runspider nihongo-spider.py -o questions.json
```
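How `nihongo-spider.py` reads `.env` is not documented here. Assuming the file holds plain `KEY=VALUE` lines (the usual .env convention), a stdlib-only loader the spider could call might look like the sketch below. `load_env` and the `NIHONGO_URL` key are hypothetical names, not taken from the repository.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def load_env(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file.

    Blank lines, '#' comments and lines without '=' are skipped. Whitespace
    around keys and values is stripped, as are matching single or double
    quotes around a value.
    """
    env: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / ".env"
        # NIHONGO_URL is a hypothetical key; the real name is not given in the README.
        env_file.write_text('# hidden target\nNIHONGO_URL="https://example.com/quiz"\n', encoding="utf-8")
        print(load_env(env_file))  # {'NIHONGO_URL': 'https://example.com/quiz'}
```

A spider could then take its start URL from `load_env(".env")["NIHONGO_URL"]` instead of hard-coding it, which keeps the target site out of the repository. If adding a dependency is acceptable, the python-dotenv package's `dotenv_values()` does the same job with fuller .env syntax support.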