https://github.com/hpprc/jawiki
https://huggingface.co/datasets/hpprc/jawiki
- Host: GitHub
- URL: https://github.com/hpprc/jawiki
- Owner: hppRC
- License: apache-2.0
- Created: 2024-02-02T04:33:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-02T13:12:06.000Z (over 1 year ago)
- Last Synced: 2025-03-24T09:52:56.746Z (3 months ago)
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# JaWiki
```
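# Create the directory that will hold the extracted dump.
mkdir -p data/jawiki
# Download the 2024-01-01 Enterprise HTML dump of Japanese Wikipedia (NS0 = article namespace); -c resumes a partial download.
wget -c https://dumps.wikimedia.org/other/enterprise_html/runs/20240101/jawiki-NS0-20240101-ENTERPRISE-HTML.json.tar.gz
# Extract the archive into data/jawiki.
tar -zxvf jawiki-NS0-20240101-ENTERPRISE-HTML.json.tar.gz -C data/jawiki
# Run the repository's main script.
python main.py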
```
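
To sanity-check the extracted dump before running `main.py`, something like the following sketch may help. It is not part of this repository: `iter_articles` is a hypothetical helper, and it assumes that the archive unpacks into newline-delimited JSON files ending in `.ndjson` and that each record carries the page title under a `name` key. Adjust both if your dump differs.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_articles(dump_dir: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of every *.ndjson file under dump_dir.

    Hypothetical helper: the *.ndjson naming is an assumption about the extracted dump.
    """
    for path in sorted(Path(dump_dir).rglob("*.ndjson")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)


if __name__ == "__main__":
    # Print the titles of the first five records as a quick check.
    for i, article in enumerate(iter_articles("data/jawiki")):
        # "name" is assumed to hold the page title; .get avoids a KeyError if it does not.
        print(article.get("name"))
        if i >= 4:
            break
```

Reading line by line keeps memory use flat, which matters because the extracted files are large.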