https://github.com/siznax/honbasho
Grand Sumo Tournament Highlights Archive
- Host: GitHub
- URL: https://github.com/siznax/honbasho
- Owner: siznax
- Created: 2014-06-29T00:13:12.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2019-11-08T19:25:16.000Z (about 6 years ago)
- Last Synced: 2025-01-14T06:46:48.474Z (11 months ago)
- Language: Python
- Homepage: https://siznax.github.io/honbasho
- Size: 167 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
honbasho
========
Archive [Grand Sumo](http://www.sumo.or.jp/) tournament highlights,
which are _removed_ :sob: from the official site before each new tournament.
Update the config
-----------------
Update ``basho.json`` with the latest source URLs and metadata, e.g.
```
"201609": {
    "en": "http://www.sumo.or.jp/EnHonbashoTopicsKoTorikumi15/wrap",
    "ja": "http://www.sumo.or.jp/ResultDataKoTorikumi15/wrap",
    "date": "7 Oct 2016",
    "title": "Aki 2016 (September) Grand Sumo Highlights",
    "archive": "honbasho-201609-aki",
    "description": "Aki 2016\n\nTokyo, Ryogoku Kokugikan\n\nSeptember 11, 2016 - September 25, 2016\n\n"
}
```
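
Before crawling, it can help to confirm that a new entry is complete. The following is a minimal sketch, not part of the project: it assumes ``basho.json`` is a single top-level object keyed by selector (as the fragment above suggests), and it treats every key in the example entry as required. The names `REQUIRED_KEYS`, `missing_keys`, and `load_entry` are hypothetical.

```python
import json

# Keys copied from the example entry; treating all of them as required
# is an assumption, not something the project documents.
REQUIRED_KEYS = {"en", "ja", "date", "title", "archive", "description"}


def missing_keys(entry):
    """Return the sorted list of expected keys absent from one entry."""
    return sorted(REQUIRED_KEYS - set(entry))


def load_entry(path, selector):
    """Load one tournament entry, assuming a top-level object keyed by selector."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    entry = config[selector]  # raises KeyError if the selector is not configured
    missing = missing_keys(entry)
    if missing:
        raise ValueError(f"{selector}: missing keys {missing}")
    return entry
```

Calling `load_entry("basho.json", "201609")` would then either return the entry or fail early with the list of missing keys.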
Crawl and download
------------------
Get highlights metadata:
```shell
$ mkdir {dest}
$ crawl.py {selector} > {dest}/data.json
```
Download movies and text:
```shell
$ download.py {dest} {dest}/data.json
```
Make highlights HTML index:
```shell
$ index.py {dest}/data.json {selector} > {dest}/highlights.html
```
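
The three steps above can also be chained from Python. This is only a sketch under stated assumptions: the scripts are executable and on `PATH` (as the shell examples imply), and the helper names `pipeline_steps` and `run_pipeline` are hypothetical. Unlike plain `mkdir`, the sketch does not fail if `{dest}` already exists.

```python
import subprocess
from pathlib import Path


def pipeline_steps(selector, dest):
    """Return (argv, stdout_path) pairs mirroring the three commands above."""
    data = f"{dest}/data.json"
    return [
        (["crawl.py", selector], data),
        (["download.py", dest, data], None),
        (["index.py", data, selector], f"{dest}/highlights.html"),
    ]


def run_pipeline(selector, dest):
    """Run each step in order, stopping at the first failing command."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    for argv, out_path in pipeline_steps(selector, dest):
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            # Equivalent of the shell's `> file` redirection.
            with open(out_path, "w", encoding="utf-8") as out:
                subprocess.run(argv, check=True, stdout=out)
```

For example, `run_pipeline("201609", "201609")` would crawl, download, and build the index for the September 2016 entry.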
Upload to the Internet Archive
------------------------------
* Add a description for the archive page in ``basho.json``
* Move crawl HTML out of {dest}/
* Make sure {selector} and {dest} have the same name (e.g. 201607)
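
The last two checks in the list above can be scripted. A minimal sketch with a hypothetical helper, `pre_upload_report`: because the README does not say which files count as crawl HTML, it simply lists every `.html` file directly inside `{dest}` for manual review rather than moving anything.

```python
from pathlib import Path


def pre_upload_report(selector, dest):
    """Return (names_match, html_files) for a quick manual review."""
    dest_path = Path(dest)
    # True when the directory name equals the selector, e.g. "201607".
    names_match = dest_path.name == selector
    # HTML files still sitting directly inside dest.
    html_files = sorted(p.name for p in dest_path.glob("*.html"))
    return names_match, html_files
```
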
Review metadata changes to be made:
```shell
$ upload.py {selector}
```
Upload files and modify metadata:
```shell
$ upload.py {selector} -u # upload files
$ upload.py {selector} -m # modify metadata
```
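
For orientation, here is a rough sketch of what an upload through the `internetarchive` library can look like. It is not the code in `upload.py`: the mapping in `archive_metadata`, the use of the `archive` value as the item identifier, and the `movies` mediatype are all assumptions made for illustration. The library also needs Internet Archive credentials to be configured before either call succeeds.

```python
def archive_metadata(entry):
    """Map a basho.json entry to item metadata (hypothetical mapping)."""
    return {
        "title": entry["title"],
        "description": entry["description"],
        "mediatype": "movies",  # assumption: highlights are video files
    }


def upload_files(entry, files):
    """Upload files to the item named by entry['archive'] (assumed identifier)."""
    import internetarchive  # third-party: pip install internetarchive

    return internetarchive.upload(
        entry["archive"], files=files, metadata=archive_metadata(entry)
    )


def update_metadata(entry):
    """Rewrite metadata on an existing item without re-uploading files."""
    import internetarchive

    return internetarchive.modify_metadata(
        entry["archive"], metadata=archive_metadata(entry)
    )
```
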
Update project pages
--------------------
* Check out the `gh-pages` branch and update ``index.html``
* See https://siznax.github.io/honbasho
Thanks to the [Internet Archive](https://archive.org/) for hosting,
and @jjjake for the excellent
[internetarchive](https://github.com/jjjake/internetarchive)
Python library.
@siznax