https://github.com/syhw/Broodwar_replays_scrappers
A collection of scrapers for replay sites
- Host: GitHub
- URL: https://github.com/syhw/Broodwar_replays_scrappers
- Owner: syhw
- Created: 2011-11-09T10:38:47.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2012-01-27T10:32:45.000Z (almost 13 years ago)
- Last Synced: 2024-04-08T05:35:05.781Z (7 months ago)
- Language: Python
- Homepage:
- Size: 97.7 KB
- Stars: 12
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-starcraftAI
README
Requirements
============
- make
- python
- pyreplib (optional, for the verification/sorting/trashing part)

Pipeline
========
Download replays
----------------
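First download replays from the replay sites by running make in the folder of
each site you want (gosugamers, iccup, or teamliquid). This can take a while
(about 2 days for ICCUP, for instance, because of its no-leeching policy):

    cd gosugamers && make    # likewise for iccup and teamliquid
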
Unify replays
-------------
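Unifying here means keeping a single copy of each distinct file, identified by the SHA-256 hash of its bytes. As a rough illustration of that idea (this is not the repository's unifier.py, whose details are not reproduced here), a minimal Python 3 sketch might look like the following. The function names, the `*.rep` glob, and the choice to name each copy after its digest are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unify(source_dirs, target_dir):
    """Copy one file per distinct content from source_dirs into target_dir.

    Returns the number of distinct replays copied.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    seen = set()
    for source in source_dirs:
        for path in sorted(Path(source).rglob("*.rep")):
            digest = sha256_of(path)
            if digest in seen:
                continue  # same bytes already copied under another name
            seen.add(digest)
            # Assumption: name the copy after its hash so names cannot clash.
            shutil.copy2(path, target / f"{digest}.rep")
    return len(seen)
```

A call such as `unify(["gosugamers/replays", "iccup/replays"], "unified")` would leave one file per distinct replay in `unified/`; those folder names are hypothetical and should match wherever the downloads actually landed.
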
The same replay may have been downloaded several times, so the next step
removes duplicates by comparing a SHA-256 hash of each file:

    cd unifier && vim unifier.py
    # change to match the folders of the recently downloaded replays
    make

Verify and sort replays
-----------------------
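The sorting in this step comes down to moving each file either into a trash folder or into the folder of its match-up. The sketch below illustrates only that file-moving logic in Python 3. It does not call pyreplib: `classify` is a hypothetical caller-supplied function standing in for the real parser, and match-up names such as "PvT" are assumed for the example.

```python
import shutil
from pathlib import Path


def sort_replay(replay_path, root, classify):
    """Move one replay into root/<match-up>/ or, if unreadable, root/trash/.

    classify is a hypothetical callable that returns a match-up name
    (for example "PvT") or None when the replay is corrupted.
    """
    replay_path = Path(replay_path)
    try:
        match_up = classify(replay_path)
    except Exception:
        match_up = None  # treat a parser crash like a corrupted file
    destination = Path(root) / (match_up or "trash")
    destination.mkdir(parents=True, exist_ok=True)
    target = destination / replay_path.name
    shutil.move(str(replay_path), str(target))
    return target
```

Keeping the parser behind a callable means the moving logic can be tried out without pyreplib installed.
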
Some replays may be corrupted and should be trashed, and others may have been
filed under the wrong match-up folder. This step verifies and sorts them; it
requires the pyreplib Python library:

    cd match_ups && python verify_and_sort.py PATH_TO_REPLAY_FOLDER

Corrupted replays end up in PATH_TO_REPLAY_FOLDER/trash, and valid replays
are moved to their correct match-up folder.

Special case of ICCUP
=====================
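Waiting after a failed request, which is how the download script copes with ICCUP as this section describes, can be illustrated with a small retry helper. This is a Python 3 sketch and not the code in iccup/crawl.py: the function name, the 60-second wait, and the limit of five attempts are all assumptions, since the README does not state the actual values.

```python
import time


def fetch_with_wait(fetch, url, wait_seconds=60.0, max_attempts=5, sleep=time.sleep):
    """Call fetch(url); after each failure, wait before trying again.

    fetch is any callable that returns the downloaded data or raises
    OSError (urllib.error.URLError is a subclass of OSError).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError as error:
            last_error = error
            if attempt < max_attempts:
                sleep(wait_seconds)  # back off so the server is not hammered
    raise last_error
```

With the standard library, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`. Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.
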
The ICCUP server has an anti-leech policy, so the download script waits for
some time after each failure. By default, only the gosus' replays of ICCUP are
downloaded; to use the users' replays, switch to the commented-out line in
iccup/crawl.py.

Archive
=======
When archiving, remember to exclude the clutter your OS may add (such as
.DS_Store files):

    tar czf archive-name.tar.gz --exclude '*.DS_Store' replays/
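
The same exclusion as in the tar command above can be expressed with Python's standard `tarfile` module, which may be convenient when archiving is part of a script. This is a minimal sketch; the function names are made up for the example, and the filter mirrors the `*.DS_Store` pattern by checking how each member name ends.

```python
import tarfile
from pathlib import Path


def skip_os_clutter(member):
    """tarfile filter: drop entries ending in .DS_Store, keep the rest."""
    if member.name.endswith(".DS_Store"):
        return None  # returning None excludes the member from the archive
    return member


def make_archive(source_dir, archive_path):
    """Write a gzip-compressed tar of source_dir without .DS_Store files."""
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=Path(source_dir).name, filter=skip_os_clutter)
```

For instance, `make_archive("replays/", "archive-name.tar.gz")` produces an archive whose top-level folder is `replays`.
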