https://github.com/bdunnette/derby-name-scraper
- Host: GitHub
- URL: https://github.com/bdunnette/derby-name-scraper
- Owner: bdunnette
- License: mit
- Created: 2022-07-07T16:50:49.000Z
- Default Branch: main
- Last Pushed: 2024-06-06T23:53:44.000Z
- Last Synced: 2024-06-07T00:17:42.548Z
- Language: Jupyter Notebook
- Size: 20.1 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# derby-name-scraper
[Open in Colab](https://colab.research.google.com/github/bdunnette/derby-name-scraper/blob/master/derby_name_scraper.ipynb)
Scrapes [derby names](https://en.wikipedia.org/wiki/Roller_derby#Derby_names) from publicly accessible lists.
## Description
Download roller derby names from public sources and save them for further analysis.
## Installation
### Prerequisites
- Python - available from [python.org](https://www.python.org/downloads/) or [Windows Store](https://apps.microsoft.com/detail/9nrwmjp3717k)
- [pipenv](https://pipenv.pypa.io/en/latest/installation.html)
### Clone the repository
```powershell
git clone https://github.com/bdunnette/derby-name-scraper.git
cd derby-name-scraper
```
### Install dependencies
```powershell
pipenv install
```
## Usage
Download the [WFTDA](https://wftda.com/), [RDR](https://rollerderbyroster.com/), and [DRC](http://www.derbyrollcall.com/) rosters and save them to the data directory:
```powershell
pipenv run python -m luigi --module name_scraper ScrapeWFTDA --local-scheduler
pipenv run python -m luigi --module name_scraper ScrapeRDR --local-scheduler
pipenv run python -m luigi --module name_scraper ScrapeDRC --local-scheduler
```
Combine the rosters into a single file:
```powershell
pipenv run python -m luigi --module name_scraper CombineNames --local-scheduler
```
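As a rough illustration of what combining rosters can involve, here is a minimal sketch using only the Python standard library. It is not the code behind `CombineNames`: the function `combine_rosters`, the `name`/`number` column names, and the sample rows are all hypothetical.

```python
import csv
import io


def combine_rosters(sources):
    """Merge several CSV rosters into one list of rows.

    `sources` maps a hypothetical source label (e.g. "wftda") to CSV text
    with assumed `name` and `number` columns.
    """
    combined = []
    for source_name, csv_text in sources.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            combined.append(
                {
                    "name": (row.get("name") or "").strip(),
                    "number": (row.get("number") or "").strip(),
                    "source": source_name,  # remember where each row came from
                }
            )
    return combined


# Made-up sample data, for illustration only
sources = {
    "wftda": "name,number\nAnna Mosity,42\n",
    "rdr": "name,number\nBlock Ness Monster,7\nAnna Mosity,42\n",
}
rows = combine_rosters(sources)
```

Keeping a `source` field on every row makes it possible to deduplicate later without losing track of which roster a name appeared in.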
Generate separate lists of unique names and unique numbers:
```powershell
pipenv run python -m luigi --module name_scraper NameList --local-scheduler
pipenv run python -m luigi --module name_scraper NumberList --local-scheduler
```
To generate an ASCII-only list of names:
```powershell
pipenv run python -m luigi --module name_scraper NameList --ascii-only --local-scheduler
```
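The idea of a deduplicated, ASCII-only name list can be sketched as follows. This is an assumption about the behavior, not the project's implementation: the README does not say whether `--ascii-only` drops or transliterates non-ASCII names, and this sketch simply drops them. The function `unique_names` is hypothetical.

```python
def unique_names(names, ascii_only=False):
    """Return a sorted list of distinct, non-empty names.

    With ascii_only=True, names containing any non-ASCII character are
    dropped (an assumption; the real task may handle them differently).
    """
    seen = set()
    for name in names:
        name = name.strip()
        if not name:
            continue
        if ascii_only and not name.isascii():
            continue
        seen.add(name)
    return sorted(seen)


# Made-up sample names, for illustration only
sample = ["Zoë Trope", "Anna Mosity", " Anna Mosity ", ""]
all_names = unique_names(sample)
ascii_names = unique_names(sample, ascii_only=True)
```

`str.isascii()` requires Python 3.7 or later.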
To generate a tab-separated list of names with numbers:
```powershell
pipenv run python -m luigi --module name_scraper NameNumberList --local-scheduler
```
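For reference, a tab-separated name/number list can be produced with the standard `csv` module. The sketch below is illustrative only: `names_with_numbers_tsv`, the column order (name first, then number), and the sample pairs are assumptions, not taken from `NameNumberList`.

```python
import csv
import io


def names_with_numbers_tsv(pairs):
    """Render distinct (name, number) pairs as tab-separated text, sorted by name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for name, number in sorted(set(pairs)):
        writer.writerow([name, number])
    return buffer.getvalue()


# Made-up sample pairs, for illustration only
tsv_text = names_with_numbers_tsv(
    [("Block Ness Monster", "7"), ("Anna Mosity", "42"), ("Block Ness Monster", "7")]
)
```

Numbers are kept as strings here so that values such as `007` or `3X` would survive unchanged.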