https://github.com/arv-anshul/hockey
Scrape hockey data using scrapy with pydantic validation. Data available on Kaggle too.
hockey hockey-data pydantic scrapy web-scraping
- Host: GitHub
- URL: https://github.com/arv-anshul/hockey
- Owner: arv-anshul
- License: mit
- Created: 2025-01-22T22:20:51.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-02-01T06:08:00.000Z (5 months ago)
- Last Synced: 2025-02-01T07:19:25.689Z (5 months ago)
- Topics: hockey, hockey-data, pydantic, scrapy, web-scraping
- Language: Python
- Homepage: https://kaggle.com/datasets/arvanshul/hockey-india-league-2025
- Size: 39.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Scrape Hockey Data
This project scrapes data related to **Hockey** from [altiusrt.com](https://altiusrt.com). I mainly focused on
[hockeyindia.altiusrt.com](https://hockeyindia.altiusrt.com) because I am interested in the **Hockey India League** (HIL).

For now, the scraper is able to scrape the following data:
1. **Competitions:** Details about **previous, upcoming and in-progress** competitions. _A competition is like a
tournament (e.g. **Hockey India League**)._
2. **Competition Teams:** Details about teams that participated in the competition.
3. **Competition Matches:** Details about the specified competition's matches.
4. **Competition Players:** Details about players who will be playing in the competition.
5. **Competition Matches** (detailed): Full match details such as umpires, goal scorers, quarter-wise
data and more.

You can use [`altiusrt/main.py`](src/altiusrt/main.py) to scrape the data related to a specific competition (e.g.
**HIL**) and export it to the `json` and `jsonl` (aka `jsonlines`) data formats.

```bash
uv run python -m src.altiusrt.main 180
```

> In the above command, **`180`** is the `competition_id` for the **Hockey India League** competition/tournament.
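
To illustrate the `jsonl` format, here is a minimal sketch that writes and reads a JSON Lines file using only the standard library. The file name `matches.jsonl`, the helper functions, and the field names (`match_id`, `home`, `away`) are assumptions made up for this example; they are not the scraper's actual output schema, so adjust them to the files you get.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line (the JSON Lines layout)."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Parse every non-empty line of a JSON Lines file into a dict."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records: these field names are illustrative only.
sample = [
    {"match_id": 1, "home": "Team A", "away": "Team B"},
    {"match_id": 2, "home": "Team C", "away": "Team D"},
]

# Round-trip the sample through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "matches.jsonl"
    write_jsonl(path, sample)
    loaded = read_jsonl(path)

print(len(loaded), loaded[0]["home"])
```

Because each line is an independent JSON document, a `jsonl` file can be processed line by line without loading the whole file, which is convenient for larger exports.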
## Dataset on Kaggle
I have scraped data related to **HIL 2025** and uploaded it
[to Kaggle](https://www.kaggle.com/datasets/arvanshul/hockey-india-league-2025). You can use it to create an awesome
dashboard.

## Acknowledgment
- Took help from
[@Martijn-van-Kekem-Development/hockey-match-calendar](https://github.com/Martijn-van-Kekem-Development/hockey-match-calendar)
repo for scraping details such as CSS selectors, URL formation and more.