https://github.com/cu-sanjay/cricket-score-scraper
A Python-based web scraper that extracts live and past match data from the Women's Premier League (WPL) website using Selenium, BeautifulSoup, and GitHub Actions.
beautifulsoup cricket-score github-actions selenium sports-data web-scraping
- Host: GitHub
- URL: https://github.com/cu-sanjay/cricket-score-scraper
- Owner: cu-sanjay
- License: mit
- Created: 2025-02-21T11:17:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-08T08:19:28.000Z (over 1 year ago)
- Last Synced: 2025-03-08T08:23:48.224Z (over 1 year ago)
- Topics: beautifulsoup, cricket-score, github-actions, selenium, sports-data, web-scraping
- Language: Python
- Homepage: https://www.wplt20.com/
- Size: 232 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# **Live Cricket Score Scraper**
A **Python-based web scraper** that fetches **live and past match data** from the **[Women's Premier League (WPL)](https://www.wplt20.com/) website** using **Selenium, BeautifulSoup, and GitHub Actions**. The script runs automatically every **hour** and updates a JSON file with match details.
> [!IMPORTANT]
> This project is for **educational purposes only**. It does not store, redistribute, or claim ownership over any third-party data. Users are responsible for complying with website terms of service.
## **Features**
1. **Automated Web Scraping** – Uses **Selenium + BeautifulSoup** to extract match details dynamically.
2. **GitHub Actions Integration** – Runs automatically on schedule without manual execution.
3. **Web Development Ready** – Data is stored in **wpl_data.json**, which can be used in web applications.
4. **Bypass Restrictions** – Implements **headless browsing, user-agent rotation**, and **dynamic content handling**.
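
The scraping features above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `USER_AGENTS` pool, the helper names, and the CSS selectors (`.match-card`, `.team-name`, `.match-status`) are all assumptions, and the real selectors have to be read off the WPL pages themselves.

```python
import random

# Hypothetical pool of user-agent strings; any realistic browser strings would do.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def chrome_arguments(user_agent):
    """Return Chrome flags for a headless session with a custom user agent."""
    return [
        "--headless=new",
        "--no-sandbox",             # commonly needed on CI runners
        "--disable-dev-shm-usage",  # avoids small /dev/shm limits in containers
        f"--user-agent={user_agent}",
    ]


def fetch_rendered_html(url, wait_seconds=10):
    """Load `url` in headless Chrome and return the HTML after scripts have run."""
    # Imported here so the helpers above stay usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    for flag in chrome_arguments(random.choice(USER_AGENTS)):
        options.add_argument(flag)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for the page body; a real scraper would wait for a match element.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_matches(html):
    """Turn rendered HTML into a list of match dicts (selectors are hypothetical)."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for card in soup.select(".match-card"):
        teams = [t.get_text(strip=True) for t in card.select(".team-name")]
        status = card.select_one(".match-status")
        matches.append(
            {"teams": teams, "status": status.get_text(strip=True) if status else None}
        )
    return matches
```

Under these assumptions, `parse_matches(fetch_rendered_html("https://www.wplt20.com/"))` would return a list of dictionaries ready to be written to **wpl_data.json**.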
## **Usage**
### **Run Locally**
1. **Clone the repository**
```sh
git clone https://github.com/cu-sanjay/cricket-score-scraper
cd cricket-score-scraper
```
2. **Install dependencies**
```sh
pip install -r requirements.txt
```
3. **Run the script**
```sh
python test.py
```
### **Automated Execution via GitHub Actions**
- The script is scheduled to run every **hour** using **GitHub Actions**.
- It fetches live match data and commits changes automatically.
- No manual intervention is needed once set up.
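
For a scheduled job that commits changes, it helps to rewrite the data file only when the scraped data actually differs, so that unchanged runs produce no commit. The sketch below shows one way to do that. The function name `update_data_file` and the record fields are hypothetical, and the real layout of **wpl_data.json** may differ.

```python
import json
from pathlib import Path


def update_data_file(matches, path="wpl_data.json"):
    """Write `matches` to `path` only if it differs from what is already stored.

    Returns True when the file was written, False when nothing changed.
    """
    target = Path(path)
    if target.exists():
        try:
            current = json.loads(target.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            current = None  # treat a corrupt file as needing a rewrite
        if current == matches:
            return False
    target.write_text(
        json.dumps(matches, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return True
```

A workflow step could then commit only when the function returns `True`, or simply rely on `git` reporting no changes when the file is untouched.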
## **Enhancements & Workarounds**
1. **Handling Strict Websites**
   - Rotate **user-agents** to prevent detection.
   - Use **headless browsing** for minimal footprint.
   - Simulate **human interactions** (scrolling, waiting, retries).
   - Extract data from **network requests** instead of the rendered page.
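
The "waiting, retries" idea can be illustrated with a small retry helper that uses exponential backoff and random jitter. This is a generic sketch rather than the project's own code: `retry_with_backoff` and its parameters are hypothetical names, and `action` stands for any callable, such as a page fetch.

```python
import random
import time


def retry_with_backoff(action, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` tries have been made.

    Waits base_delay * 2**n seconds, plus up to 1 s of random jitter,
    between consecutive tries. The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

The jitter keeps repeated requests from arriving at perfectly regular intervals, and the `sleep` parameter exists only so the delay can be replaced in tests.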
> [!TIP]
> **Web Development Integration**
> - Serve **wpl_data.json** via **Flask/Django API**.
> - Fetch and display match data in **React/Next.js frontend**.
> - Automate updates via **Telegram/Reddit/Discord bot**.
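
As a rough illustration of the Flask option in the tip above, the sketch below exposes the JSON file through a single endpoint. The route `/api/matches` and the function names are assumptions made for this example, and Flask must be installed separately.

```python
import json
from pathlib import Path


def load_matches(path="wpl_data.json"):
    """Return the scraped matches, or an empty list if the file does not exist yet."""
    target = Path(path)
    if not target.exists():
        return []
    return json.loads(target.read_text(encoding="utf-8"))


def create_app(path="wpl_data.json"):
    """Build a small Flask app that serves the match data as JSON."""
    # Imported here so load_matches() works without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.get("/api/matches")  # hypothetical route name
    def matches():
        # Re-read on every request so hourly updates appear without a restart.
        return jsonify(load_matches(path))

    return app
```

Calling `create_app().run()` starts Flask's development server, after which a React or Next.js frontend could fetch `/api/matches` and render the result.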