https://github.com/cu-sanjay/cricket-score-scraper

A Python-based web scraper that extracts live and past match data from the Women's Premier League (WPL) website using Selenium, BeautifulSoup, and GitHub Actions.
beautifulsoup cricket-score github-actions selenium sports-data web-scraping

Host: GitHub
URL: https://github.com/cu-sanjay/cricket-score-scraper
Owner: cu-sanjay
License: mit
Created: 2025-02-21T11:17:15.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-08T08:19:28.000Z (over 1 year ago)
Last Synced: 2025-03-08T08:23:48.224Z (over 1 year ago)
Topics: beautifulsoup, cricket-score, github-actions, selenium, sports-data, web-scraping
Language: Python
Homepage: https://www.wplt20.com/
Size: 232 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# **Live Cricket Score Scraper**

A **Python-based web scraper** that fetches **live and past match data** from the **[Women's Premier League (WPL)](https://www.wplt20.com/) website** using **Selenium and BeautifulSoup**, scheduled with **GitHub Actions**. The script runs automatically every **hour** and updates a JSON file (**wpl_data.json**) with match details.
> [!IMPORTANT]
> This project is for **educational purposes only**. It does not store, redistribute, or claim ownership over any third-party data. Users are responsible for complying with website terms of service.

## **Features**

1. **Automated Web Scraping** – Uses **Selenium + BeautifulSoup** to extract match details dynamically.
2. **GitHub Actions Integration** – Runs automatically on schedule without manual execution.
3. **Web Development Ready** – Data is stored in **wpl_data.json**, which can be used in web applications.
4. **Bypass Restrictions** – Implements **headless browsing, user-agent rotation**, and **dynamic content handling**.

## **Usage**

### **Run Locally**

1. **Clone the repository**
```sh
git clone https://github.com/cu-sanjay/cricket-score-scraper
cd cricket-score-scraper
```

2. **Install dependencies**
```sh
pip install -r requirements.txt
```

3. **Run the script**
```sh
python test.py
```

### **Automated Execution via GitHub Actions**

- The script is scheduled to run **every hour** using **GitHub Actions**.
- It fetches live match data and commits changes automatically.
- No manual intervention is needed once set up.
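
An hourly job that always rewrites its output would create a commit even when nothing changed. The workflow file is not shown here, so the following is only a sketch of one way to guard against that, assuming the scraper produces a list of dictionaries. The helper name `write_if_changed` and the record fields in the usage note are hypothetical.

```python
import json
from pathlib import Path


def write_if_changed(path: Path, matches: list[dict]) -> bool:
    """Write `matches` to `path` as JSON; return True only if the file changed."""
    # Serialize deterministically so identical data always yields identical text.
    new_text = json.dumps(matches, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False  # nothing new, so the workflow has nothing to commit
    path.write_text(new_text, encoding="utf-8")
    return True
```

For example, `write_if_changed(Path("wpl_data.json"), [{"teams": "A vs B", "status": "live"}])` returns `True` on the first call and `False` when repeated with the same data, which a workflow step could use to decide whether to commit.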

## **Enhancements & Workarounds**

**Handling Strict Websites**

- Rotate **user-agents** to prevent detection.
- Use **headless browsing** for minimal footprint.
- Simulate **human interactions** (scrolling, waiting, retries).
- Extract data from **network requests** instead of the rendered page.
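
The rotation and retry ideas above can be sketched without tying them to a particular browser driver. This is a minimal illustration, not the project's actual code: the user-agent pool, `pick_user_agent`, and `fetch_with_retries` are hypothetical names, and `fetch` stands for whatever function loads the page for a given user-agent string.

```python
import random
import time
from typing import Callable, Sequence

# Hypothetical pool; in practice use strings matching real, current browsers.
USER_AGENTS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:123.0) Gecko/20100101 Firefox/123.0",
)


def pick_user_agent(pool: Sequence[str] = USER_AGENTS) -> str:
    """Return a randomly chosen user-agent string from the pool."""
    return random.choice(pool)


def fetch_with_retries(
    fetch: Callable[[str], str],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(user_agent) up to `retries` times with exponential backoff."""
    last_error = None
    for attempt in range(retries):
        try:
            # A fresh user-agent is picked for every attempt.
            return fetch(pick_user_agent())
        except Exception as exc:
            last_error = exc
            if attempt < retries - 1:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {retries} attempts failed") from last_error
```

With Selenium, `fetch` would typically build Chrome options with `add_argument("--headless=new")` and `add_argument(f"--user-agent={ua}")`, load the page, and return `driver.page_source`. Passing `sleep` as a parameter keeps the backoff testable without real delays.
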
> [!TIP]
> **Web Development Integration**
> - Serve **wpl_data.json** via **Flask/Django API**.
> - Fetch and display match data in **React/Next.js frontend**.
> - Automate updates via **Telegram/Reddit/Discord bot**.
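
As a rough illustration of the first point, the sketch below serves the JSON file over HTTP using only the standard library's WSGI support, swapped in here for Flask or Django so that it runs without extra dependencies. The `/matches` route, the `app` and `serve` names, and the port are assumptions, not part of the project.

```python
import json
from pathlib import Path
from wsgiref.simple_server import make_server

DATA_FILE = Path("wpl_data.json")  # file written by the scraper
JSON_HEADER = ("Content-Type", "application/json")


def app(environ, start_response):
    """Minimal WSGI app: /matches returns the scraped JSON file unchanged."""
    if environ.get("PATH_INFO") != "/matches":
        start_response("404 Not Found", [JSON_HEADER])
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    try:
        body = DATA_FILE.read_bytes()
    except FileNotFoundError:
        # The scraper has not produced any data yet.
        start_response("503 Service Unavailable", [JSON_HEADER])
        return [json.dumps({"error": "no data yet"}).encode("utf-8")]
    start_response("200 OK", [JSON_HEADER, ("Content-Length", str(len(body)))])
    return [body]


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run a development server until interrupted."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/matches` from a frontend would return the current file contents. Because `app` is a plain WSGI callable, the same logic ports directly to a Flask or Django view.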
