https://github.com/alfonsokan/pba_analytics
Streamlit app that provides advanced basketball statistics for the Philippine Basketball Association (PBA) 2023-2024 Season (Philippine Cup)
https://github.com/alfonsokan/pba_analytics
pandas streamlit webscraping
Last synced: about 2 months ago
JSON representation
Streamlit app that provides advanced basketball statistics for the Philippine Basketball Association (PBA) 2023-2024 Season (Philippine Cup)
- Host: GitHub
- URL: https://github.com/alfonsokan/pba_analytics
- Owner: alfonsokan
- Created: 2024-09-04T14:00:02.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-05T07:10:04.000Z (almost 2 years ago)
- Last Synced: 2025-07-19T22:39:47.470Z (11 months ago)
- Topics: pandas, streamlit, webscraping
- Language: Jupyter Notebook
- Homepage: https://pba-analytics.streamlit.app/
- Size: 104 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# PBA Analytics App
This Streamlit app provides comprehensive analytics for the Philippine Basketball Association (PBA), covering a wide range of statistics. The data was meticulously web-scraped from basketball.realgm.com and processed using the Pandas library to enable dynamic filtering and analysis.
https://github.com/user-attachments/assets/d5600d03-e0ed-417c-b814-67a3a7bbebae
## Installation
- Use the package manager pip to install the necessary libraries to run the application.
- NOTE: If there are issues, make sure to add the line `pywin32==306` to the `requirements.txt` file.
```bash
pip install -r requirements.txt
```
- Ensure the app is located in your working directory, then start it by running the following command:
```python
streamlit run Home.py
```
## Features Summary
- 📄 **Team Overview:** Discover the top players on each team who led in various statistical categories.
- 🔍 **Find-a-Player:** Search for players based on specific statistical criteria, such as those averaging between 10-15 PPG and 5-10 RPG.
- ⚔️ **Head-to-Head Comparison:** Compare the stats of two players side by side with ease.
- 📊 **Stat Scatter:** Visualize two stats on a scatter plot for deeper insights. For example, identify the most efficient PBA scorers by plotting Points Per Game (PPG) against Field Goal Percentage (FG%).
- 📈 **League Leaders:** Explore which players led the league in key statistical categories.
## Future Developments
- 📅 **Include Stats from Other Seasons**: Expand the app to include statistics from previous PBA seasons, allowing users to explore and compare player performance over time.
- 🛠️ **Customizable Visualizations**: Allow users to create and save custom visualizations, making it easier to track specific stats or trends over the course of a season.
- 📊 **Team Analytics**: Expand the analysis to team-level statistics, offering insights into team performance, strengths, and weaknesses throughout the season.
## Methodology
- 🌐 **Web Scraping**:
- Libraries used: `pandas`,`bs4`,`requests`
- Data was extracted from *basketball.realgm.com* using Python's web scraping libraries. This allowed us to gather detailed statistics for PBA players participating in the 2023-2024 Philippine Cup.
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
def table_scraper(url, class_name):
page = requests.get(url=url).text
soup = BeautifulSoup(page, 'html.parser')
header_cols = soup.find(class_=class_name).find('thead').find_all('th')
header_cols_cleaned = [header_col.text.strip() for header_col in header_cols]
row_data = soup.find(class_=class_name).find('tbody').find_all('tr')
df = pd.DataFrame(columns=header_cols_cleaned)
for index, row in enumerate(row_data):
content = [data.text for data in row.find_all('td')]
length = len(df)
df.loc[length] = content
for row in df.columns:
try:
# Attempt to convert the column to float type
df[row] = df[row].astype('float')
print(f"Successfully converted {row} to float.")
except ValueError:
# Skip conversion if an error occurs
print(f"Could not convert {row} to float, skipping.")
pass
return df
```
The above Python code snippet, sourced from `notebooks/Basketball_realgm_PBA_webscraper.ipynb` within this repository, extracts a DataFrame from basketball.realgm.com. It fetches the page content, parses it to locate table headers and rows, and attempts to convert the data into a float format where possible.
- ⚙️ **Filtering**:
- Libraries used: `pandas`, `streamlit`
- The pandas library was utilized to load the web-scraped dataframes and apply filters based on user input. The filtering interface was constructed using Streamlit widgets, with session state logic connecting widget selections to pandas operations
- This code snippet from the Stat Scatter page illustrates how to implement the page logic using Streamlit's `session state` and `pandas`
```python
import streamlit as st
import pandas as pd
import altair as alt
df = pd.read_csv(r'datasets\compiled_stats.csv')
col1, col2 = st.columns(2)
with col1:
st.selectbox('Y-axis',
options=relevant_filters,
on_change=store_value,
key='_y_axis')
with col2:
st.selectbox('X-axis',
options=relevant_filters,
on_change=store_value,
key='_x_axis')
filtered_df = df[ (df['GP'] >= st.session_state._games_played)
]
chart = alt.Chart(filtered_df).mark_circle().encode(
alt.X(st.session_state._x_axis).scale(zero=False),
alt.Y(st.session_state._y_axis).scale(zero=False, padding=1),
color='Team',
size=st.session_state._y_axis,
description='Player'
)
st.altair_chart(chart)
```