https://github.com/syedabareehaali/github-repo-metadata-analytics
Jupyter Notebook analyzing GitHub repository metadata using Python, Parquet, Pandas, and DuckDB
analytics github hacktoberfest hacktoberfest-accepted hacktoberfest2025 pandas-python parquet-generator python
- Host: GitHub
- URL: https://github.com/syedabareehaali/github-repo-metadata-analytics
- Owner: syedabareehaali
- Created: 2025-10-30T18:55:43.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-10-30T19:22:31.000Z (5 months ago)
- Last Synced: 2025-10-30T21:08:23.649Z (5 months ago)
- Topics: analytics, github, hacktoberfest, hacktoberfest-accepted, hacktoberfest2025, pandas-python, parquet-generator, python
- Language: Jupyter Notebook
- Homepage:
- Size: 353 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# GitHub Repo Metadata Analytics
This project performs data analysis on GitHub repository metadata using Python and libraries such as **pandas**, **matplotlib**, and **DuckDB**.
## Overview
The goal of this analysis is to understand trends and patterns across open-source repositories, including:
- Distribution of stars, forks, and issues
- Popular programming languages
- Relationship between activity indicators (commits, pull requests) and repository popularity
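As a rough illustration of the first two points, the sketch below summarizes numeric columns and counts repositories per language with pandas. The rows are toy values, and the column names (`language`, `stars`, `forks`, `open_issues`) are assumptions for illustration, not the notebook's actual schema.

```python
import pandas as pd

# Toy rows standing in for real repository metadata.
# Column names are assumed for illustration only.
repos = pd.DataFrame(
    {
        "language": ["Python", "Python", "JavaScript", "Go", "Python"],
        "stars": [120, 15, 300, 42, 7],
        "forks": [30, 2, 80, 5, 1],
        "open_issues": [4, 0, 12, 3, 1],
    }
)

# Summary statistics (count, mean, quartiles, max) for each numeric column.
summary = repos[["stars", "forks", "open_issues"]].describe()

# Number of repositories per language, most common first.
language_counts = repos["language"].value_counts()

print(summary)
print(language_counts)
```

Star and fork counts tend to be heavily skewed on real data, so a histogram on a log scale may be more readable than the raw summary.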
## Hypothesis Testing
- **H₀ (Null Hypothesis):** Repository popularity (stars) is independent of activity indicators (commits, forks, pull requests).
- **H₁ (Alternative Hypothesis):** Repository popularity is influenced by these activity indicators.
Correlation tests were used to assess this relationship, with scatter plots to visualize it.
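One way such a test could look is sketched below, using Spearman's rank correlation from SciPy. The choice of Spearman, the 0.05 significance level, the column names, and the toy values are all assumptions made for illustration; the notebook may use a different test or threshold.

```python
import pandas as pd
from scipy import stats

# Toy values for illustration only; not taken from the project's dataset.
repos = pd.DataFrame(
    {
        "stars": [5, 12, 40, 90, 150, 300, 800, 1200],
        "commits": [10, 25, 20, 80, 60, 350, 500, 400],
    }
)

# Spearman's rank correlation measures monotonic association and is
# less sensitive to the heavy skew typical of star counts than Pearson's r.
rho, p_value = stats.spearmanr(repos["stars"], repos["commits"])

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(f"rho={rho:.3f}, p={p_value:.4f}, reject H0: {reject_null}")
```

Rejecting the null hypothesis here indicates an association between stars and commits; it does not by itself show that activity causes popularity.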
## Tools and Libraries
- **Python 3.11**
- **pandas**, **matplotlib**, **seaborn**, **scipy**
- **DuckDB** for querying Parquet data
- **Jupyter Notebook** for analysis and visualization
## How to Run
1. Clone this repository:
```bash
git clone https://github.com/syedabareehaali/GitHub-Repo-Metadata-Analytics.git
```
2. Start Jupyter with `jupyter notebook` and open the notebook.
3. Run all cells in order.
## Hacktoberfest 2025
This repository is part of Hacktoberfest 2025.
Contributions are welcome: you can improve the documentation, enhance the analysis, or suggest visualizations!
## Author
Syeda Bareeha Ali
MERN Stack Developer & Data Enthusiast
Email: sbareeha19@mail.com
GitHub: https://github.com/syedabareehaali