https://github.com/syedabareehaali/github-repo-metadata-analytics

Jupyter Notebook analyzing GitHub repository metadata using Python, Parquet, Pandas, and DuckDB

analytics github hacktoberfest hacktoberfest-accepted hacktoberfest2025 pandas-python parquet-generator python

# GitHub Repo Metadata Analytics

This project performs data analysis on GitHub repository metadata using Python and libraries such as **pandas**, **matplotlib**, and **DuckDB**.

## 📊 Overview

The goal of this analysis is to understand trends and patterns across open-source repositories, including:
- Distribution of stars, forks, and issues
- Popular programming languages
- Relationship between activity indicators (commits, pull requests) and repository popularity
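
A minimal pandas sketch of the first two points, assuming the metadata has already been loaded into a DataFrame with hypothetical columns `stars`, `forks`, `open_issues`, and `language` (the notebook's actual schema may differ, and the sample data is made up):

```python
import pandas as pd


def summarize_repos(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Distribution statistics for count columns plus repository counts per language.

    The column names are hypothetical placeholders for the real metadata schema.
    """
    # describe() reports count, mean, std, min, quartiles, and max for each column
    distribution = df[["stars", "forks", "open_issues"]].describe()
    # value_counts() sorts languages from most to least common (NaN is dropped)
    languages = df["language"].value_counts()
    return distribution, languages


# Tiny made-up sample, only to show the calls in action
sample = pd.DataFrame(
    {
        "stars": [120, 45, 3000, 8, 560],
        "forks": [30, 5, 700, 1, 90],
        "open_issues": [4, 0, 120, 1, 12],
        "language": ["Python", "JavaScript", "Python", "Go", "Python"],
    }
)

distribution, languages = summarize_repos(sample)
print(distribution)
print(languages.head(10))
```

Star and fork counts tend to be heavily right-skewed, so the median (`50%`) row is usually a better summary than the mean, and log-scaled axes help when plotting them with matplotlib.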

## 🧪 Hypothesis Testing

- **H₀ (Null Hypothesis):** Repository popularity (stars) is independent of activity indicators (commits, forks, pull requests).
- **H₁ (Alternative Hypothesis):** Repository popularity is associated with these activity indicators.

Correlation tests, supported by scatter plots for visual inspection, were used to evaluate this relationship.
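
One way to run such a test is Spearman's rank correlation via `scipy.stats.spearmanr`, whose null hypothesis is "no monotonic association". This is a sketch, not necessarily the test used in the notebook; the column names (`commits`, `pull_requests`, and so on), the 0.05 significance level, and the sample data are all assumptions:

```python
import pandas as pd
from scipy.stats import spearmanr


def correlate_with_popularity(
    df: pd.DataFrame,
    target: str = "stars",
    indicators: tuple[str, ...] = ("forks", "commits", "pull_requests"),
    alpha: float = 0.05,
) -> pd.DataFrame:
    """Spearman correlation between `target` and each activity indicator.

    Column names are hypothetical; adapt them to the real metadata schema.
    """
    rows = []
    for col in indicators:
        # Spearman works on ranks, so a few very popular repos do not dominate
        rho, p_value = spearmanr(df[target], df[col])
        rows.append(
            {
                "indicator": col,
                "rho": rho,
                "p_value": p_value,
                # Reject H0 (no monotonic association) when p < alpha
                "reject_h0": p_value < alpha,
            }
        )
    return pd.DataFrame(rows)


# Made-up sample for illustration only
sample = pd.DataFrame(
    {
        "stars": [5, 12, 40, 90, 150, 400, 800, 2000],
        "forks": [2, 1, 6, 15, 20, 70, 150, 500],
        "commits": [10, 30, 25, 80, 60, 200, 180, 900],
        "pull_requests": [0, 1, 3, 2, 8, 10, 25, 60],
    }
)
print(correlate_with_popularity(sample))
```

Spearman is chosen here over Pearson because it only assumes a monotonic relationship; `DataFrame.corr(method="spearman")` returns the same coefficients without p-values. A significant correlation shows association, not that activity causes popularity.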

## 🛠️ Tools and Libraries

- **Python 3.11**
- **pandas**, **matplotlib**, **seaborn**, **scipy**
- **DuckDB** for querying Parquet data
- **Jupyter Notebook** for analysis and visualization

## 🚀 How to Run

1. Clone this repository and enter its directory:
   ```bash
   git clone https://github.com/syedabareehaali/GitHub-Repo-Metadata-Analytics.git
   cd GitHub-Repo-Metadata-Analytics
   ```
2. Open the notebook in Jupyter: `jupyter notebook`
3. Run all cells in order.

## 🏷️ Hacktoberfest 2025

This repository is part of Hacktoberfest 2025.
Contributions are welcome: you can improve the documentation, enhance the analysis, or suggest new visualizations!

## 👩‍💻 Author

Syeda Bareeha Ali
MERN Stack Developer & Data Enthusiast
📧 sbareeha19@mail.com
🌐 https://github.com/syedabareehaali