https://github.com/netcodez/scala-github-history-analysis


# GitHub History Analysis of the Scala Language
This data analysis project explores the real-world project repository data of the Scala programming language, providing insights into its development, contributors, and trends.

## Overview
This repository hosts a data analysis project that examines the GitHub history of the Scala programming language. Scala is a mature, general-purpose language that has gained significant traction, particularly within the data science community.

## Project Objectives
The primary objective of this project is to conduct a thorough examination of Scala's development history by leveraging the project's repository data obtained from GitHub. The analysis revolves around a dataset comprising three key files:

- pulls_2011-2013.csv: Encompasses fundamental information about pull requests submitted from the end of 2011 through the end of 2013.
- pulls_2014-2018.csv: Encompasses fundamental information about pull requests submitted from 2014 to 2018.
- pull_files.csv: Contains details of the files modified by each pull request.
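
As a minimal sketch of how the three files might be combined with pandas: the column names `pid`, `user`, `date`, and `file` are assumptions made for illustration (the README does not list the columns), and the inline rows are made-up stand-ins for the real CSV files, not actual Scala data.

```python
import io

import pandas as pd


def load_pull_data(pulls_early, pulls_late, pull_files):
    """Combine both pull-request files and join them with the per-file records.

    Each argument may be a file path or a file-like object.
    Assumed columns: pulls -> pid, user, date; pull_files -> pid, file.
    """
    pulls = pd.concat(
        [pd.read_csv(pulls_early), pd.read_csv(pulls_late)],
        ignore_index=True,
    )
    # Parse timestamps as timezone-aware UTC so both files compare consistently.
    pulls["date"] = pd.to_datetime(pulls["date"], utc=True)
    files = pd.read_csv(pull_files)
    # Inner join: one row per (pull request, modified file) pair.
    return pulls, pulls.merge(files, on="pid")


# Toy stand-ins for pulls_2011-2013.csv, pulls_2014-2018.csv and pull_files.csv.
early = io.StringIO(
    "pid,user,date\n"
    "1,alice,2012-03-01T10:00:00Z\n"
    "2,bob,2013-07-15T09:30:00Z\n"
)
late = io.StringIO("pid,user,date\n3,alice,2015-01-20T12:00:00Z\n")
files = io.StringIO(
    "pid,file\n1,src/A.scala\n2,src/A.scala\n3,src/B.scala\n3,src/A.scala\n"
)

pulls, data = load_pull_data(early, late, files)
print(data.head())
```

With the real data, the three `io.StringIO` objects would be replaced by the paths to the CSV files.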

Through meticulous analysis, we aim to address the following pivotal questions:

- Project Maintenance: Does the Scala project exhibit sustained activity and maintenance? By evaluating the monthly trend of submitted pull requests, we endeavor to ascertain the project's level of ongoing development.
- Community Dynamics: Is the work shared across a broad community or concentrated in a few contributors? By analyzing the distribution of pull requests submitted by individual users, we seek to discern the dynamics and cohesion within the community.
- Recent File Modifications: Which files were changed in the last ten pull requests? Identifying the actively modified areas enables us to gain insights into the current focus of development.
- Leading Contributors to Specific Files: Who has contributed the most pull requests to a given file? Through this analysis, we aim to identify key developers who have made significant contributions in specific areas of expertise.
- Recent Contributors to Specific Files: Who are the top contributors to a specific file based on the last ten pull requests? Identifying active contributors offers valuable insights into the individuals driving recent development in specific areas.
- Analysis of Specific Developers: We will scrutinize the annual pull request contributions of two notable developers to understand their individual contribution trends.
- Developer Contributions Visualization: We will assess the number of pull requests submitted by the aforementioned developers for a specific file, allowing us to gauge their experience and recent activity.
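
The questions above map fairly directly onto pandas group-by and filter operations. The following is a hedged sketch rather than the notebook's actual code: the function names, the column names (`pid`, `user`, `date`, `file`), and the toy rows are all assumptions introduced for illustration.

```python
import pandas as pd

# Made-up toy data standing in for the real pull-request tables.
pulls = pd.DataFrame({
    "pid": [1, 2, 3, 4],
    "user": ["alice", "bob", "alice", "carol"],
    "date": pd.to_datetime(
        ["2012-03-01", "2012-03-20", "2012-04-02", "2013-01-10"], utc=True
    ),
})
pull_files = pd.DataFrame({
    "pid": [1, 2, 3, 3, 4],
    "file": ["src/A.scala", "src/A.scala", "src/A.scala",
             "src/B.scala", "src/B.scala"],
})


def monthly_pr_counts(pulls):
    """Project maintenance: pull requests per (year, month)."""
    year = pulls["date"].dt.year.rename("year")
    month = pulls["date"].dt.month.rename("month")
    return pulls.groupby([year, month])["pid"].count()


def prs_per_user(pulls):
    """Community dynamics: number of pull requests per user."""
    return pulls["user"].value_counts()


def files_in_last_n_pulls(pulls, pull_files, n=10):
    """Recent file modifications: files touched by the n latest pull requests."""
    last = pulls.sort_values("date").tail(n)
    return set(pull_files.loc[pull_files["pid"].isin(last["pid"]), "file"])


def top_contributors_to_file(pulls, pull_files, file, n_recent=None):
    """Pull requests per user for one file, optionally only the latest n_recent."""
    touching = pull_files.loc[pull_files["file"] == file, ["pid"]].merge(
        pulls, on="pid"
    )
    if n_recent is not None:
        touching = touching.sort_values("date").tail(n_recent)
    return touching["user"].value_counts()


def yearly_prs_by_user(pulls, users):
    """Specific developers: pull requests per year, one row per user."""
    subset = pulls[pulls["user"].isin(users)]
    year = subset["date"].dt.year.rename("year")
    return subset.groupby(["user", year])["pid"].count().unstack(fill_value=0)


print(monthly_pr_counts(pulls))
print(top_contributors_to_file(pulls, pull_files, "src/A.scala"))
```

Each result is a pandas Series or DataFrame, so calling `.plot(kind="bar")` on it is one way to produce the visualizations mentioned in the list, assuming matplotlib is installed.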

## Getting Started
To reproduce the analysis, follow these steps:

- Clone this repository to your local machine.
- Ensure that all necessary dependencies (e.g., Python, pandas) are installed.
- Open the project's Jupyter Notebook file (referred to here as "GitHub History of the notebook") in a Jupyter Notebook environment.
- Execute the notebook's cells sequentially to perform the analysis and generate the corresponding results.
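
Before opening the notebook, a quick check like the one below can confirm that the dependencies are importable. This is only a sketch: the README names Python and pandas explicitly, while `matplotlib` and `notebook` in the list are assumptions about what the analysis and the Jupyter environment need.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed dependency list; only pandas is named in the README itself.
required = ["pandas", "matplotlib", "notebook"]
missing = missing_packages(required)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can then be installed with the package manager of your choice before running the notebook.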