https://github.com/ezimuel/python-graph-analysis-project
A graph analysis of collaborations in Python open source community using GitHub
https://github.com/ezimuel/python-graph-analysis-project
github graph graphql network-analysis python
Last synced: 3 months ago
JSON representation
A graph analysis of collaborations in Python open source community using GitHub
- Host: GitHub
- URL: https://github.com/ezimuel/python-graph-analysis-project
- Owner: ezimuel
- License: mit
- Created: 2023-11-03T18:06:23.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-07T22:16:28.000Z (over 2 years ago)
- Last Synced: 2025-05-27T17:48:32.527Z (about 1 year ago)
- Topics: github, graph, graphql, network-analysis, python
- Language: Jupyter Notebook
- Homepage:
- Size: 7.67 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# A graph analysis of collaborations in Python open source community using GitHub
This is the project done by [Enrico Zimuel](https://github.com/ezimuel/) for the course Graph Algorithms
of the Prof. [Stefano Guarino](https://www.iac.cnr.it/personale/stefano-guarino) in the [Master Data Analytics](https://master-data-analytics.it/) postgraduated course at the University of Roma Tre (Italy).
The goal is to study the collaborations between Python open source projects using the
data stored in github.com.
## Configure the Github API token
We used the Github REST API and GraphQL API for retrieving all the information.
That means you need to use an API token provided by Github. You can create a new access token
from [this page](https://github.com/settings/tokens).
When you have the access token, you need to store it in the `GITHUB_ACCESS_TOKEN`
env variable, as follows:
```bash
export GITHUB_TOKEN="insert here the access token"
```
## Virtual environment and dependencies
The first time you need to install the dependencies, we suggest to use a [virtual environment](https://docs.python.org/3/library/venv.html)
using the following commands:
```bash
python3 -m venv env
```
This will install a virtual environment, you need to do this only the first time.
Then, you can activate the virtual env as follows:
```bash
source env/bin/activate
```
Then you can install the dependencies using `pip`, as follows:
```bash
pip install -r requirements.txt
```
You can deactivate the virtual enviroment with following command:
```bash
deactivate
```
## Extract the repository from Github
We extracted the top 1000 Python repositories with more than 5k stars using GraphQL.
To extract these repository you need to execute the `repositories.py` script, as follows:
```bash
python repositories.py
```
This will create a [data/repositories.json](data/repositories.json) file.
## Extract the contributors for each repository
For each repository we need to extract the top 300 contributors.
This is done using the `contributors.py` script, running the following command:
```bash
python contributors.py
```
This command will create 1000 csv files in the `data`` folder with the top 300
contributors for each repository.
## Create the graph model
Finally, we have all the informations for creating the graph model.
In our model a node is a repository and an edge is the connection between repositories.
Two repository are connected with a weight edge if there is at least one common contributor.
The weight of the edge is the number of common contributors.
We used [igraph](data/repositories.json) for creating this model, reading all the repositories
and providing the intersection of the contributors.
You can create the graph model using the following command:
```bash
python build_graph.py
```
The model is generated and stored in the file [network.graphml](network.graphml) using the
[GraphML](http://graphml.graphdrawing.org/) format.
This file contains a graph with 1,000 nodes and 104,148 edges.
Below is reported a visualization of the graph model done with [Gephi](https://gephi.org/) software.

## The analysis of the model
The graph model has been analyzed using the jupyter file [analysis.ipynb].
## Copyright
The author of this software is [Enrico Zimuel](https://github.com/ezimuel/).
This software is released under the [MIT](/LICENSE) license.