{"id":17094920,"url":"https://github.com/ezimuel/python-graph-analysis-project","last_synced_at":"2026-04-11T12:35:27.229Z","repository":{"id":206091389,"uuid":"714011166","full_name":"ezimuel/python-graph-analysis-project","owner":"ezimuel","description":"A graph analysis of collaborations in Python open source community using GitHub","archived":false,"fork":false,"pushed_at":"2023-11-07T22:16:28.000Z","size":8045,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-27T17:48:32.527Z","etag":null,"topics":["github","graph","graphql","network-analysis","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ezimuel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-03T18:06:23.000Z","updated_at":"2023-11-07T22:19:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"42f5375f-55f6-43b0-89a8-324e9663f37e","html_url":"https://github.com/ezimuel/python-graph-analysis-project","commit_stats":null,"previous_names":["ezimuel/python-graph-analysis-project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ezimuel/python-graph-analysis-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ezimuel%2Fpython-graph-analysis-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ezimuel%2Fpython-graph-analysis-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ezimuel%
2Fpython-graph-analysis-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ezimuel%2Fpython-graph-analysis-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ezimuel","download_url":"https://codeload.github.com/ezimuel/python-graph-analysis-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ezimuel%2Fpython-graph-analysis-project/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270325054,"owners_count":24564985,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-13T02:00:09.904Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["github","graph","graphql","network-analysis","python"],"created_at":"2024-10-14T14:25:06.368Z","updated_at":"2026-04-11T12:35:22.204Z","avatar_url":"https://github.com/ezimuel.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A graph analysis of collaborations in Python open source community using GitHub\n\nThis project was developed by [Enrico Zimuel](https://github.com/ezimuel/) for the Graph Algorithms course\ntaught by Prof. 
[Stefano Guarino](https://www.iac.cnr.it/personale/stefano-guarino) in the [Master Data Analytics](https://master-data-analytics.it/) postgraduate course at the University of Roma Tre (Italy).\n\nThe goal is to study the collaborations between Python open source projects using the\ndata stored on github.com.\n\n## Configure the GitHub API token\n\nWe used the GitHub REST API and GraphQL API to retrieve all the information.\nThis means you need an API token provided by GitHub. You can create a new access token\nfrom [this page](https://github.com/settings/tokens).\n\nOnce you have the access token, store it in the `GITHUB_ACCESS_TOKEN`\nenvironment variable, as follows:\n\n```bash\nexport GITHUB_TOKEN=\"insert here the access token\"\n```\n\n## Virtual environment and dependencies\n\nBefore installing the dependencies for the first time, we suggest creating a [virtual environment](https://docs.python.org/3/library/venv.html)\nusing the following command:\n\n```bash\npython3 -m venv env\n```\n\nThis creates a virtual environment; you only need to do this once.\nThen, you can activate the virtual environment as follows:\n\n```bash\nsource env/bin/activate\n```\n\nThen you can install the dependencies using `pip`, as follows:\n\n```bash\npip install -r requirements.txt\n```\n\nYou can deactivate the virtual environment with the following command:\n\n```bash\ndeactivate\n```\n\n## Extract the repositories from GitHub\n\nWe extracted the top 1000 Python repositories with more than 5k stars using GraphQL.\n\nTo extract these repositories, execute the `repositories.py` script, as follows:\n\n```bash\npython repositories.py\n```\n\nThis will create a [data/repositories.json](data/repositories.json) file.\n\n## Extract the contributors for each repository\n\nFor each repository we need to extract the top 300 contributors.\nThis is done by running the `contributors.py` script with the following command:\n\n```bash\npython contributors.py\n```\n\nThis 
command will create 1000 CSV files in the `data` folder with the top 300\ncontributors for each repository.\n\n## Create the graph model\n\nFinally, we have all the information needed to create the graph model.\n\nIn our model a node is a repository and an edge is a connection between two repositories.\nTwo repositories are connected by a weighted edge if they have at least one contributor in common.\nThe weight of the edge is the number of common contributors.\n\nWe used [igraph](https://igraph.org/) to create this model, reading all the repositories\nand computing the intersection of their contributors.\n\nYou can create the graph model using the following command:\n\n```bash\npython build_graph.py\n```\n\nThe model is generated and stored in the file [network.graphml](network.graphml) using the\n[GraphML](http://graphml.graphdrawing.org/) format.\n\nThis file contains a graph with 1,000 nodes and 104,148 edges.\n\nBelow is a visualization of the graph model made with [Gephi](https://gephi.org/).\n\n![Visualization of the graph model](graph_gephi.png)\n\n## The analysis of the model\n\nThe graph model was analyzed in the Jupyter notebook `analysis.ipynb`.\n\n## Copyright\n\nThe author of this software is [Enrico Zimuel](https://github.com/ezimuel/).\n\nThis software is released under the [MIT](/LICENSE) license.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fezimuel%2Fpython-graph-analysis-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fezimuel%2Fpython-graph-analysis-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fezimuel%2Fpython-graph-analysis-project/lists"}