{"id":15560996,"url":"https://github.com/tubaf-ifi-dipit/github2pandas","last_synced_at":"2025-04-23T21:48:22.615Z","repository":{"id":42067787,"uuid":"342246785","full_name":"TUBAF-IFI-DiPiT/github2pandas","owner":"TUBAF-IFI-DiPiT","description":"Aggregation of GitHub activities and transformation in Pandas Dataframes","archived":false,"fork":false,"pushed_at":"2022-08-04T07:11:17.000Z","size":7714,"stargazers_count":7,"open_issues_count":1,"forks_count":3,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-23T21:48:15.067Z","etag":null,"topics":["git-miner","git-mining-tool","github","learning-analytics","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TUBAF-IFI-DiPiT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-02-25T13:04:06.000Z","updated_at":"2025-02-22T20:18:44.000Z","dependencies_parsed_at":"2022-08-12T04:00:27.368Z","dependency_job_id":null,"html_url":"https://github.com/TUBAF-IFI-DiPiT/github2pandas","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUBAF-IFI-DiPiT%2Fgithub2pandas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUBAF-IFI-DiPiT%2Fgithub2pandas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUBAF-IFI-DiPiT%2Fgithub2pandas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUBAF-IFI-DiPiT%2Fgithub2pandas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/T
UBAF-IFI-DiPiT","download_url":"https://codeload.github.com/TUBAF-IFI-DiPiT/github2pandas/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250522297,"owners_count":21444510,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["git-miner","git-mining-tool","github","learning-analytics","python"],"created_at":"2024-10-02T16:04:45.922Z","updated_at":"2025-04-23T21:48:22.585Z","avatar_url":"https://github.com/TUBAF-IFI-DiPiT.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Transform GitHub Activities to Pandas Dataframes\n\n## General information\n\nThis package is being developed by the participating partners (TU Bergakademie Freiberg, OVGU Magdeburg and HU Berlin) as part of the DiP-iT project ([website](http://dip-it.ovgu.de/)).\n\nThe package implements Python functions for\n+ aggregating and preprocessing GitHub activities (Commits, Actions, Issues, Pull Requests) and\n+ generating project progress summaries according to different metrics (e.g., the ratio of changed lines or the ratio of aggregated Levenshtein distances).\n\n`github2pandas` stores the collected information as a collection of pandas DataFrames below a user-defined root folder. The structure beneath that folder (file and folder names) is defined by member variables of the corresponding classes and can be overridden. The default configuration results in the following file structure:\n\n```\n|-- My_Github_Repository_0               \u003c- Repository name\n|   |- Repo.json                         \u003c- JSON file containing user and repo name\n|   |- Repository\n|   |   |- Repository.p\n|   |- Issues\n|   |   |- pdIssuesComments.p\n|   |   |- pdIssuesEvents.p\n|   |   |- pdIssues.p\n|   |   |- pdIssuesReactions.p\n|   |- PullRequests\n|   |   |- pdPullRequestsComments.p\n|   |   |- pdPullRequestsCommits.p\n|   |   |- pdPullRequestsEvents.p\n|   |   |- pdPullRequests.p\n|   |   |- pdPullRequestsReactions.p\n|   |   |- pdPullRequestsReviews.p\n|   |- Users.p\n|   |- Versions\n|   |   |- pdCommits.p\n|   |   |- pdEdits.p\n|   |   |- pdBranches.p\n|   |   |- pVersions.db\n|   |   |- repo                         \u003c- Repository clone\n|   |   |   |- ..\n|   |- Workflows\n|       |- pdWorkflows.p\n|-- My_Github_Repository_1\n...\n```\n\nThe internal structure of the DataFrames and the relations between them are documented in the project's [wiki](https://github.com/TUBAF-IFI-DiPiT/github2pandas/wiki).\n\n## Installation\n\n`github2pandas` is available on [PyPI](https://pypi.org/project/github2pandas/). Use pip to install the package.\n\n### Global\n\nOn Linux:\n\n```\nsudo pip3 install github2pandas\n# or, if pip already refers to Python 3:\nsudo pip install github2pandas\n```\n\nOn Windows, either as administrator (first command) or for the current user only (second command):\n\n```\npip install github2pandas\npip install --user github2pandas\n```\n\n### In a virtual environment\n\n```\npipenv install github2pandas\n```\n\n## Usage\n\nA GitHub personal access token is required for authentication. The [GitHub documentation](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) describes how to generate one for your account. Adjust the repository owner and name (`git_repo_owner`, `git_repo_name`) to explore any public repository, or any private repository your account has access to.\n\nDefine the access token either in a `.env` file or directly in the `.py` (`.ipynb`) file. The default value of the VS Code setting `python.envFile` is `${workspaceFolder}/.env`. When the script is run outside VS Code, export `TOKEN` in your shell or load the `.env` file yourself (for example with the `python-dotenv` package), because `os.environ` does not read `.env` files on its own.\n\n```\nTOKEN=\"example_token\"\n```\n\nA short example of a Python script:\n\n```python\nimport os\nfrom pathlib import Path\n# github2pandas imports\nfrom github2pandas.github2pandas import GitHub2Pandas\n\ngit_repo_name = \"github2pandas\"\ngit_repo_owner = \"TUBAF-IFI-DiPiT\"\n\n# all DataFrames are stored below this root folder\ndata_root_dir = Path(\"data\")\ndata_root_dir.mkdir(parents=True, exist_ok=True)\n# raises KeyError if TOKEN is not set in the environment\ngithub_token = os.environ['TOKEN']\n\ngithub2pandas = GitHub2Pandas(github_token, data_root_dir)\nrepo = github2pandas.get_repo(git_repo_owner, git_repo_name)\n# extract the complete repository\ngithub2pandas.generate_pandas_tables(repo)\n\n# export all pandas files to one Excel file\nGitHub2Pandas.save_tables_to_excel(Path(data_root_dir, git_repo_owner, git_repo_name))\n```\n\n## Notebook examples\n\nThe companion [github2pandas_notebooks](https://github.com/TUBAF-IFI-DiPiT/github2pandas_notebooks/blob/main/README.md) repository illustrates the usage with exemplary investigations. Note that these notebooks have not yet been updated for github2pandas version 2.0.0.\n\nThe module documentation is available at [https://github2pandas.readthedocs.io/](https://github2pandas.readthedocs.io/).\n\n## Working with pipenv\n\n| Process                                     | Command                                                 |\n| ------------------------------------------- | ------------------------------------------------------- |\n| Installation                                | `pipenv install --dev`                                  |\n| Run a specific script                       | `pipenv run python file.py`                             |\n| Run all tests                               | `pipenv run python -m unittest`                         |\n| Run all tests in a specific folder          | `pipenv run python -m unittest discover -s 'tests'`     |\n| Run all tests matching a filename pattern   | `pipenv run python -m unittest discover -p 'test_*.py'` |\n| Start Jupyter server in virtual environment | `pipenv run jupyter notebook`                           |\n\n## For contributors\n\nPlease follow the Python naming conventions described at [namingconvention.org/python](https://namingconvention.org/python/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftubaf-ifi-dipit%2Fgithub2pandas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftubaf-ifi-dipit%2Fgithub2pandas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftubaf-ifi-dipit%2Fgithub2pandas/lists"}