{"id":19681726,"url":"https://github.com/collab-uniba/developersinactivityanalysis","last_synced_at":"2025-04-29T04:32:13.381Z","repository":{"id":146138842,"uuid":"183011533","full_name":"collab-uniba/developersInactivityAnalysis","owner":"collab-uniba","description":"A collection of scripts to collect data from GitHub and analyze developers' breaks during their lifetime in a project and determine which of these breaks can be considered Sleepings, Hibernations or Deads.","archived":false,"fork":false,"pushed_at":"2024-10-15T22:42:38.000Z","size":140062,"stargazers_count":1,"open_issues_count":1,"forks_count":3,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-05T13:11:15.569Z","etag":null,"topics":["abandonment","github","msr","oss","retention"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/collab-uniba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-23T12:35:52.000Z","updated_at":"2024-10-15T22:37:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"c9d57816-502d-4a98-8f82-c3eb875e6a91","html_url":"https://github.com/collab-uniba/developersInactivityAnalysis","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collab-uniba%2FdevelopersInactivityAnalysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collab-uniba%2FdevelopersInactivityAnalysis/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/collab-uniba%2FdevelopersInactivityAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/collab-uniba%2FdevelopersInactivityAnalysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/collab-uniba","download_url":"https://codeload.github.com/collab-uniba/developersInactivityAnalysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251433008,"owners_count":21588692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abandonment","github","msr","oss","retention"],"created_at":"2024-11-11T18:08:37.292Z","updated_at":"2025-04-29T04:32:12.180Z","avatar_url":"https://github.com/collab-uniba.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Will you come back to contribute? 
Investigating the inactivity of OSS developers in GitHub\n[![DOI](https://zenodo.org/badge/183011533.svg)](https://zenodo.org/badge/latestdoi/183011533)\n\n### Setup\n\nUse the `productivity` branch for the latest updates.\n\nAdd a folder named `Resources/` to the repository root, containing the following files: \n- `repositories.txt` containing the list of projects (one per line) to be analyzed, in the following format `org/repo_name` (e.g., `atom/atom`); \n- `tokens.txt` (optional) containing the list of GitHub tokens to be used.\n\n### Sampling of developers\n\n#### Core Developers Selection\n\nRefer to this [README.md](CoreSelection/README.md) file.\n\n#### Truck-Factor Developer Selection\n\nRefer to this [README.md](TruckFactor/README.md) file.\n\n---\n\n### CommitExtractor.py\n\n\n#### Params\n\nUses the tokens defined in `Resources/tokens.txt` and the list of repository URLs in `Resources/repositories.txt`, as defined in the `Settings.py` file.\n\n- None.\n\n#### Requirements\n\n- Set the file and folder names in the `Settings.py` file\n\n#### Execution\n\n`python CommitExtractor.py`\n\n\n#### Output\n\n- `logs/Commit_Extraction_organization.log`: log file\n- `Organizations/\u003corganization\u003e/[\u003crepo1\u003e...\u003crepoN\u003e]/`: Results folders\n- For each repo folder: \n  - `commit_list.csv`: List of the commits in the format: \u003cSHA; author_id; date\u003e\n  - `commit_history_table.csv`: Matrix of authors and dates. 
The cells contain the number of commits made by a developer in one day\n  - `pauses_duration_list.csv`: List of pause durations in days for each developer in the format: \u003cdev; listOfDurations\u003e\n  - `pauses_dates_list.csv`: List of pause dates for each developer in the format: \u003cdev; listOfPauseDates\u003e\n- The same files are given after merging the commits of every organization's repo in the `Organizations/\u003corganization\u003e/` folder.\n\nIf you came here from step 2 of the core selection, you can now perform step 3 by following [(CoreSelection | Step 3)](CoreSelection/README.md#L18)\n\n---\n\n### ActivitiesExtractor.py\n\n#### Params\n\n- None\n\n#### Requirements\n\n- Set the file and folder names in the `Settings.py` file\n\n#### Execution\n\n`python ActivitiesExtractor.py`\n\n\n#### Output\n\n- `logs/Commit_Extraction_organization.log`: log file\n- `Organizations/\u003corganization\u003e/[\u003crepo1\u003e...\u003crepoN\u003e]/Other_Activities/`: Results folders\n- For each repo folder:\n  - `issues_comments_repo.csv`: List of the issue comments in the format: \u003cid; date; creator_login\u003e\n  - `issues_events_repo.csv`: List of the issue events in the format: \u003cid; date; creator_login\u003e\n  - `issues_prs_repo.csv`: List of the issue and pull request creations in the format: \u003cid; date; creator_login\u003e\n  - `pulls_comments_repo.csv`: List of the pull request comments in the format: \u003cid; date; creator_login\u003e\n\n\n### PullRequestExtractor.py\n\n### NonMergedCommitsExtractor.py\n\n### MissingStuffCollector.py\n\n### CodingTableBuilder.py\n\n---\n\n### BreaksIdentification.py\n\n#### Params\n\n- `mode`: enter one of the following modes ['tf', 'a80', 'a80mod', 'a80api']\n\n#### Requirements\n\n- Set the file and folder names in the `Settings.py` file\n- Insert the list of the TF/core developers (\u003cTF_developers_file\u003e) in the right folder, formatted as a list of \u003cname;login\u003e. 
The path to save the file is set in the `Settings.py` file.\n- Set the `window` size and the `shift` size in the `Settings.py` file\n\n#### Execution\n\n`python BreaksIdentification.py tf | a80 | a80mod | a80api`\n\n#### Output\n\n- `logs/Breaks_Identification.log`: log file\n- `Organizations/\u003corganization\u003e/Dev_Breaks/`: Results folders\n- For each developer in the TF file:\n  - `\u003cdevLogin\u003e_breaks.csv`: List of the breaks in the format: \u003clen; dates; Tfov_used\u003e\n\n#### Algorithm\n\nLet **D** be a developer to analyze and let **life(D)** be the number of days between their first and last commits.\nThe steps below are applied to each sliding *window* **W** in **life(D)**, which advances by *shift* days at a time (the values of the variables *window*, default 90 days, and *shift*, default 7 days, are set in the `Settings.py` file).\n\nThe goal is to select all the *breaks* (*pauses* that are larger than usual) associated with the *Tfov* (Far-out-value threshold) of the first window where they have been found:\n\n1. PAUSES SELECTION **STEP**\n\n- In the list `win_pauses`, put all the pauses within **W** (only these pauses define the rhythm of **D** in **W**).\n- In the list `partially_included`, put all the pauses partially within **W** (i.e., pauses that start in **W** and end in the next window).\n\n2. *Tfov* DEFINITION **STEP**\n\n- If `win_pauses` contains \u003e=4 *pauses*, then **W** is valid: use `win_pauses` to calculate *Tfov*. 
If *Tfov* is valid (i.e., *IQR*\u003e1), then proceed to the breaks identification step (go to STEP 3).\n- Else, when `win_pauses` \u003c 4 (i.e., *Tfov* cannot be calculated) or if *Tfov* is invalid (i.e., *IQR*\u003c=1) for **W**, then:\n  - If a previous *Tfov* exists, then consider it as the current *Tfov* and proceed to the next step for breaks identification (go to STEP 3).\n  - Otherwise, save into the list `clear_breaks` all the *pauses* from `partially_included` that are larger than the window size and have not been considered yet, ignore the other *pauses* in `win_pauses`; move forward **W** by *shift* days and RESTART (go back to STEP 1).\n\n  (Note: The *pauses* that are larger than *shift* days will be considered in the next **W** and so on, whereas the smaller ones are not breaks and can be safely ignored).\n\n3. BREAKS IDENTIFICATION **STEP**\n\n- Select as *break* each couple *\u003cp, t\u003e* from the lists `win_pauses` and `partially_included` where *t* is *Tfov* and *p* is a *pause* \u003e *Tfov*.\n  - Move forward **W** by *shift* days and RESTART (go back to STEP 1).\n\n4. 
FINAL **STEP** (when no windows **W** remain)\n\n- Compute *Avg_Tfov* as the average of all the valid *Tfovs* found.\n- Save the *pauses* in the list `clear_breaks` as *breaks* (*\u003cp, t\u003e* where *t* is *Avg_Tfov*, and *p* is a *pause* \u003e *Avg_Tfov* as per the list definition).\n\n---\n\n### BreaksLabeling.py\n\n#### Params\n\n- `mode`: choose one of the following modes ['tf', 'a80', 'a80mod', 'a80api']\n\n#### Requirements\n\n- Make sure to have already executed the `BreaksIdentification.py` script to get the `\u003cdevLogin\u003e_breaks.csv` files (one for each developer).\n\n#### Execution\n\n`python BreaksLabeling.py tf | a80 | a80mod | a80api`\n\n#### Output\n\n- `logs/Breaks_Labeling.log`: events log file\n- `Organizations/\u003corganization\u003e/Dev_Breaks/`: Results folders\n- For each developer in the TF file:\n  - `\u003cdevLogin\u003e_labeled_breaks.csv`: List of the breaks in the format: \u003clen; dates; Tfov_used; label; previously\u003e\n\n#### Algorithm\n\n1. Get a *break* from the `Breaks` list.\n\n2. If the developer performed no other activity during the break, then label it `INACTIVE` if it is shorter than 365 days; `GONE` otherwise.\n\n3. If there are other activities in the period:\n\n- Define `sub_breaks_list` as the list of the intervals between such activities (*sub_break*).\n- Identify each *sub_break* \u003e *Tfov* from the `sub_breaks_list` and label it based on the defined state diagram (∆t_inactive = ∆t_non-coding = Tfov).\n\n![state diagram](https://dl.dropboxusercontent.com/s/4jluvxonjv1mz9d/New_state_diagram.png?dl=1)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcollab-uniba%2Fdevelopersinactivityanalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcollab-uniba%2Fdevelopersinactivityanalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcollab-uniba%2Fdevelopersinactivityanalysis/lists"}