Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/danthe1st/pat-bug-assignment
Last synced: 13 days ago
- Host: GitHub
- URL: https://github.com/danthe1st/pat-bug-assignment
- Owner: danthe1st
- License: mit
- Created: 2024-04-30T09:12:55.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2024-06-03T11:55:06.000Z (6 months ago)
- Last Synced: 2024-06-03T13:55:51.053Z (6 months ago)
- Language: Python
- Size: 37.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Automated Bug classification using Machine Learning methods
This repository contains code for determining the assignee of issues in Open Source projects.
## Retrieval
The issues are downloaded from GitHub as well as the [Eclipse Bugzilla](https://bugs.eclipse.org) using the corresponding APIs. The code for retrieving issues from GitHub is located in `download_issues_github.py`, while the respective code for Bugzilla is located in `download_issues_bugzilla.py`. The GitHub API requires an API token to be present in a `.token` file due to the number of necessary API calls, while reading issues from the Bugzilla API does not require authentication.
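As a minimal sketch of how such a token might be used, the snippet below reads it from `.token` and builds request headers. The helper names `read_token` and `github_headers` are hypothetical and not taken from the repository; the only assumption about GitHub itself is that its REST API accepts a bearer token in the `Authorization` header.

```python
from pathlib import Path


def read_token(path: str = ".token") -> str:
    """Read the API token from a file, ignoring surrounding whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()


def github_headers(token: str) -> dict[str, str]:
    """Build request headers for authenticated GitHub REST API calls."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

The resulting dictionary could then be passed as the headers of each request made by whichever HTTP client the download script uses.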
The Bugzilla API endpoint used to list issues does not include the full issue description (the first comment), making it necessary to perform an additional API request for every issue. This is done when the variable `INCLUDE_BODY` is set to `True` in [`download_issues_bugzilla.py`](download_issues_bugzilla.py).
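The per-issue request could look roughly like the following sketch. It assumes the standard Bugzilla REST route `/rest/bug/{id}/comment` and its documented response shape; the base URL and the function names `comment_url`, `first_comment`, and `fetch_body` are assumptions for illustration, not the repository's actual code.

```python
import json
from urllib.request import urlopen

# Assumption: base URL of the Eclipse Bugzilla REST API.
BASE_URL = "https://bugs.eclipse.org/bugs/rest"


def comment_url(bug_id: int, base_url: str = BASE_URL) -> str:
    """Return the URL listing all comments of a single bug."""
    return f"{base_url}/bug/{bug_id}/comment"


def first_comment(payload: dict, bug_id: int) -> str:
    """Extract the first comment (the issue description) from a response."""
    comments = payload["bugs"][str(bug_id)]["comments"]
    return comments[0]["text"] if comments else ""


def fetch_body(bug_id: int) -> str:
    """Perform one HTTP request to retrieve the description of one bug."""
    with urlopen(comment_url(bug_id)) as response:
        return first_comment(json.load(response), bug_id)
```

Because `fetch_body` issues one request per bug, the total number of requests grows linearly with the number of issues.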
Note that doing so sends a significant number of API requests (one for every issue as opposed to one in total), so it is recommended to reduce `ISSUE_COUNT` when requesting bodies (making 100000 API requests may take a while).
## Preprocessing
Issues are preprocessed by the script `preprocess_issues.py`.
It is possible to supply a command-line argument with the name of the file containing the issues downloaded by one of the aforementioned retrieval scripts. The variable `TOP_K_ASSIGNEES` can be set in order to only consider the assignees with the most issues assigned to them.
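The filtering step can be illustrated with the sketch below. It assumes each issue is a dictionary with an `"assignee"` key; that layout, the default file name `issues.json`, and the helper names `input_path` and `keep_top_assignees` are hypothetical and may differ from what `preprocess_issues.py` actually does.

```python
from collections import Counter

TOP_K_ASSIGNEES = 2  # illustrative value, not the repository's default


def input_path(argv: list[str], default: str = "issues.json") -> str:
    """Use the first command-line argument as the input file, if given."""
    return argv[1] if len(argv) > 1 else default


def keep_top_assignees(issues: list[dict], k: int) -> list[dict]:
    """Keep only issues assigned to the k assignees with the most issues."""
    counts = Counter(issue["assignee"] for issue in issues)
    top = {name for name, _ in counts.most_common(k)}
    return [issue for issue in issues if issue["assignee"] in top]
```

Restricting the data to the most frequent assignees reduces the number of classes, which is presumably why the variable exists.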
## Classification
It is possible to train and evaluate the classifier by running `classifier.py`.