https://github.com/sophilabs/pullreq-ml
A machine learning experiment for predicting Pull Requests acceptance rate
github-api machine-learning python sklearn
- Host: GitHub
- URL: https://github.com/sophilabs/pullreq-ml
- Owner: sophilabs
- License: mit
- Created: 2018-01-05T20:15:31.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2021-12-23T13:30:19.000Z (about 3 years ago)
- Last Synced: 2024-04-17T05:46:15.113Z (8 months ago)
- Topics: github-api, machine-learning, python, sklearn
- Language: JavaScript
- Homepage: https://sophilabs.co/blog/pr-prediction-machine-learning
- Size: 34.2 KB
- Stars: 5
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Github PR prediction (pullreq-ml)
![ETL process](https://d2wlcd8my7k9h4.cloudfront.net/media/images/575815bd-fb2d-4886-9c0f-90d1b07a9683.png)
This Node/Python library builds a model that predicts, at the moment a Pull Request (PR) is created, whether it will be accepted, by learning from the history of a GitHub project. Its aim is to help project integrators manage PRs for a particular project. You can find more information about the model and how it works in this [article](https://sophilabs.co/blog/pr-prediction-machine-learning).
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
### Prerequisites
You will need the following:
* [Python 3.6](https://www.python.org/downloads/) or newer
* [Node 8](https://nodejs.org/en/download/) or newer
* [MongoDB](https://www.mongodb.com/download-center) 3.2 or newer
* [Git](https://git-scm.com/downloads)
* A GitHub access token for using the GitHub API. This [post](https://github.com/blog/1509-personal-api-tokens) explains how to get yours.
### Installing & Running
0. Choose a project to predict. This document uses https://github.com/Netflix/pygenie because it is small, but you can use any project, such as [Node](https://github.com/nodejs/node/).
1. Clone this repository onto your machine:
```bash
git clone https://github.com/sophilabs/pullreq-ml.git
```
2. (Optional) Create and activate a virtual environment for your local copy. For example, using the [venv](https://docs.python.org/3/library/venv.html) module:
```bash
python -m venv venv
source venv/bin/activate
```
3. Install dependencies
```bash
cd pullreq-ml # or pullreq-ml-master
npm install
pip install -r requirements.txt
```
4. (Optional) Create a user for your MongoDB instance
```bash
echo "db.createUser({ user: 'github', 'pwd': 'github', roles: ['readWrite'] })" | mongo github
```
5. Replace the contents of [`config.js`](config.js) with your actual repository and database settings. For example:
```javascript
module.exports = {
  // Local MongoDB instance
  MONGO_DB_URL: 'mongodb://github:github@localhost:27017/github',
  // GitHub access token
  GITHUB_ACCESS_TOKEN: '',
  // Repository information. For example, for https://github.com/Netflix/pygenie use:
  REPO_OWNER: 'Netflix',
  REPO_NAME: 'pygenie'
}
```
6. Clone the target repo inside the `targetrepo` folder
```bash
git clone https://github.com/Netflix/pygenie.git targetrepo
```
7. Fetch the repository information
```bash
node fetch.js
```
8. Train and evaluate the Pull Request acceptance model for your repository
```bash
python evaluate.py
```
You should see output like the following:
```
Report on Test data
             precision    recall  f1-score   support

 not merged       0.76      0.22      0.34       264
     merged       0.78      0.98      0.87       753

avg / total       0.78      0.78      0.73      1017
Dumped classifier data to classifier.pkl
```
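The layout of this report matches the text output of scikit-learn's `classification_report`, though the source does not say which function prints it. Under that assumption, the sketch below shows how the columns relate: F1 is the harmonic mean of precision and recall, and the `avg / total` row is the support-weighted mean of each column. The function names are illustrative and not part of pullreq-ml. The inputs are the rounded figures from the sample report, so a recomputed average can differ from the printed one in the last digit.

```python
# Illustrative helpers (not part of pullreq-ml) relating the report's columns.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(values, supports):
    """Mean of per-class values, weighted by each class's support."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Rounded per-class figures copied from the sample report above.
report = {
    "not merged": {"precision": 0.76, "recall": 0.22, "support": 264},
    "merged": {"precision": 0.78, "recall": 0.98, "support": 753},
}

supports = [row["support"] for row in report.values()]
f1_by_class = {
    name: f1_score(row["precision"], row["recall"]) for name, row in report.items()
}
avg_recall = weighted_average([row["recall"] for row in report.values()], supports)

for name, value in f1_by_class.items():
    print(f"{name:>11}  f1 = {value:.2f}")
print(f"avg / total  recall = {avg_recall:.2f}")
```

Read this way, the recall of 0.22 for `not merged` means the sample model flags only about a fifth of the PRs that end up rejected, even though the averaged figures look reasonable.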
Running `evaluate.py` also generates a `classifier.pkl` binary file, which can be used to predict the outcome of any PR on the target project.
## TODO
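As a starting point for the item below, here is a minimal sketch of how a `classify.py` script might load the trained model and print a verdict. It rests on assumptions the source does not confirm: that `classifier.pkl` was written with Python's `pickle` module, that the unpickled object has a scikit-learn style `predict` method, and that a truthy label means "merged". The function names are hypothetical, and turning a PR URL into a feature vector is left out because it has to mirror whatever `evaluate.py` does during training.

```python
import pickle
from pathlib import Path


def load_classifier(path="classifier.pkl"):
    """Load a pickled object from disk. Only unpickle files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe_prediction(classifier, features):
    """Return a human-readable verdict for one PR's feature vector.

    Assumes a scikit-learn style `predict` that takes a list of samples
    and returns one label per sample, with a truthy label meaning "merged".
    """
    label = classifier.predict([features])[0]
    return "Will be merged!" if label else "Will not be merged!"
```

With a trained model in place, `describe_prediction(load_classifier(), features)` would return the kind of message shown in the example below, provided the features are extracted the same way as at training time.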
* Build a script that classifies a particular PR against the trained model, with a command like:
```bash
> python classify.py https://github.com/nodejs/node/pull/11107
Will not be merged!
```
## Built With
* [scikit-learn](http://scikit-learn.org/) - Provides the algorithms used to predict whether a PR will be merged.
* [MongoDB](https://api.mongodb.com/python/current/) - Used to store project data downloaded from GitHub.
* [Git](https://git-scm.com/) - Used to compute diffs and analyze PR commit deltas.
## Contributing
Feel free to make a Pull Request if you find a bug or want to implement a feature. We welcome any help.
## Authors
* **Ignacio Avas** - *Initial work* - [igui](https://github.com/igui)
## Acknowledgments
* Pablo Grill for his insight into and knowledge of machine learning
## License
pullreq-ml is Copyright (c) 2018 sophilabs, inc. It is free software, and may be
redistributed under the terms specified in the [license](LICENSE) file.
## About
[![sophilabs][sophilabs-image]][sophilabs-url]
pullreq-ml is maintained and funded by sophilabs, inc. The names and logos for
sophilabs are trademarks of sophilabs, inc.

[sophilabs-image]: https://s3.amazonaws.com/sophilabs-assets/logo/logo_300x66.gif
[sophilabs-url]: https://sophilabs.co