{"id":23041293,"url":"https://github.com/sophilabs/pullreq-ml","last_synced_at":"2025-08-14T21:31:36.437Z","repository":{"id":28195516,"uuid":"116423093","full_name":"sophilabs/pullreq-ml","owner":"sophilabs","description":"A machine learning experiment for predicting Pull Requests acceptance rate","archived":false,"fork":false,"pushed_at":"2021-12-23T13:30:19.000Z","size":35,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-04-17T05:46:15.113Z","etag":null,"topics":["github-api","machine-learning","python","sklearn"],"latest_commit_sha":null,"homepage":"https://sophilabs.co/blog/pr-prediction-machine-learning","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sophilabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-05T20:15:31.000Z","updated_at":"2023-12-05T15:50:53.000Z","dependencies_parsed_at":"2022-07-16T17:46:52.229Z","dependency_job_id":null,"html_url":"https://github.com/sophilabs/pullreq-ml","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sophilabs%2Fpullreq-ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sophilabs%2Fpullreq-ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sophilabs%2Fpullreq-ml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sophilabs%2Fpullreq-ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sophilabs","download_url":"https://codeloa
d.github.com/sophilabs/pullreq-ml/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229865821,"owners_count":18136371,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["github-api","machine-learning","python","sklearn"],"created_at":"2024-12-15T19:32:18.053Z","updated_at":"2024-12-15T19:32:18.673Z","avatar_url":"https://github.com/sophilabs.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GitHub PR prediction (pullreq-ml)\n\n![ETL process](https://d2wlcd8my7k9h4.cloudfront.net/media/images/575815bd-fb2d-4886-9c0f-90d1b07a9683.png)\n\nThis Node/Python library builds a model that predicts, at creation time, whether a particular Pull Request (PR) will be accepted, by learning from data about a GitHub project. The aim of this library is to help project integrators manage the PRs of a particular project. You can find more information about the model and how it works in this [article](https://sophilabs.co/blog/pr-prediction-machine-learning).\n\n## Getting Started\n\nThese instructions will get you a copy of the project up and running on your local machine for development and testing purposes.\n\n### Prerequisites\n\nYou will need the following:\n* [Python 3.6](https://www.python.org/downloads/) or newer\n* [Node 8](https://nodejs.org/en/download/) or newer\n* [MongoDB](https://www.mongodb.com/download-center) 3.2 or newer\n* [Git](https://git-scm.com/downloads)\n* A GitHub access token for the GitHub API. 
This [post](https://github.com/blog/1509-personal-api-tokens) explains how to get yours.\n\n### Installing \u0026 Running\n\n0. Choose a project to predict. In this document I will use https://github.com/Netflix/pygenie because it is small, but you can use any project, such as [Node](https://github.com/nodejs/node/).\n1. Clone this repository onto your machine:\n\n    ```bash\n    git clone https://github.com/sophilabs/pullreq-ml.git\n    ```\n2. (Optional) Install your local copy into a virtual environment. For example, using the [venv](https://docs.python.org/3/library/venv.html) module:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate\n   ```\n3. Install the dependencies:\n   ```bash\n   cd pullreq-ml # or pullreq-ml-master\n   npm install\n   pip install -r requirements.txt\n   ```\n4. (Optional) Create a user for your MongoDB instance:\n   ```bash\n   echo \"db.createUser({ user: 'github', pwd: 'github', roles: ['readWrite'] })\" | mongo github\n   ```\n5. Replace the contents of [`config.js`](config.js) with your target repository and database credentials. For example:\n   ```javascript\n    module.exports = {\n        // Local MongoDB connection URL (user:password@host:port/database)\n        MONGO_DB_URL: 'mongodb://github:github@localhost:27017/github',\n        // GitHub access token\n        GITHUB_ACCESS_TOKEN: '\u003cyour token here\u003e',\n        // Target repository, e.g. for https://github.com/Netflix/pygenie:\n        REPO_OWNER: 'Netflix',\n        REPO_NAME: 'pygenie'\n    }\n   ```\n6. Clone the target repository into the `targetrepo` folder:\n   ```bash\n   git clone https://github.com/Netflix/pygenie.git targetrepo\n   ```\n7. Fetch the repository information:\n   ```bash\n   node fetch.js\n   ```\n8. 
Train and evaluate the PR acceptance model for your repository:\n   ```bash\n   python evaluate.py\n   ```\n   You should see output like the following:\n   ```\n   Report on Test data\n             precision    recall  f1-score   support\n\n    not merged       0.76      0.22      0.34       264\n        merged       0.78      0.98      0.87       753\n\n    avg / total       0.78      0.78      0.73      1017\n\n   Dumped classifier data to classifier.pkl\n   ```\n   This command generates a binary `classifier.pkl` file that can be used to predict the outcome of any PR on the target project.\n\n## TODO\n\n* Build a script that predicts whether a particular PR will be merged, using the trained model. For example:\n  ```bash\n  \u003e python classify.py https://github.com/nodejs/node/pull/11107\n  Will not be merged!\n  ```\n\n## Built With\n\n* [scikit-learn](http://scikit-learn.org/) - Provides the algorithms used to predict whether PRs will be merged.\n* [MongoDB](https://api.mongodb.com/python/current/) - Stores the project data downloaded from GitHub.\n* [Git](https://git-scm.com/) - Used to compute diffs and analyze PR commit deltas.\n\n## Contributing\n\nFeel free to make a Pull Request if you find a bug or want to implement a feature. We welcome any help.\n\n## Authors\n\n* **Ignacio Avas** - *Initial work* - [igui](https://github.com/igui)\n\n## Acknowledgments\n\n* Pablo Grill for his insight and knowledge of machine learning\n\n## License\n\npullreq-ml is Copyright (c) 2018 sophilabs, inc. It is free software, and may be\nredistributed under the terms specified in the [license](LICENSE) file.\n\n## About\n\n[![sophilabs][sophilabs-image]][sophilabs-url]\n\npullreq-ml is maintained and funded by sophilabs, inc. 
The names and logos for\nsophilabs are trademarks of sophilabs, inc.\n\n[sophilabs-image]: https://s3.amazonaws.com/sophilabs-assets/logo/logo_300x66.gif\n[sophilabs-url]: https://sophilabs.co","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsophilabs%2Fpullreq-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsophilabs%2Fpullreq-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsophilabs%2Fpullreq-ml/lists"}