https://github.com/parvezmrobin/ml-br-deduplication
Replicated a state-of-the-art duplicate bug report detection technique using MongoDB, Docker, TensorFlow, and scikit-learn
- Host: GitHub
- URL: https://github.com/parvezmrobin/ml-br-deduplication
- Owner: parvezmrobin
- Created: 2022-09-20T07:51:28.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-09-21T00:48:41.000Z (over 3 years ago)
- Last Synced: 2025-06-14T09:05:45.718Z (10 months ago)
- Topics: deep-learning, siamese-network
- Language: Jupyter Notebook
- Homepage:
- Size: 782 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# An Empirical Study On Duplicate Bug Report Identification Using Siamese Cross-Encoder Network
This project replicates "Towards Accurate Duplicate Bug Retrieval Using Deep Learning Techniques"
by Jayati Deshmukh, K. M. Annervaz, Sanjay Podder, Shubhashis Sengupta, and Neville Dubash,
published at the International Conference on Software Maintenance and Evolution (ICSME) 2017.
We show that, even without handling structured information separately, we can achieve
performance comparable to the original work.
## Dataset
Download the dataset from [here](http://alazar.people.ysu.edu/msr14data/) and load it into MongoDB.
If you run MongoDB with Docker, you can use the `docker-compose.yaml` file in the root directory.
## Installing Packages
We highly encourage using a virtual environment to run the project.
The necessary packages are listed in the `requirements.txt` file in the root directory.
Install them by running:
```shell
pip install -r requirements.txt
```
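The virtual environment recommended above can be created before installing the packages. The following is a minimal sketch, assuming Python 3 with the standard `venv` module is available as `python3`; the directory name `.venv` is an arbitrary choice and is not prescribed by this project.

```shell
# Create an isolated environment in ./.venv (the name is an assumption, pick any)
python3 -m venv .venv

# Activate it for the current POSIX shell session
# (on Windows, run .venv\Scripts\activate instead)
. .venv/bin/activate

# Confirm that python now resolves inside the environment
python -c "import sys; print(sys.prefix)"
```

With the environment active, the `pip install -r requirements.txt` command shown above installs the packages into `.venv` rather than into the system-wide Python installation.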
## Run
Start a Jupyter server by running:
```shell
jupyter notebook
```
Then open `notebooks/siamese-trials/title-descr-eclipse.ipynb` in the Jupyter app.
To see results for a different dataset, change the third line in the second cell accordingly.
## Result
Check `project-report.pdf` for a detailed analysis of the evaluation and findings.