https://github.com/erfaniaa/commit-type-detection
Classify Git commits with deep learning
classification deep-learning neural-network paper python pytorch tf-idf
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/erfaniaa/commit-type-detection
- Owner: Erfaniaa
- License: gpl-3.0
- Created: 2020-02-02T23:25:01.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-12-15T20:35:38.000Z (almost 2 years ago)
- Last Synced: 2025-03-25T14:21:43.311Z (7 months ago)
- Topics: classification, deep-learning, neural-network, paper, python, pytorch, tf-idf
- Language: Python
- Size: 139 KB
- Stars: 18
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Commit Type Detection
Classify Git commits with deep learning
# Introduction
Following [this](https://arxiv.org/pdf/1711.05340.pdf) paper, we assume three main categories of software maintenance activities:
**Corrective**: fixing faults (functional and non-functional)
**Perfective**: improving the system and its design
**Adaptive**: introducing new features into the system
In this work, we design a commit classification model that distinguishes these three commit types with high accuracy.
The dataset used can be found [here](https://zenodo.org/record/835534).
# Method
The paper compares three algorithms: J48, GBM (gradient boosting machine), and RF (random forest). Of these, RF performed best.
Instead of using these algorithms, we implemented a **deep learning** approach. Here is the neural network architecture (copied from `network.py`):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# NETWORK_INPUT_SIZE and NETWORK_OUTPUT_SIZE are constants defined elsewhere in the project.


class Network(nn.Module):
    def __init__(self, input_size=NETWORK_INPUT_SIZE, output_size=NETWORK_OUTPUT_SIZE):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(input_size, 80)
        self.fc2 = nn.Linear(80, 60)
        self.dropout1 = nn.Dropout(0.01)
        self.fc3 = nn.Linear(60, 40)
        self.fc4 = nn.Linear(40, 20)
        self.fc5 = nn.Linear(20, output_size)

    def forward(self, x):
        # Four hidden layers, each followed by ReLU and the shared dropout
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc3(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc4(x)
        x = F.relu(x)
        x = self.dropout1(x)
        # Output layer squashed to (-1, 1) with tanh
        x = self.fc5(x)
        x = torch.tanh(x)
        return x
```
As the code shows, this is a fully connected neural network implemented in the **PyTorch** deep learning framework.
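To get a feel for how small this network is, the layer widths can be turned into a parameter count. The sketch below is plain Python and only does arithmetic on the widths. The input width of 100 comes from this README, while the output width of 3 (one unit per commit category) is an assumption, since `NETWORK_OUTPUT_SIZE` is not shown here.

```python
# Layer widths read off the Network class.
# 100 inputs is stated in this README; 3 outputs is an ASSUMPTION
# (one per category: adaptive, corrective, perfective).
LAYER_SIZES = [100, 80, 60, 40, 20, 3]


def count_parameters(sizes):
    """Return weights plus biases for a fully connected stack with these widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    print(count_parameters(LAYER_SIZES))
```

Under that assumption the network has roughly sixteen thousand trainable parameters, most of them in the first two layers.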
In our dataset, each commit has a message, a project name, and 68 other features. Applying the **tf-idf** algorithm to the commit messages converts each commit into a vector of size 100, so the input of this network is a vector of size 100.
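As an illustration of the idea, here is a minimal pure-Python sketch of turning commit messages into fixed-length tf-idf vectors. It is not the project's code. The function names, the tokenizer, the choice of the most frequent terms as vocabulary, the smoothed idf formula, and the L2 normalization are all assumptions made for this example. The repository may well use a library implementation instead.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase a commit message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def fit_vocabulary(messages, size=100):
    """Pick up to `size` most frequent terms and compute a smoothed idf for each."""
    docs = [tokenize(m) for m in messages]
    term_counts = Counter(t for doc in docs for t in doc)
    vocab = [t for t, _ in term_counts.most_common(size)]
    doc_freq = Counter(t for doc in docs for t in set(doc))
    n_docs = len(docs)
    # Smoothing keeps the weight positive even for terms found in every message
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def transform(message, vocab, idf, size=100):
    """Map one message to an L2-normalized tf-idf vector, zero-padded to `size`."""
    counts = Counter(tokenize(message))
    vec = [counts[t] * idf[t] for t in vocab]
    vec += [0.0] * (size - len(vec))
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


if __name__ == "__main__":
    msgs = ["Fix null pointer bug in parser", "Add new export feature"]
    vocab, idf = fit_vocabulary(msgs)
    print(len(transform(msgs[0], vocab, idf)))
```

Each message always maps to a vector of the same length, which is what lets a fixed-width input layer consume it.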
As in the paper, our models were trained on 85% of the dataset, and the remaining 15% was used as a test set.
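An 85/15 split of this kind can be sketched in a few lines of plain Python. The function name, the fixed seed, and the shuffle-then-cut strategy are assumptions for illustration. The project may split its data differently (for example, per project or with stratification).

```python
import random


def train_test_split(samples, train_fraction=0.85, seed=0):
    """Shuffle a copy of `samples` and cut it into train and test lists."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = train_test_split(range(100))
    print(len(train), len(test))
```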
# Result
A confusion matrix is printed after training. You can compare it with Table 8 of the paper mentioned above. In this run, our method reached **74.6% accuracy**.
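The headline figures can be re-derived by hand from the confusion matrix counts reported in this section. The sketch below is plain Python with the counts copied in (rows are actual classes, columns are predicted classes). The helper names are made up for this example.

```python
# Confusion matrix counts from this section: rows = actual, columns = predicted.
LABELS = ["a", "c", "p"]  # adaptive, corrective, perfective
MATRIX = [
    [17, 4, 10],
    [5, 74, 6],
    [3, 16, 38],
]


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision_recall(matrix, k):
    """Precision (PPV) and recall (TPR) for the class at index `k`."""
    tp = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column sum
    actual_k = sum(matrix[k])                    # row sum
    return tp / predicted_k, tp / actual_k


if __name__ == "__main__":
    print(round(overall_accuracy(MATRIX), 5))
    for k, label in enumerate(LABELS):
        print(label, precision_recall(MATRIX, k))
```

These helpers reproduce the `Overall Accuracy`, `PPV`, and `TPR` rows of the report.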
```
Predict    a     c     p
Actual
a          17    4     10
c          5     74    6
p          3     16    38

Overall Statistics:
Kappa                       0.57912
NIR                         0.49133
Overall Accuracy            0.74566
P-Value [Accuracy > NIR]    0.0

Class Statistics:
Classes                                                      Adaptive    Corrective    Perfective
ACC(Accuracy)                                                0.87283     0.82081       0.79769
ERR(Error rate)                                              0.12717     0.17919       0.20231
FN(False negative/miss/type 2 error)                         14          11            19
FP(False positive/type 1 error/false alarm)                  8           20            16
FPR(Fall-out or false positive rate)                         0.05634     0.22727       0.13793
PPV(Precision or positive predictive value)                  0.68        0.78723       0.7037
TN(True negative/correct rejection)                          134         68            100
TNR(Specificity or true negative rate)                       0.94366     0.77273       0.86207
TP(True positive/hit)                                        17          74            38
TPR(Sensitivity, recall, hit rate, or true positive rate)    0.54839     0.87059       0.66667
```
# Usage
Use Python 3.
First, install the required Python packages:
```bash
pip install -r requirements.txt
```
Then run the program:
```
python main.py
```