https://github.com/radon-h2020/radon-defect-prediction-endpoints
App to expose endpoints for pre-trained model download and usage
https://github.com/radon-h2020/radon-defect-prediction-endpoints
Last synced: about 2 months ago
JSON representation
App to expose endpoints for pre-trained model download and usage
- Host: GitHub
- URL: https://github.com/radon-h2020/radon-defect-prediction-endpoints
- Owner: radon-h2020
- License: apache-2.0
- Created: 2021-03-12T09:20:19.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-03-12T09:32:55.000Z (about 4 years ago)
- Last Synced: 2025-01-12T11:37:27.775Z (4 months ago)
- Language: Python
- Size: 1.99 MB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Radon-defect-prediction-endpoints
This repository provised the APIs to expose Ansible and Tosca pre-trained defect prediction models.
## Endpoints
- Get model: https://radon-defect-prediction.herokuapp.com/models
- Predict: https://radon-defect-prediction.herokuapp.com/predictions### Get model
Get a pre-trained model from the most similar project.`GET https://radon-defect-prediction.herokuapp.com/models?parm1=value1&...¶mN=valueN`
**Example:**
`GET https://radon-defect-prediction.herokuapp.com/models?language=ansible&repository_size=560&comments_ratio=0.03&has_license=1`**Parameters**
| Name | Value | Default|
|------|-------|-----------|
|`language`| string | None. Choose between {ansible, tosca} |
|`return_model` | int [0: false, 1: true] | 0 |
|`comments_ratio` | numeric | 0 |
|`commit_frequency`| numeric | 0 |
|`core_contributors`| numeric | 0 |
|`has_ci`| [0: false, 1: true] | 0 |
|`has_license`| [0: false, 1: true] | 0 |
|`iac_ratio`| numeric | 0 |
|`issue_frequency`| numeric | 0 |
|`repository_size`| numeric | 0 |**Return**
A json containing the model id to use for calls to the `predictions/` endpoint, a list of models, and a similarity score ([0-1]) between the client's and models' project.
```
{
"model_id": 24242603,
"similarity": 0.9999,
"models": [
{
"type": "general"
"rules": "|--- num_include <= 0.44\n|... "
},
{
"type": "conditional"
"rules": "... "
},
]
}
```If `return_model=true`, it returns the raw `joblib` model.
Once saved somewhere, the model can be loaded and used in Python as follows:```python
import joblibmodel = joblib.load('path/to/model.joblib', mmap_mode='r')
selected_features = model['selected_features']
normalizer = model['estimator'].named_steps['normalization']
tree_classifier = model['estimator'].named_steps['classification']# selected_features can be used to reduce a new data frame to the same subset of model features
# normalizer (if not None) must be used to normalize the new data in the same fashion as model's training data
# tree_classifier can be used for predicting failure-prone script
```### Predict
Predict the failure-proneness of a file represented by a set of metrics.`GET https://radon-defect-prediction.herokuapp.com/predictions?parm1=value1&...¶mN=valueN`
**Example:**
`GET https://radon-defect-prediction.herokuapp.com/predictions?language=ansible&model_id=24242603&num_names_with_vars=10&num_ignore_errors=3&num_conditions=1`**Parameters**
| Name | Value | Default|
|------|-------|-----------|
|`language`| string | None. Choose between {ansible, tosca} |
|`model_id` | int | None. It is mandatory. Use the model_id from the `/models` endpoint |The remaining parameters are the Ansible or Tosca metrics that can be extracted by the script.
**See how** extract those metrics with [AnsibleMetrics](https://github.com/radon-h2020/radon-ansible-metrics) and
[ToscaMetrics](https://github.com/radon-h2020/radon-tosca-metrics).**Note:** Metrics are passed instead of the raw script to avoid information disclosure.
**Result**
A json file containing the prediction (i.e., `failure_prone: true/false`), and the decision that led to the prediction
(absent if `failure_prone: false`).```
{
"failure_prone": true,
"decision": [
[
"num_names_with_vars",
">",
2.4
],
[
"num_ignore_errors",
"<=",
4.16
],
[
"num_conditions",
"<=",
17.12
],
[
"num_filters",
"<=",
0.0
]
]
}
```In this example, a script has been predicted as **failure-prone** because:
num_names_with_vars > 2.4 **AND** num_ignore_errors <= 4.16 **AND** num_conditions <= 17.12 **AND** num_filters <= 0.