https://github.com/ignf/roc4j
The package roc4j is designed for estimating and handling Receiver Operating Characteristics (ROC) curves of binary classifiers in Java
- Host: GitHub
- URL: https://github.com/ignf/roc4j
- Owner: IGNF
- Created: 2017-08-25T14:50:33.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-13T18:30:09.000Z (over 7 years ago)
- Last Synced: 2023-10-20T20:19:12.341Z (over 2 years ago)
- Language: Java
- Homepage:
- Size: 5.06 MB
- Stars: 4
- Watchers: 9
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Handling ROC curves with roc4j
===================
The package roc4j is designed for estimating and handling Receiver Operating Characteristics (ROC) curves of binary classifiers in Java. Robust and accurate estimation of ROC curves is of utmost importance in statistical learning.
Among roc4j's main features:
- ROC curve computation
- ROC curve filtering and smoothing
- Confidence band computation
- Graphical plots (exportable as PNG, SVG, JPG, ...)
- Validation process handler
- Optimal operating point computation
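To make the first and last features concrete, here is a minimal sketch, in plain Java and without roc4j, of how an empirical ROC curve is obtained from labels and scores, how its AUC follows from the trapezoidal rule, and how an "optimal" operating point can be picked. The class and method names are hypothetical and do not reflect roc4j's API. Youden's J (TPR minus FPR) is only one common optimality criterion and is not necessarily the one roc4j uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Library-independent sketch: NOT the roc4j API, only an illustration
// of how an empirical ROC curve and its AUC can be computed.
final class EmpiricalRoc {

    // Returns ROC points {fpr, tpr} from (0,0) to (1,1), obtained by sweeping
    // the decision threshold from the highest score downwards.
    // Label 1 = positive, any other value = negative.
    static List<double[]> rocPoints(int[] expected, double[] score) {
        int n = expected.length;
        int positives = 0;
        for (int y : expected) if (y == 1) positives++;
        int negatives = n - positives;
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException("Both classes must be present");
        }

        // Sort instance indices by decreasing score
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> score[i]).reversed());

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0, fp = 0;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            if (expected[i] == 1) tp++; else fp++;
            // Tied scores share one threshold: emit a point after the last of them
            if (k == n - 1 || score[order[k + 1]] != score[i]) {
                points.add(new double[] {(double) fp / negatives, (double) tp / positives});
            }
        }
        return points;
    }

    // Area under the ROC curve, using the trapezoidal rule between consecutive points
    static double auc(List<double[]> points) {
        double area = 0.0;
        for (int k = 1; k < points.size(); k++) {
            double[] a = points.get(k - 1);
            double[] b = points.get(k);
            area += (b[0] - a[0]) * (a[1] + b[1]) / 2.0;
        }
        return area;
    }

    // Operating point maximizing Youden's J = TPR - FPR (one possible criterion)
    static double[] youdenPoint(List<double[]> points) {
        double[] best = points.get(0);
        for (double[] p : points) {
            if (p[1] - p[0] > best[1] - best[0]) best = p;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] expected = {1, 1, 0, 1, 0, 0};
        double[] score = {0.9, 0.8, 0.7, 0.6, 0.5, 0.4};
        List<double[]> roc = rocPoints(expected, score);
        double[] best = youdenPoint(roc);
        System.out.println("AUC = " + auc(roc));
        System.out.println("Youden point: FPR = " + best[0] + ", TPR = " + best[1]);
    }
}
```

Emitting a single point per group of tied scores keeps the curve independent of the arbitrary order in which tied instances are sorted.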
### Screenshots

### Installation
------------------------
#### Option 1:
Download the roc4j jar file from:
https://forge-cogit.ign.fr/nexus/content/repositories/snapshots/fr/ign/cogit/roc4j/1.0-SNAPSHOT/
#### Option 2:
Add the following dependency and snapshot repository to your Maven pom.xml:
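```xml
<dependency>
    <groupId>fr.ign.cogit</groupId>
    <artifactId>roc4j</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>
```
```xml
<repository>
    <id>cogit-snapshots</id>
    <name>Cogit Snapshots Repository</name>
    <url>https://forge-cogit.ign.fr/nexus/content/repositories/snapshots/</url>
</repository>
```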
### Tutorial
---------
A complete tutorial on how to use roc4j is available online at:
http://recherche.ign.fr/labos/cogit/demo/roc4j-doc/index.html
It is also available as a PDF: https://github.com/IGNF/roc4j/blob/master/doc/roc4j-doc.pdf
### Quick start
----------
```java
import java.util.Random;
import javax.swing.JFrame;
import fr.ign.cogit.roc4j.RocSpace;
import fr.ign.cogit.roc4j.RocSpaceStyle;
import fr.ign.cogit.roc4j.ConfidenceBands;
import fr.ign.cogit.roc4j.ReceiverOperatingCharacteristics;
//-----------------------------------------------------------------
// Program to compute a simple ROC curve on n simulated instances.
// Confidence bands are computed at the 95% level (default parameter)
// with the Kolmogorov-Smirnov test statistic. The ROC curve and its
// associated confidence bands are then depicted in a ROC space.
//-----------------------------------------------------------------
public class Main {
    public static void main(String[] args) {
        // --------------------------------------------------------
        // Parameters
        // --------------------------------------------------------
        int n = 500;         // Number of validation instances
        double noise = 0.1;  // Standard deviation of noise
        // --------------------------------------------------------
        // Setting random seed
        Random generator = new Random(123456789);
        int[] expected = new int[n];
        double[] score = new double[n];
        // Instances generation
        for (int i=0; i
```
### Real Application - Castle detection from building database with Random Forest
--------
**Authors:** Marie-Dominique Van Damme and Yann Méneroux
The data used for this experiment can be found in the same GitHub repository:
https://github.com/IGNF/roc4j/blob/master/sample/chateau.dat
#### Goal
The objective of this experiment is to discriminate castles from non-castle buildings in a building database.
The dataset is an extract of the data we used: it contains 600 buildings, half of which are castles.
The first line of the file is a header with the parameter names. The first column contains a boolean label (1 if the building described on the row is a castle, 0 otherwise).
Each row also contains 13 parameters, computed from BDTOPO© (the IGN building database):

1. **hauteur**: height of the building
2. **nb_orientation_mur**: number of wall orientations
3. **orientation_generale**: general orientation of the building
4. **orientation_principale_mur**: main orientation of the building walls
5. **elongation**: length/width ratio of the minimum bounding rectangle
6. **concavite**: area between the footprint and its convex hull
7. **perimetre**: perimeter of the footprint
8. **nb_convexe**: number of parts in the convex decomposition
9. **compacite**: compactness index
10. **granularite**: length of the shortest wall
11. **nb_concave**: number of concave parts
12. **nb_pt_squelette**: number of points in the geometric skeleton
13. **long_squelette**: length of the geometric skeleton
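Given the file layout described above (a header line, then one comma-separated row per building with the 0/1 label first and the parameters after it), a row can be parsed as in the sketch below. The comma separator and the integer label are taken from the `StringTokenizer(..., ",")` and `Integer.parseInt` calls in the original code; `CastleRowParser`, `Sample`, `parseRow` and `parseAll` are hypothetical names, and the roc4j `DataSet` class is deliberately not used since its methods are not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for chateau.dat rows (not part of roc4j)
final class CastleRowParser {

    // One building: label (1 = castle, 0 = otherwise) and its feature vector
    static final class Sample {
        final int label;
        final double[] features;

        Sample(int label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    // Parses one comma-separated data row, keeping the first nf parameters (nf <= 13)
    static Sample parseRow(String line, int nf) {
        String[] tokens = line.trim().split(",");
        if (tokens.length < nf + 1) {
            throw new IllegalArgumentException(
                    "Expected at least " + (nf + 1) + " columns, got " + tokens.length);
        }
        int label = Integer.parseInt(tokens[0].trim());
        double[] x = new double[nf];
        for (int i = 0; i < nf; i++) {
            x[i] = Double.parseDouble(tokens[i + 1].trim());
        }
        return new Sample(label, x);
    }

    // Parses every line after the header, ignoring blank lines
    static List<Sample> parseAll(List<String> lines, int nf) {
        List<Sample> samples = new ArrayList<>();
        for (int k = 1; k < lines.size(); k++) { // line 0 is the header
            String line = lines.get(k);
            if (!line.trim().isEmpty()) samples.add(parseRow(line, nf));
        }
        return samples;
    }

    public static void main(String[] args) throws IOException {
        // Local path of chateau.dat, passed as the first command-line argument
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        List<Sample> samples = parseAll(lines, 13);
        int castles = 0;
        for (Sample s : samples) castles += s.label;
        System.out.println(samples.size() + " buildings, " + castles + " castles");
    }
}
```

With `nf` below 13, only the first `nf` parameters are kept, presumably in the same spirit as the `Nf` variable of the original code.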
Classification is performed with a Random Forest model (100 trees) from the SMILE library.
The code provided below:
- reads the castle data file (which must first be downloaded locally)
- stores features and labels in a DataSet object
- creates a ClassifierModel wrapping the SMILE RandomForest object
- designs a validation protocol (15-fold cross-validation)
- validates the classifier on the dataset and computes the ROC curve
- performs bootstrap sampling to generate 20 replications of the ROC curve
- computes smoothed versions of the ROC curves with kernel estimation
- computes the average of all generated ROC curves
- estimates 95% ROC confidence bands with the Fixed-Width Band method
- computes the area under the ROC curve (AUC) as a general performance index
- estimates the 95% confidence interval of the AUC
- displays ROC curves, confidence bands and numerical results in a plot
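The validation protocol listed above relies on 15-fold cross-validation. One common way to turn k-fold cross-validation into a single ROC curve is to pool out-of-fold scores, so that every building is scored by a model trained without it. The plain-Java sketch below illustrates that idea; `KFoldSketch` and its `Trainer` hook are hypothetical stand-ins (the hook could wrap, for instance, a SMILE classifier), not roc4j's `ValidationProcess` API, and roc4j may handle folds differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of k-fold cross-validation producing out-of-fold scores.
// Not the roc4j ValidationProcess API.
final class KFoldSketch {

    // Hypothetical training hook: fits a model on (x, y) and returns its scoring function
    interface Trainer {
        ToDoubleFunction<double[]> train(double[][] x, int[] y);
    }

    // Assigns each of n instances to one of k folds after a seeded random shuffle;
    // fold sizes differ by at most one
    static int[] assignFolds(int n, int k, long seed) {
        if (k < 2 || k > n) throw new IllegalArgumentException("Need 2 <= k <= n");
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        int[] fold = new int[n];
        for (int pos = 0; pos < n; pos++) fold[indices.get(pos)] = pos % k;
        return fold;
    }

    // Each instance is scored by a model trained on the other k-1 folds only;
    // the pooled scores can then be fed to a single ROC computation
    static double[] crossValidatedScores(double[][] x, int[] y, int k, long seed, Trainer trainer) {
        int n = x.length;
        int[] fold = assignFolds(n, k, seed);
        double[] scores = new double[n];
        for (int f = 0; f < k; f++) {
            List<double[]> trainX = new ArrayList<>();
            List<Integer> trainY = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (fold[i] != f) {
                    trainX.add(x[i]);
                    trainY.add(y[i]);
                }
            }
            ToDoubleFunction<double[]> model = trainer.train(
                    trainX.toArray(new double[0][]),
                    trainY.stream().mapToInt(Integer::intValue).toArray());
            for (int i = 0; i < n; i++) {
                if (fold[i] == f) scores[i] = model.applyAsDouble(x[i]);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // 600 buildings split into 15 folds: each fold holds 600 / 15 = 40 instances
        int[] fold = assignFolds(600, 15, 123456789L);
        int[] sizes = new int[15];
        for (int f : fold) sizes[f]++;
        System.out.println(Arrays.toString(sizes));
    }
}
```

Pooling out-of-fold scores is a design choice; computing one ROC curve per fold and averaging them is the usual alternative.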
Note that computing confidence bands with the Fixed-Width Band method takes a few seconds (depending on the number of ROC curves generated by bootstrap sampling).
```java
package ;
import javax.swing.JFrame;
import smile.classification.RandomForest;
import java.awt.Color;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.StringTokenizer;
import fr.ign.cogit.roc4j.core.ConfidenceBands;
import fr.ign.cogit.roc4j.core.ReceiverOperatingCharacteristics;
import fr.ign.cogit.roc4j.core.RocCurvesCollection;
import fr.ign.cogit.roc4j.graphics.RocSpace;
import fr.ign.cogit.roc4j.utils.Tools;
import fr.ign.cogit.roc4j.validation.Bootstrap;
import fr.ign.cogit.roc4j.validation.ClassifierModel;
import fr.ign.cogit.roc4j.validation.DataSet;
import fr.ign.cogit.roc4j.validation.ValidationProcess;
public class Main {
    public static void main(String[] args) {
        // Data labels and features
        DataSet dataset = new DataSet();
        // Local path to the data file
        String datafile_path = "D:/workspace/roc4j/sample/chateau.dat";
        // Number of features (max 13)
        int Nf = 13;
        // Confidence level
        double level = 95.0;
        // ----------------------------------------------------------------------
        // Reading building dataset
        // ----------------------------------------------------------------------
        try {
            Scanner scan = new Scanner(new File(datafile_path));
            // Skipping header line
            scan.nextLine();
            // Reading data rows
            while(scan.hasNextLine()){
                // Splitting line
                StringTokenizer splitter = new StringTokenizer(scan.nextLine(), ",");
                // Data labels
                int y = Integer.parseInt(splitter.nextToken());
                // Data features
                double[] x = new double[Nf];
for (int i=0; i AUC = ROCS.computeAreasUnderCurves();
        // Average of area under curve
        double auc = Tools.round(100.0*Tools.computeMean(AUC), 1);
        // Confidence interval on area under curve
        double[] confidence_interval = Tools.computeConfidenceInterval(AUC, level);
        double conf_inf = Tools.round(100.0*confidence_interval[0], 1);
        double conf_sup = Tools.round(100.0*confidence_interval[1], 1);
        // ----------------------------------------------------------------------
        // Data representation
        // ----------------------------------------------------------------------
        RocSpace space = new RocSpace();
        // Adding data to graphics
        space.addRocCurve(bands.getCentralROC());
        space.addRocCurve(ROCS);
        space.addConfidenceBands(bands);
        // Writing on graphics
        space.setTitle("ROC curve of castle detection from building database");
        space.writeText("AUC = "+auc+" %", 400, 500, 14, Color.BLACK);
        space.writeText("CI @ 95% = ["+conf_inf+", "+conf_sup+"]", 400, 520, 14, Color.BLACK);
        // Display
        JFrame fen = new JFrame();
        fen.setSize(700, 700);
        fen.setContentPane(space);
        fen.setLocationRelativeTo(null);
        fen.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        fen.setVisible(true);
    }
}
```
The code above can be run with a different number of features, confidence level, number of bootstrap replications, SMILE classifier model, smoothing method, validation process, etc., to observe how the plot and results change.
With the default code provided here, we obtained an AUC of 86.2%, with a 95% confidence interval of [84.6%, 88.2%] for the true (unknown) AUC.
Increasing the number of buildings may reduce this uncertainty.
Figure: Receiver Operating Characteristics curves (after bootstrap sampling) and 95% confidence bands for the Random Forest classifier on the castle detection problem.
Note that adding some extrinsic parameters (such as the distance to the nearest road) raises classification performance up to 94% AUC. However, in order to obtain more illustrative ROC curves, we did not include all of these features in the dataset.
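The AUC confidence interval reported for this experiment is derived from bootstrap replications. As a rough, library-independent illustration of the idea, the sketch below computes the AUC as a Mann-Whitney statistic and a stratified percentile bootstrap interval; rerunning it with a larger sample typically gives a narrower interval. `BootstrapAuc` and its methods are hypothetical names. In the original code, roc4j's `Tools.computeConfidenceInterval` is applied to a list of AUC values and may use a different interval construction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch of a percentile bootstrap confidence interval for the AUC.
// Not necessarily how roc4j's Tools.computeConfidenceInterval works.
final class BootstrapAuc {

    // AUC as the Mann-Whitney probability that a positive outscores a negative
    // (ties count 1/2). Quadratic in the sample size.
    static double auc(int[] y, double[] s) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < y.length; i++) {
            if (y[i] != 1) continue;
            for (int j = 0; j < y.length; j++) {
                if (y[j] == 1) continue;
                pairs++;
                if (s[i] > s[j]) wins += 1.0;
                else if (s[i] == s[j]) wins += 0.5;
            }
        }
        if (pairs == 0) throw new IllegalArgumentException("Both classes must be present");
        return wins / pairs;
    }

    // Stratified percentile bootstrap: positives and negatives are resampled
    // separately (with replacement), so every replicate keeps both classes.
    // level is a percentage, e.g. 95.0.
    static double[] confidenceInterval(int[] y, double[] s, int replications, double level, long seed) {
        if (replications < 1) throw new IllegalArgumentException("Need at least one replication");
        List<Integer> pos = new ArrayList<>();
        List<Integer> neg = new ArrayList<>();
        for (int i = 0; i < y.length; i++) {
            if (y[i] == 1) pos.add(i); else neg.add(i);
        }
        if (pos.isEmpty() || neg.isEmpty()) throw new IllegalArgumentException("Both classes must be present");

        Random rng = new Random(seed);
        int n = y.length;
        double[] aucs = new double[replications];
        int[] yb = new int[n];
        double[] sb = new double[n];
        for (int b = 0; b < replications; b++) {
            int k = 0;
            for (int t = 0; t < pos.size(); t++, k++) {
                yb[k] = 1;
                sb[k] = s[pos.get(rng.nextInt(pos.size()))];
            }
            for (int t = 0; t < neg.size(); t++, k++) {
                yb[k] = 0;
                sb[k] = s[neg.get(rng.nextInt(neg.size()))];
            }
            aucs[b] = auc(yb, sb);
        }
        // Percentile interval from the sorted bootstrap AUCs
        Arrays.sort(aucs);
        double alpha = (1.0 - level / 100.0) / 2.0;
        double lower = aucs[(int) Math.floor(alpha * (replications - 1))];
        double upper = aucs[(int) Math.ceil((1.0 - alpha) * (replications - 1))];
        return new double[] {lower, upper};
    }

    public static void main(String[] args) {
        // Hypothetical toy data: score = label + Gaussian noise
        Random g = new Random(42);
        int n = 600;
        int[] y = new int[n];
        double[] s = new double[n];
        for (int i = 0; i < n; i++) {
            y[i] = i % 2;
            s[i] = y[i] + g.nextGaussian();
        }
        double[] ci = confidenceInterval(y, s, 1000, 95.0, 123L);
        System.out.printf("AUC = %.3f, 95%% CI = [%.3f, %.3f]%n", auc(y, s), ci[0], ci[1]);
    }
}
```

Stratified resampling keeps the castle/non-castle proportions fixed in every replicate; with very few replications (such as 20), the percentile bounds are necessarily coarse.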