https://github.com/waikato-datamining/djl-arff
Reading ARFF datasets using Deep Java Library (DJL).
- Host: GitHub
- URL: https://github.com/waikato-datamining/djl-arff
- Owner: waikato-datamining
- License: apache-2.0
- Created: 2025-05-01T02:36:57.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-28T23:54:54.000Z (11 months ago)
- Last Synced: 2025-07-29T01:24:41.711Z (11 months ago)
- Topics: arff, djl, weka
- Language: Java
- Homepage:
- Size: 72.3 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# djl-arff
Reading ARFF datasets using [Deep Java Library (DJL)](https://djl.ai/).
Rather than explicitly defining all the features and labels manually,
the dataset builder offers a number of methods that simplify specifying
which columns are to be used as labels (i.e., class attributes/output variables)
and which as features (i.e., input variables). It is also possible to
specify columns to ignore completely.
Works with DJL version 0.21.0 and later.
## Usage
Below is an example of how to load the UCI iris dataset, using the last column
as the class attribute and only those features that match `petal.*`:
```java
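import nz.ac.waikato.cms.adams.djl.dataset.ArffDataset;
import java.nio.file.Path;

ArffDataset dataset = ArffDataset.builder()
  .optArffFile(Path.of("src/main/resources/iris.arff"))  // the ARFF file to load
  .setSampling(32, true)                                 // batch size 32, random sampling
  .classIsLast()                                         // last column is the class attribute
  .addMatchingFeatures("petal.*")                        // only columns matching the regexp become features
  .build();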
```
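The argument to `addMatchingFeatures` is a regular expression, not a glob. The following stand-alone sketch shows which columns `petal.*` would select. It does not use djl-arff: the `ColumnMatchSketch` class and its `matching` helper are hypothetical, the attribute names are assumed to be those of the iris file shipped with Weka, and full-match semantics are an assumption, as the library may match differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of djl-arff: illustrates regexp-based column selection.
class ColumnMatchSketch {

  // Returns the column names that the regexp matches in full (assumed semantics).
  static List<String> matching(List<String> columns, String regexp) {
    Pattern pattern = Pattern.compile(regexp);
    List<String> result = new ArrayList<>();
    for (String column : columns) {
      if (pattern.matcher(column).matches())
        result.add(column);
    }
    return result;
  }

  public static void main(String[] args) {
    // Attribute names assumed to be those of Weka's iris.arff.
    List<String> iris = List.of("sepallength", "sepalwidth", "petallength", "petalwidth", "class");
    System.out.println(matching(iris, "petal.*"));  // prints [petallength, petalwidth]
  }
}
```

Under these assumptions, the pattern `petal` on its own would select nothing, because it does not cover the whole column name.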
Here is an overview of the available `ArffDataset.ArffBuilder` methods:
* `dateColumnsAsNumeric()` - treat `DATE` attributes as `NUMERIC` instead of ignoring them
* `stringColumnsAsNominal()` - treat `STRING` attributes as `NOMINAL` instead of ignoring them
* `classIndex(int...)` - sets the 0-based index/indices of the column(s) to use as class attribute(s)
* `classIsFirst()` - uses the first column as class attribute
* `classIsLast()` - uses the last column as class attribute
* `addClassColumn(String...)` - adds the specified column(s) as class attribute(s)
* `addIgnoredColumn(String...)` - specifies column(s) to be ignored
* `ignoreMatchingColumns(String...)` - ignores columns that match the regexp(s)
* `addAllFeatures()` - adds as features all columns that are neither ignored nor class attributes
* `addMatchingFeatures(String...)` - adds as features all columns that match the regexp(s) and are neither ignored nor class attributes
* `optArffFile(Path)` - the path of the ARFF file to load
* `optArffUrl(String)` - the URL of the ARFF file to load
* `fromJson` - can instantiate the builder from the JSON settings (as provided by `ArffDataset.toJson`)
One of the following builder methods must be called:
* `optArffFile`
* `optArffUrl`
* `fromJson`
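To make the interplay of class, ignored and feature columns more concrete, here is a stand-alone sketch of the rule described for `addAllFeatures()`: every column that is neither a class attribute nor ignored becomes a feature. It does not call djl-arff. The `ColumnRolesSketch` class, its `features` helper and the column names are hypothetical, and the library's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of djl-arff: mimics the documented column roles.
class ColumnRolesSketch {

  // Returns the columns that are neither the class column nor matched by an ignore regexp.
  static List<String> features(List<String> columns, int classIndex, String... ignoreRegexps) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < columns.size(); i++) {
      if (i == classIndex)
        continue;  // class attribute, never a feature
      String name = columns.get(i);
      boolean ignored = false;
      for (String regexp : ignoreRegexps) {
        if (Pattern.matches(regexp, name)) {
          ignored = true;
          break;
        }
      }
      if (!ignored)
        result.add(name);
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up column names for illustration only.
    List<String> columns = List.of("id", "age", "weight", "height", "bodyfat");
    int classIndex = columns.size() - 1;  // corresponds to classIsLast()
    System.out.println(features(columns, classIndex, "id"));  // prints [age, weight, height]
  }
}
```

In builder terms, this would roughly correspond to chaining `classIsLast()`, `ignoreMatchingColumns("id")` and `addAllFeatures()`.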
## Examples
Some example classes for loading ARFF files:
* [Load airline dataset](src/main/java/nz/ac/waikato/cms/adams/djl/dataset/example/LoadAirline.java)
* [Load bodyfat dataset (adding columns automatically)](src/main/java/nz/ac/waikato/cms/adams/djl/dataset/example/LoadBodyfatAutomatic.java)
* [Load bodyfat dataset (explicitly adding columns)](src/main/java/nz/ac/waikato/cms/adams/djl/dataset/example/LoadBodyfatExplicit.java)
* [Load iris dataset](src/main/java/nz/ac/waikato/cms/adams/djl/dataset/example/LoadIris.java)
* [Load iris dataset (STRING class attribute)](src/main/java/nz/ac/waikato/cms/adams/djl/dataset/example/LoadIrisString.java)
## Maven
Add the following dependency to your `pom.xml`:
```xml
<dependency>
  <groupId>nz.ac.waikato.cms.adams</groupId>
  <artifactId>djl-arff</artifactId>
  <version>0.0.2</version>
</dependency>
```