
https://github.com/cjdoris/arfffiles.jl

Load and save ARFF files

Host: GitHub
URL: https://github.com/cjdoris/arfffiles.jl
Owner: cjdoris
License: mit
Created: 2020-08-26T18:13:10.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2022-06-07T16:44:22.000Z (over 2 years ago)
Last Synced: 2024-10-09T23:23:34.487Z (27 days ago)
Language: Julia
Size: 53.7 KB
Stars: 5
Watchers: 3
Forks: 2
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


# ARFFFiles.jl

Load and save [ARFF (Attribute Relation File Format)](https://waikato.github.io/weka-wiki/formats_and_processing/arff/) files.

Integrated into [`Tables.jl`](https://github.com/JuliaData/Tables.jl) for easily converting to your favourite table types.

## Install

```
] add ARFFFiles
```

## Quick start

To load an ARFF file as a `DataFrame`:

```julia
using ARFFFiles, DataFrames

df = ARFFFiles.load(DataFrame, "mytable.arff")
```

Replace `DataFrame` with your favourite table type, or leave it out to get an `ARFFTable`.

To save any Tables.jl-compatible table:

```julia
using ARFFFiles

ARFFFiles.save("mytable.arff", df)
```

## Loading

- `load(file)` loads the table in the given file (filename or IO stream) as an `ARFFTable`.

- `load(func, file)` is equivalent to `func(load(file))` but operates recursively on any relational columns.

- `loadstreaming(file)` returns an `ARFFReader` object `r`:

    - Satisfies the `Tables.jl` interface, so it can be materialized as a table.

    - `r.header` contains the header parsed from `file`.

    - Iterates rows of type `ARFFRow`.

    - `read(r)`, `read(r, n)` and `read!(r, x)` read rows of the table.

    - `readcolumns(r, [maxbytes=nothing])` reads the whole table into a columnar format. Specify `maxbytes` to read only a portion of the rows.

    - `close(r)` closes the underlying IO stream, unless `own=false`.

- `loadstreaming(func, file)` is equivalent to `func(loadstreaming(file))` but ensures the file is closed afterwards.

- `loadchunks(file)` returns an iterator of `ARFFTable`s for efficiently streaming very large tables. Equivalent to `Tables.partitions(loadstreaming(file))`.

- `loadchunks(func, file)` is equivalent to `func(loadchunks(file))` but ensures the file is closed afterwards.
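
As a rough illustration of the streaming functions above, the following sketch writes a tiny ARFF file to a temporary path and reads it back row by row and then chunk by chunk. The file contents and the names `path`, `nrows` and `chunk_sizes` are made up for this example, and it assumes Tables.jl is installed alongside ARFFFiles.

```julia
using ARFFFiles, Tables

# Hypothetical sample data, written by hand so the example is self-contained.
path = joinpath(mktempdir(), "demo.arff")
write(path, """
@relation demo
@attribute x numeric
@attribute label {a,b}
@data
1.0,a
2.5,b
4.0,a
""")

# Row-wise streaming: the do-block form closes the file when the block exits.
nrows = ARFFFiles.loadstreaming(path) do r
    n = 0
    for row in r          # each `row` is an ARFFRow
        n += 1
    end
    n
end

# Chunk-wise streaming: each chunk is an ARFFTable, materialized here as a
# NamedTuple of column vectors so its row count can be read off a column.
chunk_sizes = ARFFFiles.loadchunks(path) do chunks
    sizes = Int[]
    for chunk in chunks
        push!(sizes, length(first(Tables.columntable(chunk))))
    end
    sizes
end
```

For a file this small the chunked reader will most likely yield a single chunk; the chunked form is only worthwhile when the table does not fit comfortably in memory.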

**Types.** Numbers load as `Float64`, strings as `String`, dates as `DateTime`, nominals as `CategoricalValue{String}` (from [`CategoricalArrays`](https://github.com/JuliaData/CategoricalArrays.jl)) and relationals as `ARFFTable`.

**Keyword options.**

- `missingcols=:auto`: Controls which columns may contain missing data (`?`). It can be `:auto`, `:all`, `:none`, a set or vector of column names (symbols), or a function taking a symbol and returning true if that column can contain missing values. If the table is being read in a streaming fashion, then `:auto` behaves the same as `:all`.

- `missingnan=false`: Convert missing values in numeric columns to NaN. This is equivalent to excluding these columns in `missingcols`.

- `categorical=true`: When false, nominal columns are converted to `String` instead of `CategoricalValue{String}`.

- `chunkbytes=2^26`: Read approximately this many bytes per chunk when iterating over chunks or rows.

- `own=false`: Signals whether or not to close the underlying IO stream when `close(::ARFFReader)` is called.
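
To make the effect of these options concrete, here is a minimal sketch that loads the same hand-written file twice, once with the defaults and once with `missingnan=true` and `categorical=false`. The sample file and the names `default_cols` and `plain_cols` are hypothetical, and `Tables.columntable` is used only as a convenient way to get at the columns.

```julia
using ARFFFiles, Tables

# Hypothetical sample data with one missing numeric value (`?`).
path = joinpath(mktempdir(), "missing.arff")
write(path, """
@relation demo
@attribute x numeric
@attribute label {a,b}
@data
1.0,a
?,b
""")

# Defaults: the missing entry in `x` loads as `missing`,
# and `label` loads as categorical values.
default_cols = Tables.columntable(ARFFFiles.load(path))

# With these options the missing numeric should load as NaN,
# and the nominal column as plain strings.
plain_cols = Tables.columntable(
    ARFFFiles.load(path; missingnan=true, categorical=false),
)

@show eltype(default_cols.x) eltype(plain_cols.x) eltype(plain_cols.label)
```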

## Saving

- `save(file, table)` saves the Tables.jl-compatible `table` to `file` (a filename or IO stream).

**Types.** `Real` is saved as numeric, `AbstractString` as string, `DateTime` and `Date` as date, and `CategoricalValue{<:AbstractString}` as nominal.

**Keyword options.**

- `relation="data"`: The relation name.

- `comment`: A comment to print at the top of the file.
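
A small round-trip sketch of `save` with both keyword options, assuming a `NamedTuple` of vectors as the input table (any Tables.jl-compatible table should work the same way). The table contents, relation name and comment text are invented for illustration.

```julia
using ARFFFiles, Tables

# A NamedTuple of vectors is a Tables.jl-compatible column table.
table = (x = [1.0, 2.5], name = ["first", "second"])

path = joinpath(mktempdir(), "saved.arff")
ARFFFiles.save(path, table; relation="demo", comment="written by the example")

# Inspect the raw text, then load the file back to check the round trip.
text = read(path, String)
roundtrip = Tables.columntable(ARFFFiles.load(path))

println(text)
```

Printing `text` is a quick way to see how the relation name, comment and attribute types were written before relying on the file elsewhere.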