https://github.com/fracpete/common-csv-weka-package
CSV loader/saver for Weka that handles common formats (requires Weka 3.9.5+).
- Host: GitHub
- URL: https://github.com/fracpete/common-csv-weka-package
- Owner: fracpete
- License: gpl-3.0
- Created: 2019-01-22T22:16:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-07-04T22:11:58.000Z (almost 2 years ago)
- Last Synced: 2024-10-19T12:16:06.621Z (over 1 year ago)
- Topics: converter, csv, java, weka
- Language: Java
- Homepage:
- Size: 16.7 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# common-csv-weka-package
Weka package providing a loader and a saver for common CSV formats, built on the
[Apache Commons CSV](https://commons.apache.org/proper/commons-csv/) library.
Requires Weka 3.9.5+.
Supported formats:
* **DEFAULT** - Standard Comma Separated Value format, as per RFC 4180 but allowing empty lines.
* **EXCEL** - The Microsoft Excel CSV format.
* **INFORMIX_UNLOAD** - Informix UNLOAD format used by the UNLOAD TO file_name operation.
* **INFORMIX_UNLOAD_CSV** - Informix CSV UNLOAD format used by the UNLOAD TO file_name operation (escaping is disabled).
* **MYSQL** - The MySQL CSV format.
* **ORACLE** - Default Oracle format used by the SQL*Loader utility.
* **POSTGRESSQL_CSV** - Default PostgreSQL CSV format used by the COPY operation.
* **POSTGRESSQL_TEXT** - Default PostgreSQL text format used by the COPY operation.
* **RFC-4180** - The RFC-4180 format defined by RFC-4180.
* **TDF** - A tab delimited format.
## Options
The loader:
```
Usage:
    CommonCSVLoader [options]

Options:
-decimal
    The maximum number of digits to print after the decimal
    place for numeric values (default: 6)
-F
    The CSV format to use
    (default: DEFAULT)
-use-custom-field-separator
    Whether to use custom field separator
    (default: no)
-custom-field-separator
    The custom field separator
    (default: ,)
-use-custom-quote-character
    Whether to use custom quote character
    (default: no)
-custom-quote-character
    The custom quote character
    (default: ")
-use-custom-quote-mode
    Whether to use custom quote mode
    (default: no)
-custom-quote-mode
    The custom quote mode
    (default: MINIMAL)
-use-custom-escape-character
    Whether to use custom escape character
    (default: no)
-custom-escape-character
    The custom escape character
    (default: )
-no-header
    Whether there is no header row in the spreadsheet
    (default: assumes header row present)
-nominal
    The attribute range to treat as nominal
    (default: none)
-nominal-label-spec
    Optional specification of legal labels for nominal
    attributes. May be specified multiple times.
    The spec contains two parts separated by a ":".
    The first part can be a range of attribute indexes or
    a comma-separated list of attribute names;
    the second part is a comma-separated list of labels. E.g.:
    "1,2,4-6:red,green,blue" or "att1,att2:red,green,blue"
-string
    The attribute range to treat as string
    (default: none)
-date
    The attribute range to treat as date
    (default: none)
-date-format
    The format to use for parsing the date attribute(s)
    see: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/time/format/DateTimeFormatter.html
    (default: yyyy-MM-dd'T'HH:mm:ss)
-missing-value
    The string to interpret as missing value
    (default: '')
-num-rows-type-detection
    The number of rows to use for detecting numeric rows
    (default: '100')
```
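Two of the loader options take small string formats: `-date-format` expects a `DateTimeFormatter` pattern, and `-nominal-label-spec` expects an attribute part and a label list separated by `:`. The sketch below only illustrates what such values look like when handled with the Java standard library. The class `LoaderOptionSketch` and its methods are hypothetical and not part of this package, and splitting the spec at the first `:` is an assumption; how the loader parses these values internally is not shown here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the package.
class LoaderOptionSketch {

  // Default pattern listed for -date-format in the usage text.
  static final String DEFAULT_DATE_FORMAT = "yyyy-MM-dd'T'HH:mm:ss";

  // Parses a cell value using a DateTimeFormatter pattern.
  static LocalDateTime parseDate(String value, String pattern) {
    return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
  }

  // Returns the position of the first ':' (assumed separator position).
  private static int separatorIndex(String spec) {
    int idx = spec.indexOf(':');
    if (idx < 0) {
      throw new IllegalArgumentException("Spec has no ':' separator: " + spec);
    }
    return idx;
  }

  // First part of the spec: an index range or attribute names.
  static String attributePart(String spec) {
    return spec.substring(0, separatorIndex(spec));
  }

  // Second part of the spec: the comma-separated labels.
  static List<String> labels(String spec) {
    return Arrays.asList(spec.substring(separatorIndex(spec) + 1).split(","));
  }

  public static void main(String[] args) {
    System.out.println(parseDate("2024-07-05T13:45:00", DEFAULT_DATE_FORMAT));
    System.out.println(attributePart("1,2,4-6:red,green,blue"));
    System.out.println(labels("att1,att2:red,green,blue"));
  }
}
```

A value that does not match the pattern makes `LocalDateTime.parse` throw a `DateTimeParseException`, so it can help to try a pattern on a few sample values before passing it to `-date-format`.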
The saver:
```
CommonCSVSaver options:

-i
    The input file
-o
    The output file
-F
    The CSV format to use
    (default: DEFAULT)
-use-custom-field-separator
    Whether to use custom field separator
    (default: no)
-custom-field-separator
    The custom field separator
    (default: ,)
-use-custom-quote-character
    Whether to use custom quote character
    (default: no)
-custom-quote-character
    The custom quote character
    (default: ")
-use-custom-quote-mode
    Whether to use custom quote mode
    (default: no)
-custom-quote-mode
    The custom quote mode
    (default: MINIMAL)
-use-custom-escape-character
    Whether to use custom escape character
    (default: no)
-custom-escape-character
    The custom escape character
    (default: )
-no-header
    Whether to suppress output of header row
    (default: outputs header)
```
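To give a feel for how a custom field separator and quote character interact with a "minimal" quote mode, here is a simplified sketch in plain Java. It quotes a field only when the field contains the separator, the quote character, or a line break, and doubles embedded quote characters. This is an approximation for illustration: the class `MinimalQuoteSketch` is hypothetical, and the exact rules applied by Apache Commons CSV for `MINIMAL` may differ in edge cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of minimal quoting; not the Commons CSV code.
class MinimalQuoteSketch {

  // Quotes the field only if it contains the separator, the quote
  // character, or a line break; embedded quote characters are doubled.
  static String quoteIfNeeded(String field, char separator, char quote) {
    boolean needsQuotes = field.indexOf(separator) >= 0
        || field.indexOf(quote) >= 0
        || field.indexOf('\n') >= 0
        || field.indexOf('\r') >= 0;
    if (!needsQuotes) {
      return field;
    }
    String q = String.valueOf(quote);
    return q + field.replace(q, q + q) + q;
  }

  // Joins the (possibly quoted) fields with the separator.
  static String formatRow(List<String> fields, char separator, char quote) {
    return fields.stream()
        .map(f -> quoteIfNeeded(f, separator, quote))
        .collect(Collectors.joining(String.valueOf(separator)));
  }

  public static void main(String[] args) {
    // Semicolon as custom separator, double quote as quote character.
    System.out.println(formatRow(Arrays.asList("id", "a;b", "say \"hi\""), ';', '"'));
  }
}
```

With `;` as the separator, the plain field `id` is written unchanged, while `a;b` and `say "hi"` are wrapped in quotes because they contain the separator and the quote character respectively.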
## Releases
* [2024.7.5](https://github.com/fracpete/common-csv-weka-package/releases/download/v2024.7.5/common-csv-2024.7.5.zip)
* [2021.1.3](https://github.com/fracpete/common-csv-weka-package/releases/download/v2021.1.3/common-csv-2021.1.3.zip)
* [2020.12.30](https://github.com/fracpete/common-csv-weka-package/releases/download/v2020.12.30/common-csv-2020.12.30.zip)
* [2020.12.29](https://github.com/fracpete/common-csv-weka-package/releases/download/v2020.12.29/common-csv-2020.12.29.zip)
* [2020.11.29](https://github.com/fracpete/common-csv-weka-package/releases/download/v2020.11.29/common-csv-2020.11.29.zip)
## Maven
Use the following dependency in your `pom.xml`:
```xml
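<dependency>
  <groupId>com.github.fracpete</groupId>
  <artifactId>common-csv-weka-package</artifactId>
  <version>2024.7.5</version>
  <type>jar</type>
  <exclusions>
    <exclusion>
      <groupId>nz.ac.waikato.cms.weka</groupId>
      <artifactId>weka-dev</artifactId>
    </exclusion>
  </exclusions>
</dependency>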
```
## How to use packages
For more information on how to install the package, see:
https://waikato.github.io/weka-wiki/packages/manager/