https://github.com/clojure-finance/clojask-io
Reading and writing various file formats for Clojask: clojask-io is a library designed to extend the file support for Clojask. This library can also be used alone to read in and output dataset files.
https://github.com/clojure-finance/clojask-io
clojure data-science dataframe file-format input-output
Last synced: 10 months ago
JSON representation
Reading and writing various file formats for Clojask: clojask-io is a library designed to extend the file support for Clojask. This library can also be used alone to read in and output dataset files.
- Host: GitHub
- URL: https://github.com/clojure-finance/clojask-io
- Owner: clojure-finance
- License: mit
- Created: 2022-06-18T13:24:08.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-06T01:50:32.000Z (over 2 years ago)
- Last Synced: 2025-01-30T04:32:26.717Z (11 months ago)
- Topics: clojure, data-science, dataframe, file-format, input-output
- Language: Clojure
- Homepage: https://clojure-finance.github.io/clojask-website
- Size: 54.7 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Clojask-io
A Clojure library designed to extend the file support for [Clojask](https://github.com/clojure-finance/clojask). This library can also be used alone to read in and output dataset files.
## Installation
Available on Clojars: [](https://clojars.org/com.github.clojure-finance/clojask-io)
## APIs
### NS: clojask-io.core
#### `supports`
Check if this library supports to read and write this format of file.
| Argument | Type | Function | Remarks |
| -------- | ------ | ------------------------------------------------------------ | ------- |
| `format` | String | Indicates the format of type, e.g. "csv", "xls", "txt", etc. | |
**Return**
Boolean
#### `infer-format`
Infer the file format from the path (get the substring after the last `.`).
| Argument | Type | Function | Remarks |
| -------- | ------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| `path` | String | Indicates where to find the file (either on local machine or online) | - For local files, absolute / relative path of the file
- For online resources, url of the resources |
**Return**
String, such as "csv", "xls" (`nil` if fails to infer)
-----
### NS: clojask-io.input
#### `read-file`
Read in a file as lazy sequence. Optionally, provide size of the file, corresponding output functions.
*Supported file types: **csv, txt, dat, tsv, tab**. The file size can be larger than memory.*
| Argument | Type | Function | Remarks |
| ---------- | ---------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| `path` | String | Indicates where to find the file (either on local machine or online) | - For local files, absolute / relative path of the file
- For online resources, url of the resources |
| [`format`] | String | The format of the file | Will be inferred from the path suffix if not provided. Will imply the separator (`sep`) based on pre-setting. |
| [`sep`] | String / java.util.regex.Pattern (regex) | The separator of each row of the dataset file | |
| [`wrap`] | String | Wrapper of each value | Sometimes, the file will wrap each value some punctuations, e.g `""` / `''`. Can remove them automatically by setting this argument. Does not support asymmetric wrappers. |
| [`stat`] | Boolean | Whether to get the size of the file | If true, the return value will add a :size key-value pair in unit of bytes. Size value will be `nil` if cannot be retrieved. |
| [`output`] | Boolean | Whether to also return the corresponding output function | If true, the return value will add a :output key-value pair. |
**Return**
{:data `a lazy sequence of vectors representing each row` [:size `the size in byte`] [:output `output function`]}
#### `read-excel`
Read in an excel file as lazy sequence. Optionally, provide size of the file.
*A simplified wrapper function of [Docjure](https://github.com/mjul/docjure). The excel file should be smaller than memory size.*
| Argument | Type | Function | Remarks |
| -------- | ------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| `path` | String | Indicates where to find the file (either on local machine or online) | - For local files, absolute / relative path of the file
- For online resources, url of the resources |
| `sheet` | String | Name of the sheet | |
| [`stat`] | Boolean | Whether to get the size of the file | If true, the return value will add a :size key-value pair in unit of bytes. Size value will be `nil` if cannot be retrieved. |
**Return**
{:data `a lazy sequence of vectors representing each row` [:size `the size in byte`]}
----
### NS: clojask-io.output
#### `write-csv`
Synchronously write a collection of collections to a csv-like file.
| Argument | Type | Function | Remarks |
| ----------- | ------------------------ | -------------------------------------------- | ------------------------------------------------------------ |
| `writer` | `java.io.BufferedWriter` | The writer initialized to the output file | |
| `sequence` | Collection | The output content | Should a two-dimensional collection |
| `separator` | String | The separator between values in the same row | |
**Implementation**
```clojure
(defn write-csv
"output to a csv file using a collection of collections"
[writer seq sep]
(doseq [row seq]
(.write writer (str (str/join sep row) "\n"))
))
```
**Return**
`nil`
#### `write-excel`
Synchronously write a collection of collections to an excel file.
*A simplified wrapper function of [Docjure](https://github.com/mjul/docjure).*
| Argument | Type | Function | Remarks |
| ---------- | ---------- | --------------------------------- | --------------------------------------------- |
| `path` | String | Indicates the path of output file | Absolute / relative path of local file system |
| `sheet` | String | Name of the sheet | |
| `sequence` | Collection | The output content | Should a two-dimensional collection |
**Return**
`nil`
-----------------------
Copyright © 2022 Clojask-io