Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sigpwned/tabular4j
A library for reading and writing pan-format tabular data, especially spreadsheets, in Java 11+.
csv excel java java-11 spreadsheets tabular-data tsv xls xlsx
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/sigpwned/tabular4j
- Owner: sigpwned
- License: apache-2.0
- Created: 2023-01-03T04:21:17.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-13T18:21:21.000Z (4 months ago)
- Last Synced: 2024-09-28T23:20:57.589Z (4 months ago)
- Topics: csv, excel, java, java-11, spreadsheets, tabular-data, tsv, xls, xlsx
- Language: Java
- Homepage:
- Size: 304 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# TABULAR4J [![tests](https://github.com/sigpwned/tabular4j/actions/workflows/tests.yml/badge.svg)](https://github.com/sigpwned/tabular4j/actions/workflows/tests.yml) [![Maven Central Version](https://img.shields.io/maven-central/v/com.sigpwned/tabular4j)](https://search.maven.org/artifact/com.sigpwned/tabular4j)
A framework for reading and writing tabular data using popular data formats, especially spreadsheets.
## Motivation
Tabular data is an important interface between humans and machines. However, supporting multiple spreadsheet formats transparently is complex. This library provides a framework of adapters and facades for reading and writing tabular data using popular spreadsheet data formats.
## Goals
* To support the straightforward reading and writing of tabular data in common file formats, especially spreadsheets.
* To support common value types for cells to streamline development and minimize errors.
## Non-Goals
* To support non-tabular layouts of data, e.g. merged cells in Excel.
* To expose all features for all data formats, e.g. charts in Excel. Outstanding specialized libraries already exist for the use and manipulation of specific data formats.
* To support all data formats that can represent tabular data. The library's primary goal is to support the data formats most commonly exchanged between humans and computers.
## Installing
### Maven Dependencies
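You can get the library from Maven Central:

    <dependency>
      <groupId>com.sigpwned</groupId>
      <artifactId>tabular4j-csv</artifactId>
      <version>0.0.0-b2</version>
    </dependency>
    <dependency>
      <groupId>com.sigpwned</groupId>
      <artifactId>tabular4j-excel</artifactId>
      <version>0.0.0-b2</version>
    </dependency>
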
### Maven Build
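The tabular4j library uses the [ServiceLoader](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html) to discover supported file formats automatically at runtime. This requires each JAR file to include a special `META-INF/services/com.sigpwned.tabular4j.SpreasheetFormatFactory` file that lists file format factory classes the JAR provides. To ensure that your application gets *all* of the supported file formats, as opposed to just the supported file formats from the last JAR added to your build, use the `maven-shade-plugin` to merge the `META-INF/services/com.sigpwned.tabular4j.SpreasheetFormatFactory` files from all of the JARs in your build:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.6.0</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- Merges META-INF/services files from all JARs -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
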
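To illustrate the discovery mechanism described above, here is a minimal sketch of `ServiceLoader` usage. The `FormatDiscovery` class and its nested `FormatFactory` interface are hypothetical stand-ins invented for this example, not the library's actual types; the real factory interface and its methods may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class FormatDiscovery {
    /** Hypothetical service interface; a stand-in for the library's factory type. */
    interface FormatFactory {
        String formatName();
    }

    /**
     * Collects the names of all providers that ServiceLoader can find.
     * Providers are found through META-INF/services files on the classpath,
     * one file per service interface, each listing implementation classes.
     */
    static List<String> discoverFormatNames() {
        List<String> names = new ArrayList<>();
        for (FormatFactory factory : ServiceLoader.load(FormatFactory.class)) {
            names.add(factory.formatName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Discovered formats: " + discoverFormatNames());
    }
}
```

If no provider file for the interface is present on the classpath, the loop body never runs and the list is empty. This is why a build that keeps only one JAR's `META-INF/services` file would silently lose the formats registered by the others.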
## Data Model
### Worksheets
A worksheet is a single spreadsheet containing one rectangular table of data. The first row of a worksheet is interpreted as column names (i.e., headers), and determines the width of the table of data. Each additional row of a worksheet is interpreted as a value in each of the named columns. If a row's actual width is greater than the number of columns, then it is truncated before it is interpreted; if its actual width is less than the number of columns, then it is right-padded with null-valued cells before it is interpreted.

    +---------+---------+---------+---------+---------+
    | alpha   | bravo   | charlie | delta   |         | <-- Column Names/Headers. Width = 4.
    +---------+---------+---------+---------+---------+
    | a       | b       | c       | d       |         | <-- Width 4. No transformation.
    +---------+---------+---------+---------+---------+
    | a       | b       | c       |         |         | <-- Width 3. Will be right-padded with null.
    +---------+---------+---------+---------+---------+
    |         | b       | c       | d       |         | <-- Width 4. First cell is empty.
    +---------+---------+---------+---------+---------+
    | a       | b       | c       | d       | e       | <-- Width 5. Will be truncated to width 4.
    +---------+---------+---------+---------+---------+
    | a       | b       | c       | d       |         | <-- Width 4. No transformation.
    +---------+---------+---------+---------+---------+

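The truncate-or-pad rule above can be sketched in a few lines of plain Java. The `RowNormalizer` class is a hypothetical illustration using strings for cells; it is not part of the library, whose own cell types and internals are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowNormalizer {
    /**
     * Returns a copy of the row with exactly {@code width} cells.
     * Cells beyond the width are dropped; missing cells become null.
     */
    static List<String> normalize(List<String> row, int width) {
        List<String> result = new ArrayList<>(width);
        for (int i = 0; i < width; i++) {
            result.add(i < row.size() ? row.get(i) : null);
        }
        return result;
    }

    public static void main(String[] args) {
        int width = 4; // taken from the header row: alpha, bravo, charlie, delta
        System.out.println(normalize(Arrays.asList("a", "b", "c"), width));
        System.out.println(normalize(Arrays.asList("a", "b", "c", "d", "e"), width));
    }
}
```

Note that an empty leading cell, as in the fourth row of the diagram, is a cell in its own right and does not change the row's width.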
### Workbooks
A workbook is an ordered list of worksheets gathered into a single file. Each worksheet in a workbook has a unique name. One sheet in the workbook is "active". For a workbook created by a human, this is the worksheet that was last edited by the user and/or is shown when the file is opened; for a workbook created by the library, it is simply the worksheet with the active flag set.
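As a rough mental model, the three workbook properties (ordered sheets, unique names, one active sheet) could be captured as follows. `WorkbookSketch` is a hypothetical class for illustration only, and making the first added sheet active by default is an assumption of this sketch, not documented library behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class WorkbookSketch {
    private final List<String> sheetNames = new ArrayList<>(); // insertion order = sheet order
    private int activeIndex = -1; // -1 means the workbook has no sheets yet

    /** Appends a worksheet name, rejecting duplicates. */
    void addSheet(String name) {
        if (sheetNames.contains(name)) {
            throw new IllegalArgumentException("duplicate worksheet name: " + name);
        }
        sheetNames.add(name);
        if (activeIndex < 0) {
            activeIndex = 0; // assumption: the first sheet starts out active
        }
    }

    /** Moves the active flag to the named worksheet. */
    void setActive(String name) {
        int index = sheetNames.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("no such worksheet: " + name);
        }
        activeIndex = index;
    }

    String activeSheet() {
        if (activeIndex < 0) {
            throw new IllegalStateException("workbook has no worksheets");
        }
        return sheetNames.get(activeIndex);
    }

    List<String> sheetNames() {
        return List.copyOf(sheetNames);
    }
}
```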
## Code Examples
### Opening a File to Read
Users open an existing workbook file to read using a `ByteSource`. Note that the spreadsheet format (e.g., CSV, XLSX, etc.) is never given explicitly; rather, the framework detects the format of the spreadsheet and reacts accordingly.

    try (TabularWorkbookReader workbook = SpreadsheetFactory.getInstance().readTabularWorkbook(ByteSource.ofFile(file))) {
        // Handle workbook here...
    }

### Reading a Worksheet
This approach opens a workbook file and processes its active worksheet only.
#### Manual Iteration
First, the user can open the worksheet and then iterate over its rows manually:

    try (TabularWorksheetReader worksheet = SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {
        for (TabularWorksheetRow row = worksheet.readRow(); row != null; row = worksheet.readRow()) {
            // Handle row here...
        }
    }

#### Automatic Iteration using Iterator
Alternatively, the user can use an `Iterator` directly. Note that any `IOException` is rethrown as an `UncheckedIOException` in this case.

    try (TabularWorksheetReader worksheet = SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {
        for (TabularWorksheetRow row : worksheet) {
            // Handle row here...
        }
    }

#### Automatic Processing using Stream
The user can also use a Java 8 `Stream` to process rows:

    try (TabularWorksheetReader worksheet = SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {
        worksheet.stream().forEach(row -> {
            // Handle row here...
        });
    }

#### Automatic Processing using Consumer
Finally, if the user doesn't need to control how rows are read and processed ("pull" style), then the user can also use a consumer that walks the sheet automatically ("push" style):

    Spreadsheets.processTabularWorksheet(source, new TabularWorksheetConsumer() {
        @Override
        public void beginTabularWorksheet(int sheetIndex, String sheetName, List columnNames) {
            // Handle setup here...
        }

        @Override
        public void tabularRow(int rowIndex, List cells) {
            // Handle row here...
        }

        @Override
        public void endTabularWorksheet() {
            // Handle cleanup here...
        }
    });

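To make the "pull" versus "push" distinction concrete, here is a small self-contained sketch of how a push-style driver can be layered over a pull-style iterator. The `PushPullSketch` class and its `RowConsumer` interface are hypothetical and use plain strings for cells; they only mirror the general shape of the consumer above and say nothing about how the library implements it.

```java
import java.util.Iterator;
import java.util.List;

final class PushPullSketch {
    /** Hypothetical push-style callback with begin, row and end hooks. */
    interface RowConsumer {
        void begin(List<String> columnNames);
        void row(int rowIndex, List<String> cells);
        void end();
    }

    /**
     * Pulls rows from the iterator one at a time and pushes each to the
     * consumer, so the caller never writes the read loop itself.
     */
    static void process(List<String> columnNames, Iterator<List<String>> rows, RowConsumer consumer) {
        consumer.begin(columnNames);
        int rowIndex = 0;
        while (rows.hasNext()) {
            consumer.row(rowIndex++, rows.next());
        }
        consumer.end();
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(List.of("a", "b"), List.of("c", "d"));
        process(List.of("alpha", "bravo"), data.iterator(), new RowConsumer() {
            @Override public void begin(List<String> columnNames) {
                System.out.println("columns: " + columnNames);
            }
            @Override public void row(int rowIndex, List<String> cells) {
                System.out.println(rowIndex + ": " + cells);
            }
            @Override public void end() {
                System.out.println("done");
            }
        });
    }
}
```

The trade-off is control: with the pull style the caller can stop early or interleave other work between rows, while the push style keeps setup, per-row handling and cleanup in one place.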
### Reading a Workbook
This approach opens a workbook file and processes each of its worksheets in order. It shares much logic with the worksheet examples above.
#### By Index
First, the user can open the file and iterate over its sheets by index:
try (TabularWorkbookReader workbook=SpreadsheetFactory.getInstance().readTabularWorkbook(source)) {
for(int i=0;i