https://github.com/sigpwned/tabular4j

A library for reading and writing pan-format tabular data, especially spreadsheets, in Java 11+.

csv excel java java-11 spreadsheets tabular-data tsv xls xlsx

# TABULAR4J [![tests](https://github.com/sigpwned/tabular4j/actions/workflows/tests.yml/badge.svg)](https://github.com/sigpwned/tabular4j/actions/workflows/tests.yml) [![Maven Central Version](https://img.shields.io/maven-central/v/com.sigpwned/tabular4j)](https://search.maven.org/artifact/com.sigpwned/tabular4j)

A framework for reading and writing tabular data using popular data formats, especially spreadsheets.

## Motivation

Tabular data is an important interface between humans and machines. However, supporting multiple spreadsheet formats transparently is complex. This library provides a framework of adapters and facades for reading and writing tabular data using popular spreadsheet data formats.

## Goals

* To support the straightforward reading and writing of tabular data in common file formats, especially spreadsheets.
* To support common value types for cells to streamline development and minimize errors.

## Non-Goals

* To support non-tabular layouts of data, e.g. merged cells in Excel.
* To expose all features for all data formats, e.g. charts in Excel. Outstanding specialized libraries already exist for the use and manipulation of specific data formats.
* To support all data formats that can represent tabular data. The library's primary goal is to support the data formats most commonly exchanged between humans and computers.

## Installing

### Maven Dependencies

You can get the library from Maven Central:

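```xml
<dependency>
    <groupId>com.sigpwned</groupId>
    <artifactId>tabular4j-csv</artifactId>
    <version>0.0.0-b2</version>
</dependency>

<dependency>
    <groupId>com.sigpwned</groupId>
    <artifactId>tabular4j-excel</artifactId>
    <version>0.0.0-b2</version>
</dependency>
```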

### Maven Build

The tabular4j library uses the [ServiceLoader](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html) to discover supported file formats automatically at runtime. This requires each JAR file to include a special `META-INF/services/com.sigpwned.tabular4j.SpreadsheetFormatFactory` file that lists the file format factory classes the JAR provides. To ensure that your application gets *all* of the supported file formats, rather than only the formats from the last JAR added to your build, use the `maven-shade-plugin` to merge the `META-INF/services/com.sigpwned.tabular4j.SpreadsheetFormatFactory` files from all of the JARs in your build:

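```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.6.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <!-- Merge the META-INF/services files from all JARs -->
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
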
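To picture what the merge achieves, here is a small sketch. It assumes the standard `ServiceLoader` provider-file format (one fully qualified class name per line, `#` starting a comment) and shows a hypothetical `ServiceFileMerger.merge` helper that combines several JARs' service files into one provider list. It illustrates the idea only; it is not the shade plugin's implementation, and the provider class names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ServiceFileMerger {
    /**
     * Merges the text of several META-INF/services files into one provider list.
     * Comments (from '#' to end of line) and surrounding whitespace are removed,
     * blank lines are skipped, and duplicates are kept once, in first-seen order.
     */
    static List<String> merge(List<String> serviceFiles) {
        Set<String> providers = new LinkedHashSet<>();
        for (String file : serviceFiles) {
            for (String line : file.split("\\R")) {
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return new ArrayList<>(providers);
    }

    public static void main(String[] args) {
        // Hypothetical service files from two JARs; the class names are made up.
        String fromCsvJar = "# CSV support\ncom.example.CsvFormatFactory\n";
        String fromExcelJar = "com.example.XlsxFormatFactory\ncom.example.XlsFormatFactory\n";
        System.out.println(merge(List.of(fromCsvJar, fromExcelJar)));
        // prints [com.example.CsvFormatFactory, com.example.XlsxFormatFactory, com.example.XlsFormatFactory]
    }
}
```

If only one of the two files survived in the final JAR, `ServiceLoader` would see only that JAR's factories, which is exactly the situation the plugin configuration is meant to avoid.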
## Data Model

### Worksheets

A worksheet is a single spreadsheet containing one rectangular table of data. The first row of a worksheet is interpreted as column names (i.e., headers) and determines the width of the table. Each additional row is interpreted as one record, with one value for each of the named columns. If a row's actual width is greater than the number of columns, it is truncated before it is interpreted; if its actual width is less than the number of columns, it is right-padded with null-valued cells before it is interpreted.

```
+---------+---------+---------+---------+---------+
|  alpha  |  bravo  | charlie |  delta  |         | <-- Column Names/Headers. Width = 4.
+---------+---------+---------+---------+---------+
|    a    |    b    |    c    |    d    |         | <-- Width 4. No transformation.
+---------+---------+---------+---------+---------+
|    a    |    b    |    c    |         |         | <-- Width 3. Will be right-padded with null.
+---------+---------+---------+---------+---------+
|         |    b    |    c    |    d    |         | <-- Width 4. First cell is empty.
+---------+---------+---------+---------+---------+
|    a    |    b    |    c    |    d    |    e    | <-- Width 5. Will be truncated to width 4.
+---------+---------+---------+---------+---------+
|    a    |    b    |    c    |    d    |         | <-- Width 4. No transformation.
+---------+---------+---------+---------+---------+
```
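The truncate-or-pad rule can be sketched in a few lines. This is an illustration under simplifying assumptions: cells are modeled as plain strings, and `RowNormalizer` is a hypothetical helper, not tabular4j's internal code. An empty cell (like the first cell of the fourth row above) is still a cell, so it is left untouched.

```java
import java.util.ArrayList;
import java.util.List;

class RowNormalizer {
    /**
     * Fits a data row to the header width: extra cells are dropped,
     * missing cells are filled with null on the right.
     */
    static List<String> normalize(List<String> row, int width) {
        List<String> result = new ArrayList<>(width);
        for (int i = 0; i < width; i++) {
            // Positions past the end of the row become null padding.
            result.add(i < row.size() ? row.get(i) : null);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> header = List.of("alpha", "bravo", "charlie", "delta");
        int width = header.size();
        System.out.println(normalize(List.of("a", "b", "c"), width));           // prints [a, b, c, null]
        System.out.println(normalize(List.of("a", "b", "c", "d", "e"), width)); // prints [a, b, c, d]
    }
}
```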

### Workbooks

A workbook is an ordered list of worksheets gathered into a single file. Each worksheet in a workbook has a unique name, and one worksheet in the workbook is "active". For a workbook created by a human, the active worksheet is the one that was last edited by the user and/or is shown when the file is opened; for a workbook created by the library, it is simply the worksheet with the active flag set.
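As a rough model of these rules, the sketch below keeps worksheet names in order, rejects duplicates, and records which worksheet is active. It assumes every workbook has at least one worksheet and exactly one active worksheet; `WorkbookModel` is a hypothetical class for illustration and is not part of tabular4j's API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WorkbookModel {
    private final List<String> sheetNames; // in workbook order, all unique
    private final int activeIndex;         // position of the single active worksheet

    WorkbookModel(List<String> sheetNames, int activeIndex) {
        Set<String> seen = new HashSet<>();
        for (String name : sheetNames) {
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate worksheet name: " + name);
            }
        }
        if (activeIndex < 0 || activeIndex >= sheetNames.size()) {
            throw new IllegalArgumentException("active index out of range: " + activeIndex);
        }
        this.sheetNames = List.copyOf(sheetNames); // immutable snapshot
        this.activeIndex = activeIndex;
    }

    List<String> getSheetNames() {
        return sheetNames;
    }

    String getActiveSheetName() {
        return sheetNames.get(activeIndex);
    }

    public static void main(String[] args) {
        WorkbookModel workbook = new WorkbookModel(List.of("Summary", "Data"), 1);
        System.out.println(workbook.getActiveSheetName()); // prints Data
    }
}
```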

## Code Examples

### Opening a File to Read

To read an existing workbook file, open it from a `ByteSource`. Note that the spreadsheet format (e.g., CSV or XLSX) is never given explicitly; instead, the framework detects the format of the file and reacts accordingly.

```java
try (TabularWorkbookReader workbook = SpreadsheetFactory.getInstance().readTabularWorkbook(ByteSource.ofFile(file))) {
    // Handle workbook here...
}
```
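One common way to detect a format from content alone is to inspect the file's leading "magic" bytes: XLSX files are ZIP archives (starting with `PK\x03\x04`), legacy XLS files are OLE2 compound documents (starting with `D0 CF 11 E0 A1 B1 1A E1`), and anything else can be tried as delimited text. The sketch below illustrates that idea with a hypothetical `FormatSniffer` class. It is an assumption about how detection *could* work, not a description of tabular4j's actual logic; a real detector would also need to look inside ZIP archives, since other formats (e.g., ODS) are ZIP-based too.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FormatSniffer {
    enum Guess { XLSX, XLS, TEXT }

    // ZIP local file header "PK\3\4"; XLSX files are ZIP archives.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 Compound File Binary signature used by legacy XLS files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Guesses a format from the first bytes of a file. */
    static Guess sniffHeader(byte[] head) {
        if (startsWith(head, ZIP_MAGIC)) {
            return Guess.XLSX; // simplification: any ZIP is treated as XLSX
        }
        if (startsWith(head, OLE2_MAGIC)) {
            return Guess.XLS;
        }
        return Guess.TEXT; // fall back to delimited text such as CSV or TSV
    }

    /** Reads up to 8 bytes (consuming them) and guesses the stream's format. */
    static Guess sniff(InputStream in) throws IOException {
        return sniffHeader(in.readNBytes(OLE2_MAGIC.length)); // readNBytes(int) is Java 11+
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] zipStart = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(sniff(new ByteArrayInputStream(zipStart))); // prints XLSX
        byte[] csv = "name,age\nann,30\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(sniff(new ByteArrayInputStream(csv)));      // prints TEXT
    }
}
```

Because sniffing consumes bytes, a detector built this way needs a source it can reopen or rewind before handing the data to the chosen parser.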

### Reading a Worksheet

This approach opens a workbook file and processes its active worksheet only.

#### Manual Iteration

First, the user can open the worksheet and iterate over its rows manually; `readRow()` returns `null` once there are no more rows:

```java
try (TabularWorksheetReader worksheet = SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {
    for (TabularWorksheetRow row = worksheet.readRow(); row != null; row = worksheet.readRow()) {
        // Handle row here...
    }
}
```
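The same null-terminated "pull" loop can be tried without the library. In this sketch, `ListRowReader` is a hypothetical in-memory stand-in for `TabularWorksheetReader`: it is `Closeable`, so it works with try-with-resources, and its `readRow()` returns `null` at the end of the data. Rows are plain string lists rather than `TabularWorksheetRow` objects.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ListRowReader implements Closeable {
    private final Iterator<List<String>> rows;
    boolean closed = false; // exposed only so the example can check it

    ListRowReader(List<List<String>> rows) {
        this.rows = rows.iterator();
    }

    /** Returns the next row, or null once every row has been read. */
    List<String> readRow() {
        return rows.hasNext() ? rows.next() : null;
    }

    @Override
    public void close() {
        closed = true;
    }

    /** Pull-style loop: the caller asks for each row until readRow() returns null. */
    static List<List<String>> readAll(ListRowReader reader) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row = reader.readRow(); row != null; row = reader.readRow()) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> data = List.of(List.of("a", "b"), List.of("c", "d"));
        try (ListRowReader reader = new ListRowReader(data)) {
            System.out.println(readAll(reader)); // prints [[a, b], [c, d]]
        }
    }
}
```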

#### Automatic Iteration using Iterator

Alternatively, the user can iterate over the rows with an enhanced `for` loop, which uses the worksheet's `Iterator`. Note that in this case any `IOException` is thrown as an `UncheckedIOException`.

```java
try (TabularWorksheetReader worksheet = SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {
    for (TabularWorksheetRow row : worksheet) {
        // Handle row here...
    }
}
```
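The reason for the `UncheckedIOException` is that `Iterator.hasNext()` and `next()` cannot throw checked exceptions. A minimal sketch of one way to adapt a `readRow()`-style method that throws `IOException` into an `Iterator`; `RowIterator` and its nested `IoRowSource` interface are hypothetical names, and this is not necessarily how tabular4j implements its iterator.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class RowIterator<R> implements Iterator<R> {
    /** A row source whose reads may fail; readRow() returns null at end of data. */
    interface IoRowSource<T> {
        T readRow() throws IOException;
    }

    private final IoRowSource<R> source;
    private R next;       // row fetched ahead by hasNext(), or null
    private boolean done; // true once readRow() has returned null

    RowIterator(IoRowSource<R> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (next == null && !done) {
            try {
                next = source.readRow();
            } catch (IOException e) {
                // Iterator methods cannot throw checked exceptions, so wrap it.
                throw new UncheckedIOException(e);
            }
            done = (next == null);
        }
        return next != null;
    }

    @Override
    public R next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        R row = next;
        next = null;
        return row;
    }

    public static void main(String[] args) {
        Iterator<String> data = List.of("row 1", "row 2").iterator();
        // A single-use Iterable lets the adapter drive an enhanced for loop.
        Iterable<String> rows = () -> new RowIterator<String>(() -> data.hasNext() ? data.next() : null);
        for (String row : rows) {
            System.out.println(row);
        }
    }
}
```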

#### Automatic Processing using Stream

The user can also use a Java 8 `Stream` to process rows:

```java
try (TabularWorksheetReader worksheet = SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {
    worksheet.stream().forEach(row -> {
        // Handle row here...
    });
}
```
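A sequential `Stream` over rows can be built from a row `Iterator` with the JDK's `StreamSupport` and `Spliterators` utilities. The sketch below uses plain strings in place of `TabularWorksheetRow`, and `RowStreams.stream` is a hypothetical helper; whether tabular4j's `stream()` is built this way is an assumption. Since stream lambdas cannot throw checked exceptions either, read failures in such a stream would surface as `UncheckedIOException`, whose `getCause()` returns the original `IOException`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class RowStreams {
    /** Adapts a row iterator to a sequential, ordered Stream. */
    static <R> Stream<R> stream(Iterator<R> rows) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(rows, Spliterator.ORDERED | Spliterator.NONNULL),
            false); // false = sequential: rows are read one at a time, in order
    }

    public static void main(String[] args) {
        Iterator<String> rows = List.of("a,1", "b,2", "c,3").iterator();
        List<String> firstColumn = stream(rows)
            .map(row -> row.split(",")[0]) // keep only the first field of each row
            .collect(Collectors.toList());
        System.out.println(firstColumn); // prints [a, b, c]
    }
}
```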

#### Automatic Processing using Consumer

Finally, if the user doesn't need to control how rows are read and processed ("pull" style), then the user can also use a consumer that walks the sheet automatically ("push" style):

```java
Spreadsheets.processTabularWorksheet(source, new TabularWorksheetConsumer() {
    @Override
    public void beginTabularWorksheet(int sheetIndex, String sheetName, List columnNames) {
        // Handle setup here...
    }

    @Override
    public void tabularRow(int rowIndex, List cells) {
        // Handle row here...
    }

    @Override
    public void endTabularWorksheet() {
        // Handle cleanup here...
    }
});
```
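In "push" style, the driver owns the loop and calls back into the consumer. The sketch below shows that inversion of control with a hypothetical `SheetConsumer` interface modeled loosely on `TabularWorksheetConsumer` (default no-op methods, so implementations override only what they need) and a hypothetical `PushDriver.process` that walks an in-memory sheet. Numbering data rows from 0 is a choice made for this sketch, not documented library behavior.

```java
import java.util.List;

class PushDriver {
    /** Callbacks in the style of TabularWorksheetConsumer (hypothetical stand-in). */
    interface SheetConsumer {
        default void beginSheet(int sheetIndex, String sheetName, List<String> columnNames) {}
        default void row(int rowIndex, List<String> cells) {}
        default void endSheet() {}
    }

    /**
     * Pushes an in-memory sheet to the consumer: the first row becomes the
     * column names, each later row is delivered in order, then endSheet() runs.
     */
    static void process(String sheetName, List<List<String>> sheet, SheetConsumer consumer) {
        List<String> header = sheet.isEmpty() ? List.of() : sheet.get(0);
        consumer.beginSheet(0, sheetName, header);
        for (int i = 1; i < sheet.size(); i++) {
            consumer.row(i - 1, sheet.get(i)); // data rows are numbered from 0 here
        }
        consumer.endSheet();
    }

    public static void main(String[] args) {
        List<List<String>> sheet = List.of(
            List.of("name", "age"), List.of("ann", "30"), List.of("bob", "41"));
        process("People", sheet, new SheetConsumer() {
            @Override
            public void row(int rowIndex, List<String> cells) {
                System.out.println(rowIndex + ": " + cells);
            }
        });
    }
}
```

Because every callback has a default, `SheetConsumer` has no abstract method and cannot be implemented with a lambda; an anonymous class, as in the original example, is the natural fit.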

### Reading a Workbook

This approach opens a workbook file and processes each of its worksheets in order. It shares much logic with the worksheet examples above.

#### By Index

First, the user can open the file and iterate over its sheets by index:

try (TabularWorkbookReader workbook=SpreadsheetFactory.getInstance().readTabularWorkbook(source)) {
for(int i=0;i