{"id":15044811,"url":"https://github.com/sigpwned/tabular4j","last_synced_at":"2025-04-10T00:42:56.854Z","repository":{"id":65507911,"uuid":"584617388","full_name":"sigpwned/tabular4j","owner":"sigpwned","description":"A library for reading and writing pan-format tabular data, especially spreadsheets, in Java 11+.","archived":false,"fork":false,"pushed_at":"2025-04-01T14:58:45.000Z","size":362,"stargazers_count":1,"open_issues_count":8,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-10T00:42:51.952Z","etag":null,"topics":["csv","excel","java","java-11","spreadsheets","tabular-data","tsv","xls","xlsx"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sigpwned.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-03T04:21:17.000Z","updated_at":"2025-03-02T02:15:54.000Z","dependencies_parsed_at":"2024-09-14T08:21:43.954Z","dependency_job_id":"1dcd8e26-46f7-484e-abc0-9b87fca9414c","html_url":"https://github.com/sigpwned/tabular4j","commit_stats":{"total_commits":47,"total_committers":1,"mean_commits":47.0,"dds":0.0,"last_synced_commit":"ed99341c142f95e9b16a158454ed59d666202a0f"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigpwned%2Ftabular4j","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigpwned%2Ftabular4j/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/sigpwned%2Ftabular4j/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sigpwned%2Ftabular4j/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sigpwned","download_url":"https://codeload.github.com/sigpwned/tabular4j/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137998,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","excel","java","java-11","spreadsheets","tabular-data","tsv","xls","xlsx"],"created_at":"2024-09-24T20:51:04.596Z","updated_at":"2025-04-10T00:42:56.815Z","avatar_url":"https://github.com/sigpwned.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TABULAR4J [![tests](https://github.com/sigpwned/tabular4j/actions/workflows/tests.yml/badge.svg)](https://github.com/sigpwned/tabular4j/actions/workflows/tests.yml) [![Maven Central Version](https://img.shields.io/maven-central/v/com.sigpwned/tabular4j)](https://search.maven.org/artifact/com.sigpwned/tabular4j)\n\nA framework for reading and writing tabular data using popular data formats, especially spreadsheets.\n\n## Motivation\n\nTabular data is an important interface between humans and machines. However, supporting multiple spreadsheet formats transparently is complex. 
This library provides a framework of adapters and facades for reading and writing tabular data using popular spreadsheet data formats.\n\n## Goals\n\n* To support the straightforward reading and writing of tabular data in common file formats, especially spreadsheets.\n* To support common value types for cells to streamline development and minimize errors.\n\n## Non-Goals\n\n* To support non-tabular layouts of data, e.g. merged cells in Excel.\n* To expose all features for all data formats, e.g. charts in Excel. Outstanding specialized libraries already exist for the use and manipulation of specific data formats.\n* To support all data formats that can represent tabular data. The library's primary goal is to support the data formats most commonly exchanged between humans and computers.\n\n## Installing\n\n### Maven Dependencies\n\nYou can get the library from Maven central:\n\n    \u003c!-- To add support for CSV and TSV files --\u003e\n    \u003cdependency\u003e\n        \u003cgroupId\u003ecom.sigpwned\u003c/groupId\u003e\n        \u003cartifactId\u003etabular4j-csv\u003c/artifactId\u003e\n        \u003cversion\u003e0.0.0-b2\u003c/version\u003e\n    \u003c/dependency\u003e\n\n    \u003c!-- To add support for Excel XLS and XLSX files --\u003e\n    \u003cdependency\u003e\n        \u003cgroupId\u003ecom.sigpwned\u003c/groupId\u003e\n        \u003cartifactId\u003etabular4j-excel\u003c/artifactId\u003e\n        \u003cversion\u003e0.0.0-b2\u003c/version\u003e\n    \u003c/dependency\u003e\n\n### Maven Build\n\nThe tabular4j library uses the [ServiceLoader](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html) to discover supported file formats automatically at runtime. This requires each JAR file to include a special `META-INF/services/com.sigpwned.tabular4j.SpreasheetFormatFactory` file that lists file format factory classes the JAR provides. 
To ensure that your application gets *all* of the supported file formats, as opposed to just the supported file formats from the last JAR added to your build, use the `maven-shade-plugin` to merge the `META-INF/services/com.sigpwned.tabular4j.SpreasheetFormatFactory` files from all of the JARs in your build:\n\n    \u003cbuild\u003e\n        \u003cplugins\u003e\n            \u003cplugin\u003e\n                \u003cgroupId\u003eorg.apache.maven.plugins\u003c/groupId\u003e\n                \u003cartifactId\u003emaven-shade-plugin\u003c/artifactId\u003e\n                \u003cversion\u003e3.6.0\u003c/version\u003e \u003c!-- use current version --\u003e\n                \u003cexecutions\u003e\n                    \u003cexecution\u003e\n                        \u003cphase\u003epackage\u003c/phase\u003e\n                        \u003cgoals\u003e\n                            \u003cgoal\u003eshade\u003c/goal\u003e\n                        \u003c/goals\u003e\n                        \u003cconfiguration\u003e\n                            \u003ctransformers\u003e\n                                \u003ctransformer implementation=\"org.apache.maven.plugins.shade.resource.ServicesResourceTransformer\"/\u003e\n                            \u003c/transformers\u003e\n                        \u003c/configuration\u003e\n                    \u003c/execution\u003e\n                \u003c/executions\u003e\n            \u003c/plugin\u003e\n        \u003c/plugins\u003e\n    \u003c/build\u003e\n\n## Data Model\n\n### Worksheets\n\nA worksheet is a single spreadsheet containing one rectangular table of data. The first row of a worksheet is interpreted as column names (i.e., headers), and determines the width of the table of data. Each additional row of a worksheet is interpreted as one value for each of the named columns. 
If a row's actual width is greater than the number of columns, then it is truncated before it is interpreted; if its actual width is less than the number of the columns, then it is right-padded with null-valued cells before it is interpreted.\n\n    +---------+---------+---------+---------+---------+\n    |  alpha  |  bravo  | charlie |  delta  |         | \u003c-- Column Names/Headers. Width = 4.\n    +---------+---------+---------+---------+---------+\n    |    a    |    b    |    c    |    d    |         | \u003c-- Width 4. No transformation.\n    +---------+---------+---------+---------+---------+\n    |    a    |    b    |    c    |         |         | \u003c-- Width 3. Will be right-padded with null.\n    +---------+---------+---------+---------+---------+\n    |         |    b    |    c    |    d    |         | \u003c-- Width 4. First cell is empty.\n    +---------+---------+---------+---------+---------+\n    |    a    |    b    |    c    |    d    |    e    | \u003c-- Width 5. Will be truncated to width 4.\n    +---------+---------+---------+---------+---------+\n    |    a    |    b    |    c    |    d    |         | \u003c-- Width 4. No transformation.\n    +---------+---------+---------+---------+---------+\n    \n### Workbooks\n\nA workbook is an ordered list of spreadsheets gathered into a single file. Each worksheet in a workbook has a unique name. One sheet in the workbook is \"active\". For a workbook created by a human, this is the worksheet that was last edited by the user and/or is shown when the file is opened; for a workbook created by the library, it is simply the worksheet with the active flag set.\n\n## Code Examples\n\n### Opening a File to Read\n\nUsers open an existing workbook file to read using a `ByteSource`. Note that the spreadsheet format (e.g., CSV, XLSX, etc.) 
is never given explicitly; rather, the framework detects the format of the spreadsheet and reacts accordingly.\n\n    try (TabularWorkbookReader workbook=SpreadsheetFactory.getInstance().readTabularWorkbook(ByteSource.ofFile(file))) {\n        // Handle workbook here...\n    }\n\n### Reading a Worksheet\n\nThis approach opens a workbook file and processes its active worksheet only.\n\n#### Manual Iteration \n\nFirst, the user can open the worksheet and then iterate over its rows manually:\n\n    try (TabularWorksheetReader worksheet=SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {\n        for(TabularWorksheetRow row=worksheet.readRow();row!=null;row=worksheet.readRow()) {\n            // Handle row here...\n        }\n    }\n    \n#### Automatic Iteration using Iterator\n\nAlternatively, the user can use an `Iterator` directly. Note that any `IOException` is rethrown as an `UncheckedIOException` in this case.\n\n    try (TabularWorksheetReader worksheet=SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {\n        for(TabularWorksheetRow row : worksheet) {\n            // Handle row here...\n        }\n    }\n\n#### Automatic Processing using Stream\n\nThe user can also use a Java 8 `Stream` to process rows:\n\n    try (TabularWorksheetReader worksheet=SpreadsheetFactory.getInstance().readTabularActiveWorksheet(source)) {\n        worksheet.stream().forEach(row -\u003e {\n            // Handle row here...\n        });\n    }\n\n#### Automatic Processing using Consumer\n\nFinally, if the user doesn't need to control how rows are read and processed (\"pull\" style), then the user can also use a consumer that walks the sheet automatically (\"push\" style):\n\n    Spreadsheets.processTabularWorksheet(source, new TabularWorksheetConsumer() {\n        public void beginTabularWorksheet(int sheetIndex, String sheetName, List\u003cString\u003e columnNames) {\n            // Handle setup here...\n        }\n\n        public 
void tabularRow(int rowIndex, List\u003cTabularWorksheetCell\u003e cells) {\n            // Handle row here...\n        }\n\n        public void endTabularWorksheet() {\n            // Handle cleanup here...\n        }\n    });\n    \n### Reading a Workbook\n\nThis approach opens a workbook file and processes each of its worksheets in order. It shares much logic with the worksheet examples above.\n\n#### By Index\n\nFirst, the user can open the file and iterate over its sheets by index:\n\n    try (TabularWorkbookReader workbook=SpreadsheetFactory.getInstance().readTabularWorkbook(source)) {\n        for(int i=0;i\u003cworkbook.getWorksheetCount();i++) {\n            try (TabularWorksheetReader worksheet=workbook.getWorksheet(i)) {\n                // Handle worksheet here...\n            }\n        }\n    }\n    \n#### By Name\n\nAlternatively, the user can open the file and iterate over its sheets by name:\n\n    try (TabularWorkbookReader workbook=SpreadsheetFactory.getInstance().readTabularWorkbook(source)) {\n        for(String worksheetName : workbook.getWorksheetNames()) {\n            try (TabularWorksheetReader worksheet=workbook.findWorksheetByName(worksheetName).get()) {\n                // Handle worksheet here...\n            }\n        }\n    }\n\n### Opening a File to Write\n\nUsers open a workbook file to write using a `ByteSink`. The user gives the desired file format in this case.\n\n    try (TabularWorkbookWriter workbook=SpreadsheetFactory.getInstance().writeTabularWorkbook(ByteSink.ofFile(file), \"csv\")) {\n        // Handle workbook here...\n    }\n\n### Writing a Worksheet\n\nIn this approach, the user simply opens a writer, writes the rows, and closes the writer. 
This results in a workbook with one sheet.\n\n    try (TabularWorksheetRowWriter worksheet=SpreadsheetFactory.getInstance().writeTabularActiveWorksheet(sink, \"csv\")\n            .writeHeaders(\"alpha\", \"bravo\")) {\n        worksheet.writeValuesRow(\"a\", \"b\");\n        worksheet.writeValuesRow(\"1\", \"2\");\n    }\n\n### Writing a Workbook\n\nIn this approach, the user opens a workbook writer, then opens a writer for one worksheet, writes its rows, and closes it. The user may then open and write additional sheets the same way.\n\n    try (TabularWorkbookWriter workbook=SpreadsheetFactory.getInstance().writeTabularWorkbook(sink, \"csv\")) {\n        try (TabularWorksheetRowWriter worksheet=workbook.getWorksheet(\"sheet1\").writeHeaders(\"alpha\", \"bravo\")) {\n            // Write worksheet...\n        }\n        try (TabularWorksheetRowWriter worksheet=workbook.getWorksheet(\"sheet2\").writeHeaders(\"charlie\", \"delta\")) {\n            // Write worksheet...\n        }\n    }\n    \n## FAQ\n\n### What file formats are supported?\n\nOut of the box, the library supports the following file formats via the following modules:\n\n* `tabular4j-excel` -- Using the excellent [Apache POI](https://poi.apache.org/) library\n    * `xlsx`\n    * `xls`\n* `tabular4j-csv` -- Using the [`csv4j`](https://github.com/sigpwned/csv4j) library\n    * `csv`\n    * `tsv`\n\n### What file formats do you plan to support in the future?\n\nIf there is interest, I'd like to support formats like the following:\n\n* `jsonl` -- One JSON object per line\n* `orc` -- [Apache ORC](https://orc.apache.org/) Tabular data store for Hadoop\n* `parquet` -- [Apache Parquet](https://parquet.apache.org/) column-oriented data file\n\n### What value types does the library support?\n\nOut of the box, the library supports the following data types:\n\n* All Java primitives (`byte`, `short`, `int`, `long`, `boolean`, `float`, `double`, `char`)\n* All Java boxed types (`Byte`, `Short`, `Integer`, `Long`, `Boolean`, `Float`, `Double`, `Character`)\n* 
Java 8 Time types (`Instant`, `LocalDate`, `LocalTime`, `LocalDateTime`, `OffsetDateTime`, `ZonedDateTime`, `ZoneId`)\n* Internet types (`URL`, `URI`, `InetAddress`)\n* Various and sundry others (`UUID`, `BigDecimal`, `BigInteger`, `String`, `byte[]`, `Date`, `Calendar`)\n\n### Can I add my own value types?\n\nOne goal of this library is to make it easier to add new data types for processing. For now, to add new types, look at the `CoreCsvValueMapperFactory` and `CoreExcelValueMapperFactory` classes. I hope to add an easier way to add types soon.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsigpwned%2Ftabular4j","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsigpwned%2Ftabular4j","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsigpwned%2Ftabular4j/lists"}