https://github.com/osiegmar/javacsvcomparison
Comparison of parsing non-standard CSV data using different Java libraries.
https://github.com/osiegmar/javacsvcomparison
Last synced: 8 months ago
JSON representation
Comparison of parsing non-standard CSV data using different Java libraries.
- Host: GitHub
- URL: https://github.com/osiegmar/javacsvcomparison
- Owner: osiegmar
- License: mit
- Created: 2021-01-03T15:20:36.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2025-10-03T08:39:24.000Z (8 months ago)
- Last Synced: 2025-10-03T10:12:21.338Z (8 months ago)
- Language: Java
- Homepage:
- Size: 252 KB
- Stars: 5
- Watchers: 2
- Forks: 3
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Interpretation of the CSV Standard by Java Libraries
Unfortunately, there is no real industry standard for CSV files.
The closest thing we do currently have (since 2005) is [RFC 4180](https://tools.ietf.org/html/rfc4180).
Working with non-standardized data often comes with surprises, but what exactly happens when parsing
CSV data that doesn't match this RFC?
This project is about to find that out...
> [!NOTE]
> This project was created during the development of [FastCSV].
> Since this comparison uses the result of FastCSV as a reference value (expected result),
> the comparison is highly biased.
## Test results
The [test suite](tests) consists of currently 60 tests in total.
To pass a test, the library must return the expected result or throw an exception if the input is invalid.
If the library lacks a feature (like skipping empty lines or comments), the test is marked as skipped.
If a test is neither passed nor skipped, it is marked as failed.
| Library | Checks passed | Checks skipped | Checks failed |
|:-------------------|--------------:|-------------------:|-------------------:|
| [Commons CSV] | 48 | :heavy_minus_sign: | 12 |
| [CSVeed] | 28 | 7 | 25 |
| [FastCSV] | 60 | :heavy_minus_sign: | :heavy_minus_sign: |
| [Jackson CSV] | 49 | 7 | 4 |
| [Java CSV] | 51 | 7 | 2 |
| [Opencsv] | 38 | 17 | 5 |
| [picocsv] | 60 | :heavy_minus_sign: | :heavy_minus_sign: |
| [sesseltjonna-csv] | 30 | 17 | 13 |
| [SimpleFlatMapper] | 41 | 17 | 2 |
| [Super CSV] | 49 | 7 | 4 |
| [Univocity] | 51 | 7 | 2 |
A detailed report can be found at: https://osiegmar.github.io/JavaCsvComparison
The tests clearly show that there are significant differences between the libraries –
especially when it comes to non-standardized data.
## Execute
To run the tests and generate the report locally, execute the following command:
```shell
./gradlew test allureAggregateServe
```
## Library Implementor?
The tests are written in a way that they can be easily adapted to other libraries and languages.
They can also be used as unit tests for your library, like done in [FastCSV].
Feel free to use the tests for your own project.
[Commons CSV]: https://commons.apache.org/proper/commons-csv/
[CSVeed]: https://42bv.github.io/CSVeed/
[FastCSV]: https://fastcsv.org
[Jackson CSV]: https://github.com/FasterXML/jackson-dataformats-text
[Java CSV]: https://sourceforge.net/projects/javacsv/
[Opencsv]: https://opencsv.sourceforge.net
[picocsv]: https://github.com/nbbrd/picocsv
[sesseltjonna-csv]: https://github.com/skjolber/sesseltjonna-csv
[SimpleFlatMapper]: https://simpleflatmapper.org
[Super CSV]: https://super-csv.github.io/super-csv/index.html
[Univocity]: https://github.com/uniVocity/univocity-parsers