{"id":16689797,"url":"https://github.com/bennidi/blox","last_synced_at":"2025-08-04T08:36:04.000Z","repository":{"id":14618450,"uuid":"17335771","full_name":"bennidi/blox","owner":"bennidi","description":"Event-based CSV parsing. Supports multiple data blocks with different formats in same file","archived":false,"fork":false,"pushed_at":"2018-03-09T15:06:38.000Z","size":293,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-20T06:38:48.422Z","etag":null,"topics":["csv-files","csv-parser","csv-reader","java"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bennidi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-03-02T10:47:49.000Z","updated_at":"2018-03-09T14:25:57.000Z","dependencies_parsed_at":"2022-08-28T21:00:44.499Z","dependency_job_id":null,"html_url":"https://github.com/bennidi/blox","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bennidi/blox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bennidi%2Fblox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bennidi%2Fblox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bennidi%2Fblox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bennidi%2Fblox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bennidi","download_url":"https://codeload.github.com/bennidi/blox/tar.gz/refs/heads/master","sbom_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/bennidi%2Fblox/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268671397,"owners_count":24288172,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-04T02:00:09.867Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv-files","csv-parser","csv-reader","java"],"created_at":"2024-10-12T15:49:19.890Z","updated_at":"2025-08-04T08:36:03.951Z","avatar_url":"https://github.com/bennidi.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"BloX\n=========\n\nBloX is an event-based csv parsing library that supports **simple csv files** as well as multi-format files containing multiple blocks, each adhering to a different format (e.g. a different number of columns or a different value separator).\n\nIt comes with a declarative API for creating block descriptors. 
Event handlers can be used to create in-memory models of the incoming csv data or do any other kind of processing.\n\nThe event-based approach to parsing the csv files offers high performance and a constant memory footprint.\n\nWith custom event handlers it is possible to implement any processing logic, such as value conversions, aggregations or data histograms.\n\nCheck out the [javadoc](http://bennidi.github.io/blox/).\n\n\n \u003ch2\u003eUsage\u003c/h2\u003e\n\nCreating a block definition is very simple. The only mandatory configuration data for a block\nare its boundaries (i.e. beginning and end). A static block defines its start and end point in terms\nof line numbers.\n\n\n```java\n// this block will contain data from line 19 until line 29\nnew CsvBlockDescriptor()\n   .starts().after().line(18)\n   .ends().with().line(30);\n```\n    \n\nIn many scenarios the length of the csv data (i.e. the number of lines of the data part) is not known in advance.\nFor this scenario a dynamic block can be used. A dynamic block uses regular expressions to detect the start and end\nof a block.\n\n```java\n// this block will contain data from the first line after the line that contains the specified pattern\n// until a blank line is reached\nnew CsvBlockDescriptor()\n    .starts().after().pattern(\"Transactions.*\")\n    .ends().with().emptyLine();\n```\n\nIf the real data starts more than one line after the pattern (many csv exports contain header information and comments)\nthen a header size may be specified. This header information will not be processed as part of the block data. 
Instead\nit will be copied as is.\n\n\n```java\n    // this block will contain data from the first line after the line that contains the specified pattern\n    // until a blank line is reached\n    // the first three lines after the block's start contain header information and will not be processed\n    new CsvBlockDescriptor()\n        .starts().after().pattern(\"Transactions.*\")\n        .ends().with().emptyLine()\n        .headerSize(3);\n```\n\nCreating a reader for a set of block definitions and an input stream is straightforward.\n\n```java\n// create the event handlers and pass the block configuration\nCsvBlockBuilder block1 = new CsvBlockBuilder(new CsvBlockDescriptor()\n        .starts().after().pattern(\"Parameter.*\")\n        .ends().with().emptyLine()\n        .hasColumnNames(true));\nCsvBlockBuilder block2 = new CsvBlockBuilder(new CsvBlockDescriptor()\n        .starts().after().pattern(\"Daten.*\")\n        .ends().with().emptyLine()\n        .hasColumnNames(true));\n\n// create a reader for the given block builders (a block builder is mainly a set of handlers\n// that will produce an in-memory model of the parsed csv data)\nBloxReader utf8Reader = Utf8Reader.createReaderFor(block1, block2);\n// start reading\nutf8Reader.read(new FileInputStream(new File(\"/path/to/file.csv\")));\n\n// access the parsed blocks and do whatever needs to be done\nblock1.getBlock().getEntries() ....\n```\n\n\nBloX also provides a class for writing csv blocks to an output stream. 
Simply provide it with the stream and a set of\nblocks.\n\n```java\n\n// get some block definitions and read the input\nCsvBlockBuilder[] blockBuilders = CsvBlockBuilder.fromDescriptors(definitions);\nBloxReader reader = new BloxReader(ICsvParserFactory.Default, blockBuilders, encoding);\nreader.read(input);\n\nWriter output = new FileWriter(\"/path/to/output.csv\");\nMultiBlockWriter blockWriter = new MultiBlockWriter(output);\nblockWriter.writeBlocks(CsvBlockBuilder.getBlocks(blockBuilders));\nblockWriter.close();\n```\n\nBloX also provides means to compare two csv documents. A comparator can be configured for different levels of equality,\ne.g. it might enforce line and column order, or be less restrictive if, for example, line order does not matter\nas long as every line is found.\n\n\n```java\nCsvBlockDescriptor blockDefinition = new CsvBlockDescriptor()\n        .starts().with().pattern(\"Daten.*\")\n        .ends().with().pattern(CsvFileFormat.EmptyLine)\n        .headerSize(1)\n        .hasColumnNames(true);\n\n// by default, the comparison ignores differences in block, line and column order\nCsvComparator comparator = new CsvComparator();\nList\u003cDifference\u003e differences = comparator.compare(\n        getTestResource(Testfiles.Comparison.SingleBlockControl),\n        getTestResource(Testfiles.Comparison.SingleBlock),\n        blockDefinition);\n```\n\n\n\u003ch2\u003eContribute\u003c/h2\u003e\n\nOne area that needs more attention is the handling of different file formats and character encodings. Test coverage\nis still too low.\n\n\u003ch2\u003eLicense\u003c/h2\u003e\n\nThis project is distributed under the terms of the MIT License. See file \"LICENSE\" for further reference.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbennidi%2Fblox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbennidi%2Fblox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbennidi%2Fblox/lists"}