{"id":23366838,"url":"https://github.com/voilab/csv-parser","last_synced_at":"2025-04-07T23:16:11.217Z","repository":{"id":37546053,"uuid":"171432192","full_name":"voilab/csv-parser","owner":"voilab","description":"CSV wrapper around fgetcsv that provide per-column function, error management and type checking","archived":false,"fork":false,"pushed_at":"2023-10-16T10:29:42.000Z","size":248,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":3,"default_branch":"develop","last_synced_at":"2025-03-14T20:39:02.751Z","etag":null,"topics":["csv","csv-parser","php"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/voilab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-19T08:06:47.000Z","updated_at":"2023-03-16T15:36:47.000Z","dependencies_parsed_at":"2022-08-18T02:55:39.921Z","dependency_job_id":null,"html_url":"https://github.com/voilab/csv-parser","commit_stats":null,"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voilab%2Fcsv-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voilab%2Fcsv-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voilab%2Fcsv-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voilab%2Fcsv-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/voilab","download_url":"https://codeload.github.com/voilab/csv-parser/tar.gz/refs/heads/develop","host":{"name":"GitHub",
"url":"https://github.com","kind":"github","repositories_count":247744329,"owners_count":20988783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-parser","php"],"created_at":"2024-12-21T14:18:51.490Z","updated_at":"2025-04-07T23:16:11.192Z","avatar_url":"https://github.com/voilab.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CSV parser\n\nThis class uses `fgetcsv` to parse a file or a string, extract columns and\nprovide per-column methods to manipulate data.\n\nIt can parse large files, HTTP streams, any type of resource, or strings.\n\nIt comes with basic error handling, so it is possible to collect all errors\nin the CSV resource and then do something with this array of errors.\n\nIt is extensible, so you can parse your own type of resource/stream if you\nhave very specific needs.\n\n## Table of contents\n\n+ [Install](#install)\n  + [Install PHP5 compatible version](#install-php5)\n+ [Usage](#usage)\n  + [Available methods](#available-methods)\n  + [Simple example](#simple-example)\n  + [Full example](#full-example)\n+ [Documentation](#documentation)\n  + [Options](#options)\n  + [Column function parameters](#column-function-parameters)\n  + [Headers auto-sanitization](#headers-auto-sanitization)\n  + [On before column parse function parameters](#on-before-column-parse)\n  + [On row parsed function parameters](#on-row-parsed)\n  + [Aliasing columns](#aliasing-columns)\n  + [Required columns](#required-columns)\n  + [No header](#no-header)\n  + [Shuffling columns when defining them](#shuffling-columns)\n  + [Seek 
in big files](#seek-in-big-files)\n  + [Close the resource](#close-the-resource)\n  + [Line endings problems](#line-endings-problems)\n+ [Error management](#error-managment)\n  + [Initialization errors](#initialization-errors)\n  + [Error and internationalization (i18n)](#error-and-internationalization)\n+ [Working with database, column optimization](#optimizers)\n  + [Parse function](#parse-function)\n  + [Reduce function](#reduce-function)\n  + [Absent function](#absent-function)\n  + [Example](#optimizer-example)\n  + [Chunks](#chunks)\n+ [Guessers: auto-detect line ending, delimiter and encoding](#guessers)\n  + [Guess line ending](#guess-line-ending)\n  + [Guess delimiter](#guess-delimiter)\n  + [Guess encoding](#guess-encoding)\n+ [Known issues](#known-issues)\n+ [Testing](#testing)\n+ [Security](#security)\n+ [License](#license)\n\n## Install\n\nVia Composer\n\nAdd the requirement to the composer.json file in your project root:\n``` json\n{\n    \"require\": {\n        \"voilab/csv\": \"^5.0.0\"\n    }\n}\n```\n\nor run:\n\n``` bash\n$ composer require voilab/csv\n```\n\n### Install PHP5 compatible version \u003ca name=\"install-php5\"\u003e\u003c/a\u003e\n\nThe PHP5 version cannot parse streams or iterables.\n\n``` json\n{\n    \"require\": {\n        \"voilab/csv\": \"dev-feature/php5\"\n    }\n}\n```\n\n## Usage\n\n### Available methods \u003ca name=\"available-methods\"\u003e\u003c/a\u003e\n\n```php\n$parser = new \\voilab\\csv\\Parser($defaultOptions = []);\n\n$result = $parser-\u003efromString($str = \"A;B\\n1;test\", $options = []);\n\n// or\n$result = $parser-\u003efromFile($file = '/path/file.csv', $options = []);\n\n// or with a raw resource (fopen, fsockopen, php://memory, etc.)\n$result = $parser-\u003efromResource($resource, $options = []);\n\n// or with an array or an Iterator\n$result = $parser-\u003efromIterable($array = [['A', 'B'], ['1', 'test']], $options = []);\n\n// or with an SPL file object\n$result = $parser-\u003efromSplFile($object = new 
\\SplFileObject('file.csv'), $options = []);\n\n// or with a PSR stream interface (e.g. an HTTP response message body)\n$response = $someHttpClient-\u003erequest('GET', '/');\n$result = $parser-\u003efromStream($response-\u003egetBody(), $options = []);\n\n// or with a custom \\voilab\\csv\\CsvInterface implementation\n$result = $parser-\u003eparse($myCsvInterface, $options = []);\n```\n\n### Simple example \u003ca name=\"simple-example\"\u003e\u003c/a\u003e\n\n```php\n$parser = new \\voilab\\csv\\Parser([\n    'delimiter' =\u003e ';',\n    'columns' =\u003e [\n        'A' =\u003e function (string $data) {\n            return (int) $data;\n        },\n        'B' =\u003e function (string $data) {\n            return ucfirst($data);\n        }\n    ]\n]);\n\n$csv = \u003c\u003c\u003cCSV\nA; B\n4; hello\n2; world\nCSV;\n\n$result = $parser-\u003efromString($csv);\n\nforeach ($result as $row) {\n    var_dump($row['A']); // int\n    var_dump($row['B']); // string starting with a capital letter\n}\n```\n\n### Full example \u003ca name=\"full-example\"\u003e\u003c/a\u003e\n\n```php\n$parser-\u003efromFile('file.csv', [\n    // fgetcsv\n    'delimiter' =\u003e ',',\n    'enclosure' =\u003e '\"',\n    'escape' =\u003e '\\\\',\n    'length' =\u003e 0,\n    'autoDetectLn' =\u003e null,\n\n    // resources\n    'metadata' =\u003e [],\n    'close' =\u003e false,\n\n    // PSR stream\n    'lineEnding' =\u003e \"\\n\",\n\n    // headers management\n    'headers' =\u003e true,\n    'strict' =\u003e false,\n    'required' =\u003e ['id', 'name'],\n\n    // big files\n    'start' =\u003e 0,\n    'size' =\u003e 0,\n    'seek' =\u003e 0,\n    'chunkSize' =\u003e 0,\n\n    // data pre-manipulation\n    'autotrim' =\u003e true,\n    'onBeforeColumnParse' =\u003e function (string $data) {\n        return utf8_encode($data);\n    },\n    'guessDelimiter' =\u003e new \\voilab\\csv\\GuesserDelimiter(),\n    'guessLineEnding' =\u003e new \\voilab\\csv\\GuesserLineEnding(),\n    'guessEncoding' 
=\u003e new \\voilab\\csv\\GuesserEncoding(),\n\n    // data post-manipulation\n    'onRowParsed' =\u003e function (array $row) {\n        $row['other_stuff'] = do_some_stuff($row);\n        return $row;\n    },\n    'onChunkParsed' =\u003e function (array $rows) {\n        // do whatever you want, return void\n    },\n    'onError' =\u003e function (\\Exception $e, $index) {\n        throw new \\Exception($e-\u003egetMessage() . \": at line $index\");\n    },\n\n    // CSV columns definition\n    'columns' =\u003e [\n        'A as id' =\u003e function (string $data) {\n            return (int) $data;\n        },\n        'B as firstname' =\u003e function (string $data) {\n            return ucfirst($data);\n        },\n        'C as name' =\u003e function (string $data) {\n            if (!$data) {\n                throw new \\Exception(\"Name is mandatory and is missing\");\n            }\n            return ucfirst($data);\n        },\n        // use of Optimizers (see the end of this document for more info)\n        'D as optimized' =\u003e new \\voilab\\csv\\Optimizer(\n            function (string $data) {\n                return (int) $data;\n            },\n            function (array $data) {\n                return some_reduce_function($data);\n            }\n        )\n    ]\n]);\n```\n\n## Documentation\n\n### Options\n\nThese are the options you can provide at constructor level or when calling\nthe `from*` methods. Details for the `fgetcsv` options can be found here:\nhttps://php.net/fgetcsv and https://php.net/str_getcsv\n\n| Name | Type | Default | Description |\n|------|------|---------|-------------|\n| delimiter | `string` | `,` | `fgetcsv` delimiter |\n| enclosure | `string` | `\"` | `fgetcsv` enclosure string. 
To tell PHP there is no enclosure, set it to an empty string |\n| escape | `string` | `\\\\` | `fgetcsv` escape string |\n| length | `int` | `0` | `fgetcsv` line length |\n| autoDetectLn | `bool` | `null` | If supplied, sets the PHP ini setting `auto_detect_line_endings`. Doesn't work with PSR streams |\n| metadata | `array` | `[]` | Resource metadata. May be used internally by the resource |\n| close | `bool` | `false` | Whether the resource must be closed after parsing is done |\n| lineEnding | `string` | `\\n` | Used with PSR streams to define the line ending. You must also set [length], so that a line can be read |\n| headers | `bool` | `true` | Whether the first line of the CSV resource contains the headers |\n| strict | `bool` | `false` | Whether the columns defined in the [columns] option must exactly match the number of columns in the CSV resource |\n| required | `array` | `[]` | Columns defined in the [columns] option that must be present in the CSV resource (if aliased, use the column alias) |\n| start | `int` | `0` | Line index to start with. Used with big files, in conjunction with the [size] option. The first data index is `0`, regardless of headers |\n| size | `int` | `0` | Number of lines to process. `0` ignores [start] and [size] |\n| seek | `int` | `0` | Pointer position in the file, used in conjunction with [size]. 
Takes precedence over [start] to define the starting position |\n| autotrim | `bool` | `true` | Trims all cell content, so your column functions always receive trimmed data |\n| chunkSize | `int` | `0` | Number of rows to parse (including optimizer) to create a chunk |\n| onChunkParsed | `callable` | `null` | Method called when a chunk is complete |\n| onBeforeColumnParse | `callable` | `null` | Method called just before any defined column method |\n| guessDelimiter | `GuesserDelimiterInterface` | `null` | Object used to guess the delimiter |\n| guessLineEnding | `GuesserLineEndingInterface` | `null` | Object used to guess the line ending |\n| guessEncoding | `GuesserEncodingInterface` | `null` | Object used to guess the content encoding. It is called before [onBeforeColumnParse] |\n| onRowParsed | `callable` | `null` | Method called when a row has finished parsing |\n| onError | `callable` | `null` | Method called when an error occurs, at column and at row level |\n| columns | `array` |  | CSV columns definition (see examples). This is the only required option |\n\n### Column function parameters \u003ca name=\"column-function-parameters\"\u003e\u003c/a\u003e\n\nWhen defining a function for a column, you have access to these parameters:\n\n| Name | Type | Description |\n|------|------|-------------|\n| $data | `string` | The first argument is always a string. It is the cell content (trimmed if `autotrim` is set to true) |\n| $index | `int` | The index of the line being parsed. Corresponds to the line number in the CSV resource (taking headers into account) |\n| $row | `array` | The entire row data, **raw from `fgetcsv`**. 
This data is **not** the result of the column functions |\n| $parsed | `array` | The parsed data from previous columns (columns are handled one after the other) |\n| $meta | `array` | The current column information |\n| $options | `array` | The options array |\n|------|------|-------------|\n| return | `?mixed` | Returns the final cell value |\n\n```php\n$parser-\u003efromFile('file.csv', [\n    'columns' =\u003e [\n        // minimal usage\n        'col1' =\u003e function (string $data) {\n            return $data;\n        }\n    ]\n]);\n```\n\n#### Headers auto-sanitization \u003ca name=\"headers-auto-sanitization\"\u003e\u003c/a\u003e\n\nNote that headers are automatically trimmed and their carriage returns are\nremoved. Also, consecutive spaces are collapsed into a single space. This only\napplies to headers. Cell content is not modified, unless `autotrim` is true.\n\n```\n\" a header \"     =\u003e \"a header\"\n\"a       header\" =\u003e \"a header\"\n\"a\nheader  \"        =\u003e \"a header\"\n```\n\n\u003e If a column you defined in your code doesn't exist in the CSV resource **and**\n\u003e doesn't appear in the `required` array, the `$meta` argument will have a\n\u003e `phantom` flag set to `true`. This is how you can tell, during parsing,\n\u003e whether the column exists in the CSV resource.\n\n### On before column parse function parameters \u003ca name=\"on-before-column-parse\"\u003e\u003c/a\u003e\n\nJust before any CSV column data is parsed, a common method is called, so you\ncan apply the same processing to every cell of every row. You can use it to\nmanage encoding, for example.\n\n| Name | Type | Description |\n|------|------|-------------|\n| $data | `string` | The first argument is always a string. It is the cell content (trimmed if `autotrim` is set to true) |\n| $index | `int` | The index of the line being parsed. 
Corresponds to the line number in the CSV resource (taking headers into account) |\n| $meta | `array` | The current column information |\n| $options | `array` | The options array |\n|------|------|-------------|\n| return | `string` | Returns the cell value |\n\n\u003e Mind the type declarations in your column functions if you return\n\u003e other types from here.\n\n```php\n$parser-\u003efromFile('file.csv', [\n    // minimal usage\n    'onBeforeColumnParse' =\u003e function (string $data) : string {\n        return utf8_encode($data);\n    }\n]);\n```\n\n### On row parsed function parameters \u003ca name=\"on-row-parsed\"\u003e\u003c/a\u003e\n\nWhen a row is complete, you can do something with all of its data.\n\n| Name | Type | Description |\n|------|------|-------------|\n| $rowData | `array` | All the data parsed, for all the columns |\n| $index | `int` | The index of the line being parsed. Corresponds to the line number in the CSV resource (taking headers into account) |\n| $parsed | `array` | The parsed data from previous rows (rows are handled one after the other) |\n| $options | `array` | The options array |\n|------|------|-------------|\n| return | `array` | Returns a multidimensional `array` of `?mixed` values |\n\n```php\n$parser-\u003efromFile('file.csv', [\n    // minimal usage\n    'onRowParsed' =\u003e function (array $rowData) {\n        return $rowData;\n    }\n]);\n```\n\n### Aliasing columns \u003ca name=\"aliasing-columns\"\u003e\u003c/a\u003e\n\nYou can define aliases for columns to ease data manipulation. Just write ` as `\nto activate this feature, like `CSV column name as alias`.\n\nAn alias **must not** itself contain the ` as ` string. The header in the CSV\nresource, however, can contain it.\n\n\u003e Note that if you have ` as ` in a CSV resource header, you **must** alias it\n\u003e in the column definitions. 
Otherwise, the parser will not find this column.\n\n```php\n$str = \u003c\u003c\u003cCSV\nA; B    ; Just as I said\n4; hello; hey\n2; world; hi\nCSV;\n\n$parser = new \\voilab\\csv\\Parser();\n\n$result = $parser-\u003efromString($str, [\n    'delimiter' =\u003e ';',\n    'columns' =\u003e [\n        'A as id' =\u003e function (string $data) {\n            return (int) $data;\n        },\n        'B as content' =\u003e function (string $data) {\n            return ucfirst($data);\n        },\n        'Just as I said as notes' =\u003e function (string $data) {\n            return $data;\n        }\n    ]\n]);\nprint_r($result);\n\n/* prints:\nArray (\n    [0] =\u003e Array (\n        [id] =\u003e 4\n        [content] =\u003e Hello\n        [notes] =\u003e hey\n    )\n    [1] =\u003e Array (\n        [id] =\u003e 2\n        [content] =\u003e World\n        [notes] =\u003e hi\n    )\n)\n*/\n```\n\n#### Required columns \u003ca name=\"required-columns\"\u003e\u003c/a\u003e\n\nIf you have aliased a column and it is a required column, you must use the\nalias inside the `required` option.\n\n```php\n$result = $parser-\u003efromString($str, [\n    'delimiter' =\u003e ';',\n    'required' =\u003e ['id', 'content'],\n    'columns' =\u003e [\n        'A as id' =\u003e function (string $data) {\n            return (int) $data;\n        },\n        'B as content' =\u003e function (string $data) {\n            return ucfirst($data);\n        }\n    ]\n]);\n```\n\n### No header \u003ca name=\"no-header\"\u003e\u003c/a\u003e\n\nIf your CSV resource has no header, set the `headers` option to `false` and\nuse the column indexes as keys:\n\n```php\n$str = \u003c\u003c\u003cCSV\n4; hello\n2; world\nCSV;\n\n$result = $parser-\u003efromString($str, [\n    'delimiter' =\u003e ';',\n    'headers' =\u003e false,\n    'columns' =\u003e [\n        '0 as id' =\u003e function (string $data) {\n            return (int) $data;\n        },\n        '1 as content' =\u003e function (string $data) {\n            return ucfirst($data);\n        }\n    ]\n]);\nprint_r($result);\n\n/* prints:\nArray (\n    [0] 
=\u003e Array (\n        [id] =\u003e 4\n        [content] =\u003e Hello\n    )\n    [1] =\u003e Array (\n        [id] =\u003e 2\n        [content] =\u003e World\n    )\n)\n*/\n```\n\n### Shuffling columns when defining them \u003ca name=\"shuffling-columns\"\u003e\u003c/a\u003e\n\nYou can define your columns in any order you want. You don't need to provide\nthem in the order they appear in the CSV. You just have to match your keys with\na header in the CSV resource.\n\n\u003e Note that columns are executed in the order you define them in your code.\n\u003e In the example below, the function for `A` is called after the one for `B`,\n\u003e even though column A appears first in the CSV resource.\n\n```php\n$str = \u003c\u003c\u003cCSV\nA; B\n4; hello\n2; world\nCSV;\n\n$parser = new \\voilab\\csv\\Parser();\n\n$result = $parser-\u003efromString($str, [\n    'delimiter' =\u003e ';',\n    'columns' =\u003e [\n        'B' =\u003e function (string $data) {\n            // first call\n            return ucfirst($data);\n        },\n        'A' =\u003e function (string $data) {\n            // second call\n            return (int) $data;\n        }\n    ]\n]);\nprint_r($result);\n\n/* prints:\nArray (\n    [0] =\u003e Array (\n        [B] =\u003e Hello\n        [A] =\u003e 4\n    )\n    [1] =\u003e Array (\n        [B] =\u003e World\n        [A] =\u003e 2\n    )\n)\n*/\n```\n\n### Seek in big files \u003ca name=\"seek-in-big-files\"\u003e\u003c/a\u003e\n\nYou can use the seek mechanism to speed up parsing of big files.\n\nYou _can_ specify the start index, but it is not mandatory. It is used in\nerror management, to know which line failed, and in the other method calls\nwhere [$index] is given.\n\nYou are responsible for keeping [seek] and [start] synchronized. If you don't,\n
If you don't,\nand you have errors, the indexes would be irrelevant.\n\n```php\n$str = \u003c\u003c\u003cCSV\nA; B\n4; hello\n2; world\n...\nCSV;\n\n$parser = new \\voilab\\csv\\Parser();\n\n$resource = new \\voilab\\csv\\CsvString($str);\n$result = $parser-\u003eparse($resource, [\n    'delimiter' =\u003e ';',\n    'size' =\u003e 2,\n    'columns' =\u003e [\n        'B' =\u003e function (string $data) {\n            return ucfirst($data);\n        },\n        'A' =\u003e function (string $data) {\n            return (int) $data;\n        }\n    ]\n]);\n\n$lastPos = $resource-\u003etell();\n$resource-\u003eclose();\n\n$resource2 = new \\voilab\\csv\\CsvString($str);\n$nextResult = $parser-\u003eparse($resource2, [\n    'delimiter' =\u003e ';',\n    'size' =\u003e 2,\n    'start' =\u003e 2, // yon **can** specify the start index. Not mandatory.\n    'seek' =\u003e $lastPos,\n    'columns' =\u003e [\n        'B' =\u003e function (string $data) {\n            return ucfirst($data);\n        },\n        'A' =\u003e function (string $data) {\n            return (int) $data;\n        }\n    ]\n]);\n```\n\n### Close the resource \u003ca name=\"close-the-resource\"\u003e\u003c/a\u003e\n\nUsing `fromString()` and `fromFile()` methods, the resource will be closed\nautomatically. 
With other `from*()` methods, you can close the resource by\ngiving the `'close' =\u003e true` option.\n\n### Line endings problems \u003ca name=\"line-endings-problems\"\u003e\u003c/a\u003e\n\nAs stated in the official PHP documentation, if line endings are not recognized\ncorrectly, you can use the option below to enable auto-detection.\n\n`$parser-\u003eparse($resource, [ 'autoDetectLn' =\u003e true ]);`\n\n\u003e Note that the `auto_detect_line_endings` PHP ini setting is not reset to its\n\u003e initial value after parsing has finished.\n\nWhen parsing streams (like an HTTP response message body), the line ending must\nbe specified in the options array.\n\n## Error management \u003ca name=\"error-managment\"\u003e\u003c/a\u003e\n\nYou can use the `onError` option to collect all errors, so you can report every\nerror found in the file to the user in one go.\n\nYou can stop processing a row by checking the `$meta` argument. It has a\n`type` key which can be `row` or `column`. If it is `column`, you can rethrow the\nerror and `onError` will be called again, this time with type `row`. The other\ncolumns of this row will then be skipped.\n\nIf you use an optimizer, you can throw an exception from there too. The\n`type` key will then have the value `optimizer`.\n\n```php\n$errors = [];\n$data = $parser-\u003efromFile('file.csv', [\n    'onError' =\u003e function (\\Exception $e, $index, array $meta, array $options) use (\u0026$errors) {\n        $errors[] = \"Line [$index]: \" . 
$e-\u003egetMessage();\n        // don't rethrow, so the next columns and lines are parsed too.\n        // meta types are the following (listed for reference):\n        switch ($meta['type']) {\n            case 'init':\n            case 'column':\n            case 'row':\n            case 'reducer':\n            case 'optimizer':\n            case 'chunk':\n        }\n    },\n    'columns' =\u003e [\n        'email' =\u003e function (string $data) {\n            // accept an empty email, but validate it if one is given\n            if ($data \u0026\u0026 !filter_var($data, FILTER_VALIDATE_EMAIL)) {\n                throw new \\Exception(\"The email [$data] is invalid\");\n            }\n            return $data ?: null;\n        }\n    ]\n]);\nif (count($errors)) {\n    // report all the errors found, e.g. print them\n    print_r($errors);\n} else {\n    // everything went well: store the data in a database or elsewhere\n}\n```\n\n### Initialization errors \u003ca name=\"initialization-errors\"\u003e\u003c/a\u003e\n\nSome errors are thrown before any line is parsed. You have to take this into\n
You have to take this into\naccount.\n\n```php\n$data = $parser-\u003efromFile('file.csv', [\n    'onError' =\u003e function (\\Exception $e, $index, array $meta) {\n        if ($meta['type'] === 'init') {\n            // called during initialization.\n            var_dump($meta['key']); // for errors with specific key\n            if ($e-\u003egetCode() === \\voilab\\csv\\Exception::HEADERMISSING) {\n                throw new \\Exception(sprintf(\"La colonne [%s] est obligatoire\", $meta['key']));\n            }\n        }\n        throw $e;\n    }\n]);\n```\n\n### Error and internationalization (i18n) \u003ca name=\"error-and-internationalization\"\u003e\u003c/a\u003e\n\nIf you want to translate error messages, you can use the `onError` function\nwith `meta['type'] === 'init'` to throw the translated message.\n\n## Working with database, column optimization \u003ca name=\"optimizers\"\u003e\u003c/a\u003e\n\nWhen parsing large set of data, if one column is, for example, a user ID, it's\na bad idea to call a `find($id)` method for each CSV row iteration. It's better\nto take all column values, and call for a `findByIds($ids)`.\n\nThe build-in class `Optimizer` allows you to define a column this way. It takes\nthree arguments. The first is the function needed to parse value from CSV.\nThe second is a reduce function. 
It receives all the data of the column and must\nreturn an indexed array.\n\nFor example, if you have 2 rows with values `a` and `b`, the indexed result of\nthe reduce function would be `Array ( a =\u003e something, b =\u003e something else )`.\n\nThe third argument is a function called when a value is not found in the result\nof the reduce function.\n\n### Parse function \u003ca name=\"parse-function\"\u003e\u003c/a\u003e\n\nSame as the column function (see above).\n\n### Reduce function \u003ca name=\"reduce-function\"\u003e\u003c/a\u003e\n\n| Name | Type | Description |\n|------|------|-------------|\n| $data | `array` | All the data parsed, for the column |\n| $parsed | `array` | The parsed data (complete set of data) |\n| $optimized | `array` | Columns already optimized. Key =\u003e value pairs, where the key is the column name and the value is the result of the column's reduce function |\n| $meta | `array` | The current column information |\n| $options | `array` | The options array |\n|------|------|-------------|\n| return | `array` | Returns an indexed array |\n\n\u003e Returns an indexed array. If a CSV column value has no match in the result\n\u003e of the reduce function, simply leave it out of the returned array.\n\u003e For example, if the values [10, 22] are used in a database query to find\n\u003e users by id, and user ID 22 doesn't exist, the result should be\n\u003e `Array ( 10 =\u003e User(id=10) )`\n\n### Absent function \u003ca name=\"absent-function\"\u003e\u003c/a\u003e\n\nWhen a value is not found in the reduced result, the default behaviour is to\nkeep the parsed value (as if there were no reduce function for this row). You\ncan override this by defining the absent function and doing what you want with\nthe value.\n\n| Name | Type | Description |\n|------|------|-------------|\n| $value | `mixed` | The data parsed for the column, for this row |\n| $index | `int` | The index of the line being parsed. 
Corresponds to the line number in the CSV resource (taking headers into account) |\n| $parsed | `array` | The parsed data of this row |\n| $optimized | `array` | Columns already optimized. Key =\u003e value pairs, where the key is the column name and the value is the result of the column's reduce function |\n| $meta | `array` | The current column information |\n| $options | `array` | The options array |\n|------|------|-------------|\n| return | `?mixed` | Returns the default value for this \"not found\" key |\n\n\u003e If you have defined an error function, it will be called with a type of\n\u003e `optimizer` (see error management above) if you throw an error from here.\n\n### Example \u003ca name=\"optimizer-example\"\u003e\u003c/a\u003e\n\n```php\n$str = \u003c\u003c\u003cCSV\nA; B\n4; updated John\n2; updated Sybille\nCSV;\n\n// placeholder for your own database layer\n$database = some_database_abstraction();\n\n$data = $parser-\u003efromString($str, [\n    'delimiter' =\u003e ';',\n    'columns' =\u003e [\n        'A as user' =\u003e new \\voilab\\csv\\Optimizer(\n            // column function, same as when there's no optimizer\n            function (string $data) {\n                return (int) $data;\n            },\n            // reduce function that receives the values returned by the first function\n            function (array $data) use ($database) {\n                $query = 'SELECT id, firstname FROM user WHERE id IN(?)';\n                $users = $database-\u003equery($query, array_unique($data));\n                return array_reduce($users, function ($acc, $user) {\n                    $acc[$user-\u003eid] = $user;\n                    return $acc;\n                }, []);\n            },\n            // absent function. 
$data is an [int] because the first function returns\n            // an [int]\n            function (int $data, int $index) {\n                throw new \\Exception(\"User with id $data at index $index doesn't exist!\");\n            }\n        ),\n        'B as firstname' =\u003e function (string $data) {\n            return $data;\n        }\n    ]\n]);\nprint_r($data);\n\n/* prints:\nArray (\n    [0] =\u003e Array (\n        [user] =\u003e User ( id =\u003e 4, firstname =\u003e John )\n        [firstname] =\u003e updated John\n    )\n    [1] =\u003e Array (\n        [user] =\u003e User ( id =\u003e 2, firstname =\u003e Sybille )\n        [firstname] =\u003e updated Sybille\n    )\n)\n*/\n```\n\n### Chunks\n\nOptimizers are useful in some cases, but sometimes you want to parse your\ndata in chunks, manipulate and store each chunk, then move on to the next one.\nYou can achieve this with the chunk options:\n\n```php\n$str = ''; // a huge CSV string with tons of rows and two columns\n\n$parser-\u003efromString($str, [\n    'delimiter' =\u003e ';',\n    'chunkSize' =\u003e 500,\n    'onChunkParsed' =\u003e function (array $rows, int $chunkIndex, array $columns, array $options) {\n        // count($rows) = 500\n        // do something with your parsed rows. 
This method will be called\n        // as long as there are rows to parse.\n\n        // This method returns void\n    },\n    'onError' =\u003e function (\\Exception $e, $index, array $meta) {\n        // if ($meta['type'] === 'chunk') { do something }\n    },\n    'columns' =\u003e [\n        'A as id' =\u003e function (string $data) {\n            return (int) $data;\n        },\n        'B as firstname' =\u003e function (string $data) {\n            return $data;\n        }\n    ]\n]);\n```\n\n\u003e If you use optimizers, `$rows` will contain the optimized result set.\n\n\u003e You don't need to use the array returned by `fromString` (or similar methods),\n\u003e since the processing done in `onChunkParsed` is enough.\n\n## Guessers: auto-detect line ending, delimiter and encoding \u003ca name=\"guessers\"\u003e\u003c/a\u003e\n\nGuessing how CSV data is structured (line ending, delimiter or encoding) is\na very hazardous task: there are so many use cases that no single approach\ncan handle them all.\n\nThis package still provides a way to guess these elements, but if it\ndoesn't fit your needs, you can easily extend it or create a new class to\nhandle your specific use case.\n\nIf you want to implement guessing your own way, please read the code\nof each guesser interface.\n\n\u003e Guessing is useless with some CsvInterface implementations. For example,\n\u003e it is ignored for iterables, since their data is already arranged in cells\n\u003e and rows. Make sure it's useful for you before using the guessing features.\n\n### Guess line ending \u003ca name=\"guess-line-ending\"\u003e\u003c/a\u003e\n\nThe first thing the parser does is detect the line endings. The\n
The\nprovided implementation tries to detect line endings among `\\n`, `\\r` and `\\r\\n`.\n\n```php\n$str = 'A;B\\r\\n4;Hello\\r\\n;2;World';\n\n$parser-\u003efromString($str, [\n    'guessLineEnding' =\u003e new \\voilab\\csv\\GuesserLineEnding([\n        // maximum line length to read, which will be parsed\n        // defaults to: see below\n        'length' =\u003e 1024 * 1024\n    ])\n]);\n```\n\n### Guess delimiter \u003ca name=\"guess-delimiter\"\u003e\u003c/a\u003e\n\nThen, the parser tries to detect delimiter. In the provided implementation, an\nexception is thrown if delimiter is not found **or if it's too ambiguous**.\n\n```php\n$str = 'A;B\\n4;Hello\\n;2;World';\n\n$parser-\u003efromString($str, [\n    'guessDelimiter' =\u003e new \\voilab\\csv\\GuesserDelimiter([\n        // delimiters to check. Defaults to: see below\n        'delimiters' =\u003e [',', ';', ':', \"\\t\", '|', ' '],\n        // number of lines to check. Defaults to: see below\n        'size' =\u003e 10,\n        // throws an exception if result is amiguous. Defaults to: see below\n        'throwAmbiguous' =\u003e true,\n        // score to reach for a delimiter. Defaults to: see below\n        'scoreLimit' =\u003e 50\n    ])\n]);\n```\n\n### Guess encoding \u003ca name=\"guess-encoding\"\u003e\u003c/a\u003e\n\nFor each cell, encoding auto detection is called. The provided implementation\ntries to find the current cell encoding, and encode it to the other one given\nin the constructor.\n\nIt is also called for the header row. If you want to encode differently\nbetween headers and datas, you can check on `$meta['type'] === 'init'` in\nyour `encode` function (check code base).\n\n\u003e This guesser is called BEFORE onBeforeColumnParse\n\n```php\n$str = 'A;B\\n4;Hellö\\n;2;Wörld';\n\n$parser-\u003efromString($str, [\n    'guessEncoding' =\u003e new \\voilab\\csv\\GuesserEncoding([\n        // encoding in which data to retrieve. 
Defaults to: see below\n        'encodingTo' =\u003e 'utf-8',\n        // encoding of the file. If null, it is auto-detected\n        'from' =\u003e null,\n        // available encodings. If null, uses mb_list_encodings()\n        'encodings' =\u003e null,\n        // strict mode for mb_detect_encoding. Defaults to: see below\n        'strict' =\u003e false\n    ])\n]);\n```\n\n## Known issues \u003ca name=\"known-issues\"\u003e\u003c/a\u003e\n\n+ with PSR streams, carriage returns are not supported in headers or in cell\ncontent\n+ the guessing processes are unlikely to fit your specific needs out of the box. Before\ncreating an issue or a PR, try to extend the guesser classes and make your own\nspecific adaptations\n\n## Testing\n\n```\n$ ./vendor/bin/phpunit\n```\n\n## Security\n\nIf you discover any security-related issues, please use the issue tracker.\n\n## License\n\nThe MIT License (MIT). Please see License File for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoilab%2Fcsv-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvoilab%2Fcsv-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoilab%2Fcsv-parser/lists"}