{"id":26255422,"url":"https://github.com/doeixd/csv-utils","last_synced_at":"2025-09-10T02:42:01.367Z","repository":{"id":279654086,"uuid":"939540874","full_name":"doeixd/csv-utils","owner":"doeixd","description":"Helpful utils for working with csv files or arrays of objects","archived":false,"fork":false,"pushed_at":"2025-07-22T11:18:34.000Z","size":409,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-05T20:42:37.437Z","etag":null,"topics":["csv","csv-files","csv-parsing","typescript-library"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/doeixd.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-26T17:50:26.000Z","updated_at":"2025-09-02T08:00:48.000Z","dependencies_parsed_at":null,"dependency_job_id":"11490124-8622-41fa-924d-367d8744dd80","html_url":"https://github.com/doeixd/csv-utils","commit_stats":null,"previous_names":["doeixd/csv-utils"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/doeixd/csv-utils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doeixd%2Fcsv-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doeixd%2Fcsv-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doeixd%2Fcsv-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doeixd%2Fcsv-utils/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/doeixd","download_url":"https://codeload.github.com/doeixd/csv-utils/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/doeixd%2Fcsv-utils/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273851713,"owners_count":25179488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-06T02:00:13.247Z","response_time":2576,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-files","csv-parsing","typescript-library"],"created_at":"2025-03-13T19:18:43.433Z","updated_at":"2025-09-10T02:42:01.349Z","avatar_url":"https://github.com/doeixd.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CSV Utils\n\n[![npm version](https://img.shields.io/npm/v/@doeixd/csv-utils.svg)](https://www.npmjs.com/package/@doeixd/csv-utils)\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n\nA TypeScript library for CSV manipulation, featuring robust error handling, strong typing, and a fluent interface. 
This library provides comprehensive utilities for parsing, transforming, analyzing, and writing both CSV data and arrays of objects, with support for operations like header mapping, streaming for large files, schema validation, and async processing.\n\n## Table of Contents\n\n- [Features](#features)\n- [Installation](#installation)\n- [Quick Start](#quick-start)\n- [Examples](#examples)\n  - [Basic Operations](#basic-operations)\n  - [Custom Type Casting](#custom-type-casting)\n  - [Header Mapping](#header-mapping)\n    - [Basic Mapping](#basic-mapping)\n    - [Reading and Writing with Header Mapping](#reading-and-writing-with-header-mapping)\n    - [Array Mapping](#array-mapping)\n      - [Mapping Multiple Columns to an Array](#mapping-multiple-columns-to-an-array)\n      - [Explicit Column List for Array Mapping](#explicit-column-list-for-array-mapping)\n      - [Mapping an Array to Multiple Columns](#mapping-an-array-to-multiple-columns)\n  - [Preamble Handling](#preamble-handling)\n  - [Schema Validation](#schema-validation)\n    - [Using Standard Schema](#using-standard-schema)\n    - [Using Zod for Schema Validation](#using-zod-for-schema-validation)\n    - [Working with Validation Results](#working-with-validation-results)\n  - [Array Transformations](#array-transformations)\n  - [Async Processing](#async-processing)\n    - [Async File Operations](#async-file-operations)\n    - [Async Iteration and Batching](#async-iteration-and-batching)\n    - [Async Generators for Large Files](#async-generators-for-large-files)\n  - [Error Handling and Retries](#error-handling-and-retries)\n  - [Data Analysis and Transformation](#data-analysis-and-transformation)\n    - [Merging Datasets](#merging-datasets)\n    - [Simple Data Analysis](#simple-data-analysis)\n    - [Advanced Transformations (Join, Unpivot, etc.)](#advanced-transformations-join-unpivot-etc)\n- [Standalone Functions](#standalone-functions-module)\n  - [Quick Start with Standalone 
Functions](#quick-start-with-standalone-functions)\n  - [Functional Composition](#functional-composition)\n- [API Documentation](#api-documentation)\n  - [Core Class: CSV](#core-class-csv)\n    - [Static Methods](#static-methods)\n    - [Instance Methods](#instance-methods)\n  - [Utility Objects](#utility-objects)\n    - [CSVUtils](#csvutils)\n    - [CSVArrayUtils](#csvarrayutils)\n  - [Generator Functions](#generator-functions)\n  - [Key Types and Interfaces](#key-types-and-interfaces)\n    - [CSVError](#csverror)\n    - [Options Interfaces](#options-interfaces)\n    - [Casting Related Types](#casting-related-types)\n    - [Schema Related Types](#schema-related-types)\n    - [Other Types](#other-types)\n- [Memory-Efficient Stream Processing with `CSVStreamProcessor`](#memory-efficient-stream-processing-with-csvstreamprocessor)\n  - [Creating a Stream Processor](#creating-a-stream-processor)\n  - [Fluent Stream Transformations](#fluent-stream-transformations)\n  - [Executing the Stream Pipeline](#executing-the-stream-pipeline)\n- [Troubleshooting](#troubleshooting)\n- [Contributing](#contributing)\n- [License](#license)\n\n## Features\n\n- **🔒 Type Safety** - Comprehensive TypeScript support with generic types for robust data handling.\n- **🧩 Flexible Header Mapping** - Sophisticated transformation between flat CSV columns and nested object structures, including mapping to/from array properties.\n- **📊 Rich Data Operations** - Extensive methods for querying, filtering, updating, sorting, grouping, and aggregating data.\n- **📈 Advanced Transformations** - Powerful tools for data conversion, including `join`, `pivot`, `unpivot`, `addColumn`, `castColumnType`, and more.\n- **⚡ Async \u0026 Parallel Processing** - Efficiently handle large files with asynchronous operations, stream processing, and worker thread support for CPU-intensive tasks.\n- **🛡️ Robust Error Handling** - Custom `CSVError` class and configurable retry mechanisms for I/O operations.\n- **📝 Extensive 
Preamble Support** - Read, store, and write CSV preambles (additional header lines/comments).\n- **🚀 Fluent Interface (Builder Pattern)** - Chain methods for elegant and readable data manipulation pipelines.\n- **🧠 Smart Custom Type Casting** - Define custom logic to test and parse string values into specific types (numbers, dates, booleans, custom objects) on a global or per-column basis.\n- **🔄 High-Performance Streaming API** - `CSVStreamProcessor` for processing massive CSV files with minimal memory footprint, featuring a fluent API.\n- **🔍 Schema Validation** - Integrated support for data validation against `StandardSchemaV1` (compatible with Zod and other validation libraries), with modes for erroring, filtering, or keeping invalid data.\n- **⚖️ Memory Efficiency** - Stream processing utilizes a fixed-size circular buffer with automatic backpressure to manage memory usage effectively for very large datasets.\n- **📦 Batch Processing** - Optimized methods for processing data in configurable batches for improved throughput in async operations.\n- **📦 Standalone Functions** - Alternative functional programming style for all core operations.\n\n## Installation\n\n```bash\nnpm install @doeixd/csv-utils\n# or\nyarn add @doeixd/csv-utils\n# or\npnpm add @doeixd/csv-utils\n```\n\n## Quick Start\nFor more examples, see the [dedicated quick start guide](docs/quick-start.md).\n```typescript\nimport CSV, { CSVUtils } from '@doeixd/csv-utils';\n\ninterface Product {\n  id: string;\n  name: string;\n  price: number;\n  category: string;\n  inventory?: number;\n  currency?: string;\n}\n\n// Read from a CSV file (assuming price is numeric in CSV or cast later)\nconst products = CSV.fromFile\u003cProduct\u003e('products.csv');\n\n// Chain operations\nconst result = products\n  .findRowsWhere(p =\u003e p.price \u003e 100)     // Find expensive products\n  .update({ currency: 'USD' })           // Add currency field\n  .updateColumn('price', p =\u003e p * 0.9)   // Apply 10% 
discount\n  .sortBy('price', 'desc')               // Sort by price (high to low)\n  .removeWhere(p =\u003e (p.inventory ?? 0) \u003c 5) // Remove low inventory items\n  .toArray();                            // Get the results as an array\n\n// Write back to file\nCSVUtils.writeCSV('discounted_products.csv', result);\n\n// Alternatively, write using the CSV instance\n// CSV.fromData(result).writeToFile('discounted_products.csv');\n```\n\n## Examples\n\n### Basic Operations\n\n```typescript\nimport CSV from '@doeixd/csv-utils';\n\ninterface User { id: string; name: string; role: string; department?: string; accessLevel?: string; }\n\n// Create from data\nconst users = CSV.fromData\u003cUser\u003e([\n  { id: '1', name: 'Alice', role: 'admin' },\n  { id: '2', name: 'Bob', role: 'user' },\n  { id: '3', name: 'Charlie', role: 'user' }\n]);\n\n// Query operations\nconst admin = users.findRow('1', 'id');\nconst regularUsers = users.findRowsWhere(user =\u003e user.role === 'user'); // Only rows with role 'user'\n\n// Transformation\nconst withDepartment = users.update({ department: 'IT' });\nconst updatedUsers = users.updateWhere(\n  user =\u003e user.role === 'admin',\n  { accessLevel: 'full' }\n);\n\n// Output as CSV string (by default, includes headers)\nconst csvString = users.toString();\n// console.log(csvString);\n// id,name,role\n// 1,Alice,admin\n// 2,Bob,user\n// 3,Charlie,user\n\n// Write to file\nusers.writeToFile('users.csv');\n```\n\n### Custom Type Casting\n\nApply sophisticated type conversions beyond basic CSV parsing.\n\n```typescript\nimport CSV, { Caster, CSVReadOptions } from '@doeixd/csv-utils';\n\ninterface Order {\n  order_id: string;\n  discount_code: string | null; // Can be 'N/A' or empty\n  tax_rate: number;          // e.g., '7.5%' -\u003e 0.075\n  created_at: Date;          // e.g., '12/25/2023' -\u003e Date object\n  price: number;             // e.g., '$19.99' or '19.99'\n}\n\n// Custom caster for percentages (e.g., '7.5%' -\u003e 0.075)\nconst percentageCaster: 
Caster\u003cnumber\u003e = {\n  test: (value) =\u003e typeof value === 'string' \u0026\u0026 value.endsWith('%'),\n  parse: (value) =\u003e parseFloat(value.replace('%', '')) / 100,\n};\n\n// Custom caster for dates (e.g., 'MM/DD/YYYY')\nconst dateCaster: Caster\u003cDate\u003e = {\n  test: (value) =\u003e typeof value === 'string' \u0026\u0026 /^\\d{1,2}\\/\\d{1,2}\\/\\d{4}$/.test(value),\n  parse: (value) =\u003e {\n    const [month, day, year] = value.split('/').map(Number);\n    return new Date(year, month - 1, day); // Month is 0-indexed\n  },\n};\n\n// Custom caster for potentially null string values\nconst nullableStringCaster: Caster\u003cstring | null\u003e = {\n    test: (value) =\u003e typeof value === 'string' \u0026\u0026 (value.toUpperCase() === 'N/A' || value.trim() === ''),\n    parse: () =\u003e null,\n};\n\nconst readOptions: CSVReadOptions\u003cOrder\u003e = {\n  customCasts: {\n    definitions: { // Globally available casters by key\n      number: {\n        test: (value) =\u003e typeof value === 'string' \u0026\u0026 !isNaN(parseFloat(value.replace(/[^0-9.-]+/g, \"\"))),\n        parse: (value) =\u003e parseFloat(value.replace(/[^0-9.-]+/g, \"\")),\n      },\n      date: dateCaster, // Use our custom dateCaster\n      nullableString: nullableStringCaster,\n    },\n    columnCasts: { // Column-specific rules\n      order_id: 'string', // Use built-in string caster (or keep as is if already string)\n      discount_code: ['nullableString'], // Try nullableString caster first\n      tax_rate: [percentageCaster, 'number'], // Try percentage, then general number\n      created_at: 'date',\n      price: [ // Try multiple specific casters for price\n        { // Caster for '$XX.YY' format\n          test: (v) =\u003e typeof v === 'string' \u0026\u0026 v.startsWith('$'),\n          parse: (v) =\u003e parseFloat(v.substring(1)),\n        },\n        'number', // Fallback to general number caster\n      ],\n    },\n    onCastError: 'error', // 'error' 
(default), 'null', or 'original'\n  },\n};\n\n// Assuming 'orders.csv' contains:\n// order_id,discount_code,tax_rate,created_at,price\n// ORD001,N/A,7.5%,12/25/2023,$19.99\n// ORD002,,5%,01/15/2024,25\nconst orders = CSV.fromFile\u003cOrder\u003e('orders.csv', readOptions);\n\nconst firstOrder = orders.toArray()[0];\nconsole.log(firstOrder.tax_rate); // 0.075\nconsole.log(firstOrder.created_at instanceof Date); // true\nconsole.log(firstOrder.price); // 19.99\nconsole.log(firstOrder.discount_code); // null\n```\n\n### Header Mapping\n\nTransform CSV column names to/from nested object properties.\n\n#### Basic Mapping\n\n```typescript\nimport { createHeaderMapFns, HeaderMap } from '@doeixd/csv-utils';\n\ninterface User {\n  id: string;\n  profile: { firstName: string; lastName: string; };\n  contact: { email: string; };\n}\n\n// Define a mapping: CSV header -\u003e object path\nconst headerMap: HeaderMap\u003cUser\u003e = {\n  'user_id': 'id',\n  'first_name': 'profile.firstName',\n  'last_name': 'profile.lastName',\n  'email_address': 'contact.email',\n};\n\n// Create mapping functions\nconst { fromRowArr, toRowArr } = createHeaderMapFns\u003cUser\u003e(headerMap);\n\n// Convert CSV row (object) to structured object\nconst csvRow = {\n  user_id: '123',\n  first_name: 'John',\n  last_name: 'Doe',\n  email_address: 'john@example.com',\n};\nconst userObject = fromRowArr(csvRow);\nconsole.log(userObject.profile.firstName); // John\n\n// Convert structured object back to a flat array for CSV writing\nconst csvHeaders = ['user_id', 'first_name', 'last_name', 'email_address'];\nconst flatArray = toRowArr(userObject, csvHeaders);\nconsole.log(flatArray); // ['123', 'John', 'Doe', 'john@example.com']\n```\n\n#### Reading and Writing with Header Mapping\n\n```typescript\nimport CSV, { HeaderMap } from '@doeixd/csv-utils';\n\ninterface User {\n  id: string;\n  profile: { firstName: string; lastName: string; };\n}\n\n// --- READING (flat CSV columns -\u003e nested object 
properties) ---\nconst inputHeaderMap: HeaderMap\u003cUser\u003e = {\n  'USER_IDENTIFIER': 'id',\n  'GIVEN_NAME': 'profile.firstName',\n  'FAMILY_NAME': 'profile.lastName',\n};\n// Assumes users_input.csv has columns: USER_IDENTIFIER,GIVEN_NAME,FAMILY_NAME\nconst users = CSV.fromFile\u003cUser\u003e('users_input.csv', { headerMap: inputHeaderMap });\nconsole.log(users.toArray()[0].profile.firstName);\n\n// --- WRITING (nested object properties -\u003e flat CSV columns) ---\nconst outputHeaderMap: HeaderMap\u003cUser\u003e = {\n  'id': 'UserID', // map 'id' property to 'UserID' CSV column\n  'profile.firstName': 'FirstName',\n  'profile.lastName': 'LastName',\n};\nusers.writeToFile('users_output.csv', {\n  headerMap: outputHeaderMap,\n  stringifyOptions: { header: true } // Ensure specified headers are written\n});\n// users_output.csv will have columns: UserID,FirstName,LastName\n```\n\n#### Array Mapping\n\nMap multiple CSV columns to/from an array property in your objects.\n\n##### Mapping Multiple Columns to an Array\n\n```typescript\nimport CSV, { HeaderMap, CsvToArrayConfig } from '@doeixd/csv-utils';\n\ninterface Product {\n  id: string;\n  name: string;\n  imageUrls: string[];\n}\n\n// CSV columns 'image_url_1', 'image_url_2', ... map to the 'imageUrls' array\nconst productHeaderMap: HeaderMap\u003cProduct\u003e = {\n  'product_sku': 'id',\n  'product_name': 'name',\n  // This key (here '_imageMappingConfig') holds a mapping config; it is not a CSV column.\n  '_imageMappingConfig': {\n    _type: 'csvToTargetArray',\n    targetPath: 'imageUrls', // Property in Product interface\n    sourceCsvColumnPattern: /^image_url_(\\d+)$/, // Matches 'image_url_1', 'image_url_2', etc.\n    // Optional: sort columns before adding to array (e.g., by the number in pattern)\n    sortSourceColumnsBy: (match) =\u003e parseInt(match[1], 10),\n    // Optional: transform each value before adding to array\n    transformValue: (value) =\u003e (value ? 
`https://cdn.example.com/${value}` : null),\n    // Optional: filter out null/empty values after transformation\n    filterEmptyValues: true,\n  } as CsvToArrayConfig,\n};\n\n// Assuming products_images.csv:\n// product_sku,product_name,image_url_2,image_url_1\n// SKU001,Awesome Gadget,gadget_thumb.jpg,gadget_main.jpg\nconst products = CSV.fromFile\u003cProduct\u003e('products_images.csv', { headerMap: productHeaderMap });\n// products.toArray()[0].imageUrls will be ['https://cdn.example.com/gadget_main.jpg', 'https://cdn.example.com/gadget_thumb.jpg']\n```\n\n##### Explicit Column List for Array Mapping\n\n```typescript\n// If CSV columns don't follow a pattern, list them explicitly:\nconst explicitImageMap: HeaderMap\u003cProduct\u003e = {\n  'product_sku': 'id',\n  'product_name': 'name',\n  '_imageMappingConfig': {\n    _type: 'csvToTargetArray',\n    targetPath: 'imageUrls',\n    sourceCsvColumns: ['mainProductImage', 'thumbnailImage', 'galleryImage3'],\n  } as CsvToArrayConfig,\n};\n```\n\n##### Mapping an Array to Multiple Columns\n\n```typescript\nimport CSV, { HeaderMap, ObjectArrayToCsvConfig } from '@doeixd/csv-utils';\n// (Product interface is same as above)\n\nconst productsData: Product[] = [\n  { id: 'SKU002', name: 'Another Item', imageUrls: ['item_front.png', 'item_back.png'] }\n];\n\n// Map 'imageUrls' array back to CSV columns 'image_col_0', 'image_col_1', ...\nconst writeProductHeaderMap: HeaderMap\u003cProduct\u003e = {\n  'id': 'product_sku',\n  'name': 'product_name',\n  'imageUrls': { // Key must match the array property name in Product\n    _type: 'targetArrayToCsv',\n    targetCsvColumnPrefix: 'image_col_', // Output columns: image_col_0, image_col_1, ...\n    maxColumns: 3, // Create up to 3 image columns\n    emptyCellOutput: '', // Value for empty cells if array is shorter than maxColumns\n    // Optional: transform value before writing\n    transformValue: (value) =\u003e value.replace('https://cdn.example.com/', ''),\n  } as 
ObjectArrayToCsvConfig,\n};\n\nCSV.fromData(productsData).writeToFile('products_output_arrays.csv', {\n  headerMap: writeProductHeaderMap,\n  stringifyOptions: { header: true }\n});\n// products_output_arrays.csv might have (the third image cell is empty):\n// product_sku,product_name,image_col_0,image_col_1,image_col_2\n// SKU002,Another Item,item_front.png,item_back.png,\n```\n\n### Preamble Handling\n\nManage metadata or comments at the beginning of CSV files.\n\n```typescript\nimport CSV from '@doeixd/csv-utils';\n\n// Example CSV file (data_with_preamble.csv):\n// # File Generated: 2024-01-01\n// # Source: SystemX\n// id,name,value\n// 1,Alpha,100\n// 2,Beta,200\n\n// --- Reading with Preamble ---\nconst csvInstance = CSV.fromFile('data_with_preamble.csv', {\n  saveAdditionalHeader: true, // Enable preamble capture\n  csvOptions: {\n    from_line: 3, // The CSV header row (followed by data) starts on line 3\n    comment: '#',   // Treat lines starting with # as comments (part of preamble if before from_line)\n  },\n  // Optional: dedicated parsing options for the preamble itself\n  additionalHeaderParseOptions: {\n    delimiter: ',', // If preamble has a different structure\n    // Note: options like 'columns', 'from_line', 'to_line' are overridden for preamble.\n  }\n});\n\nconsole.log('Preamble:\\n', csvInstance.additionalHeader);\n// Preamble:\n// # File Generated: 2024-01-01\n// # Source: SystemX\n\nconsole.log('Data:', csvInstance.toArray());\n// Data: [ { id: '1', name: 'Alpha', value: '100' }, { id: '2', name: 'Beta', value: '200' } ]\n\n// --- Writing with Preamble ---\nconst preambleContent = `# Exported: ${new Date().toISOString()}\\n# User: admin\\n`;\ncsvInstance.writeToFile('output_with_preamble.csv', {\n  additionalHeader: preambleContent,\n});\n\n// To preserve an existing preamble when modifying and saving:\nconst modifiedCsv = csvInstance.updateColumn('value', v =\u003e parseInt(v, 10) * 2);\nmodifiedCsv.writeToFile('modified_output.csv', {\n  additionalHeader: csvInstance.additionalHeader // Use the 
original preamble\n});\n```\n**Note on `saveAdditionalHeader`:**\n- If `number \u003e 0`: Specifies the exact number of lines to extract as the preamble. Data parsing will start after these lines, unless `csvOptions.from_line` points to an even later line.\n- If `true`: Enables preamble extraction *if* `csvOptions.from_line` is set to a value greater than 1. The preamble will consist of `csvOptions.from_line - 1` lines.\n- If `false`, `0`, or `undefined`: No preamble is extracted.\n\n### Schema Validation\n\nValidate CSV data against predefined schemas.\n\n#### Using Standard Schema\n\nThis library supports `StandardSchemaV1` for defining custom validation logic.\n\n```typescript\nimport CSV, { StandardSchemaV1, CSVSchemaConfig } from '@doeixd/csv-utils';\n\ninterface User { id: number; email: string; age?: number; }\n\n// Custom schema for validating email strings\nconst emailFormatSchema: StandardSchemaV1\u003cstring, string\u003e = {\n  '~standard': {\n    version: 1,\n    vendor: 'csv-utils-example',\n    validate: (value: unknown): StandardSchemaV1.Result\u003cstring\u003e =\u003e {\n      if (typeof value !== 'string') return { issues: [{ message: 'Must be a string' }] };\n      if (!/^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$/.test(value)) return { issues: [{ message: 'Invalid email format' }] };\n      return { value };\n    },\n    types: { input: '' as string, output: '' as string }\n  }\n};\n\nconst userSchemaConfig: CSVSchemaConfig\u003cUser\u003e = {\n  columnSchemas: {\n    id: { // Ensure ID is a positive number (example with simple validation)\n      '~standard': {\n        version: 1, vendor: 'csv-utils-example',\n        validate: (v: unknown) =\u003e {\n          const n = Number(v);\n          if (isNaN(n) || n \u003c= 0) return { issues: [{message: \"ID must be a positive number\"}]};\n          return { value: n };\n        },\n        types: { input: undefined as any, output: 0 as number }\n      }\n    },\n    email: emailFormatSchema,\n  },\n  
validationMode: 'filter', // 'error', 'filter', or 'keep'\n  // useAsync: false // Default, set to true for async validation logic within schemas\n};\n\n// Assuming users_for_validation.csv:\n// id,email,age\n// 1,alice@example.com,30\n// two,bob-invalid-email,25\n// 3,carol@example.com,\nconst users = CSV.fromFile\u003cUser\u003e('users_for_validation.csv', { schema: userSchemaConfig });\n// 'users' will only contain valid rows due to 'filter' mode.\n// { id: 1, email: 'alice@example.com', age: '30' } // age is still string from parser\n// { id: 3, email: 'carol@example.com', age: ''   }\n```\n\n#### Using Zod for Schema Validation\n\nRequires `zod` to be installed (`npm install zod`).\n\n```typescript\nimport CSV, { CSVSchemaConfig } from '@doeixd/csv-utils';\nimport { z } from 'zod';\n\nconst zodUserSchema = z.object({\n  id: z.string().min(1, \"ID is required\"),\n  name: z.string().min(2, \"Name must be at least 2 characters\"),\n  email: z.string().email(\"Invalid email address\"),\n  age: z.number().positive(\"Age must be a positive number\").optional(),\n});\ntype ZodUser = z.infer\u003ctypeof zodUserSchema\u003e;\n\nconst csvWithZodSchema: CSVSchemaConfig\u003cZodUser\u003e = {\n  rowSchema: zodUserSchema, // Apply to the whole row after initial parsing \u0026 custom casting\n  // columnSchemas: { // Can also define Zod schemas for individual columns for pre-rowSchema validation\n  //   age: z.coerce.number().positive().optional() // Coerce age to number before row validation\n  // },\n  validationMode: 'filter',\n  // useAsync: true // If any Zod schema uses async refinements\n};\n\n// Example: use customCasts to convert 'age' before Zod validation\nconst usersZod = CSV.fromFile\u003cZodUser\u003e('users_data.csv', {\n  customCasts: { // Convert age string to number before Zod validation\n    columnCasts: { age: 'number' },\n    definitions: { number: { test: v =\u003e !isNaN(parseFloat(v)), parse: v =\u003e parseFloat(v) } }\n  },\n  schema: 
csvWithZodSchema\n});\n```\n\n#### Working with Validation Results\n\nIf `validationMode: 'keep'` is used, results are available on the `CSV` instance.\n\n```typescript\nconst configKeep: CSVSchemaConfig\u003cUser\u003e = { /* ... */ validationMode: 'keep' };\nconst usersResult = CSV.fromFile\u003cUser\u003e('users.csv', { schema: configKeep });\n\nif (usersResult.validationResults) {\n  usersResult.validationResults.forEach(res =\u003e {\n    if (!res.valid) {\n      console.log(`Invalid row: ${JSON.stringify(res.originalRow)}`);\n      if (res.rowIssues) console.log('  Row issues:', res.rowIssues.map(i =\u003e i.message));\n      if (res.columnIssues) {\n        Object.entries(res.columnIssues).forEach(([col, issues]) =\u003e {\n          console.log(`  Column '${col}' issues:`, issues.map(i =\u003e i.message));\n        });\n      }\n    }\n  });\n}\n```\n\n### Array Transformations\n\nUtilities for converting between arrays of arrays and arrays of objects.\n\n```typescript\nimport { CSVArrayUtils, HeaderMap } from '@doeixd/csv-utils';\n\ninterface ProductRecord { id: string; productName: string; unitPrice: number; category: string; }\n\n// --- Array of Arrays -\u003e Array of Objects ---\nconst csvDataAsArrays = [\n  ['SKU', 'Item Name', 'Price', 'Type'], // Header\n  ['A123', 'Super Widget', '19.99', 'Gadgets'],\n  ['B456', 'Mega Thinger', '29.50', 'Gizmos'],\n];\nconst productMap: HeaderMap\u003cProductRecord\u003e = {\n  0: 'id', // Index 0 maps to 'id'\n  1: 'productName',\n  2: 'unitPrice', // This will be a string initially from CSV\n  3: 'category',\n};\nconst productsArray = CSVArrayUtils.arrayToObjArray\u003cProductRecord\u003e(\n  csvDataAsArrays.slice(1), // Data rows\n  productMap\n);\n// productsArray[0] = { id: 'A123', productName: 'Super Widget', unitPrice: '19.99', category: 'Gadgets' }\n// Note: For type conversion (e.g., string '19.99' to number), use CSV class with customCasts or schema validation.\n\n// --- Array of Objects -\u003e Array of 
Arrays ---\nconst productObjects: ProductRecord[] = [\n  { id: 'C789', productName: 'Hyper Spanner', unitPrice: 9.95, category: 'Tools' },\n];\n// Map object properties back to array indices/CSV headers\nconst outputMapConfig: HeaderMap = { // Here, keys are object paths, values are CSV headers or indices\n  'id': 'Product ID',\n  'productName': 'Name',\n  'unitPrice': 'Cost',\n  'category': 'Department',\n};\nconst outputHeaders = ['Product ID', 'Name', 'Cost', 'Department'];\nconst arraysForCsv = CSVArrayUtils.objArrayToArray\u003cProductRecord\u003e(\n  productObjects,\n  outputMapConfig,\n  outputHeaders,\n  true // Include headers as the first row\n);\n// arraysForCsv = [\n//   ['Product ID', 'Name', 'Cost', 'Department'],\n//   ['C789', 'Hyper Spanner', 9.95, 'Tools']\n// ]\n\n// --- Grouping ---\nconst groupedByCategory = CSVArrayUtils.groupByField(productsArray, 'category');\n// groupedByCategory['Gadgets'] would be an array of products in that category.\n```\n\n### Async Processing\n\nHandle large datasets and I/O-bound operations efficiently.\n\n#### Async File Operations\n```typescript\nimport CSV from '@doeixd/csv-utils';\ninterface MyData { /* ... 
*/ }\n\n// Asynchronously read from a file (loads all data into memory after parsing)\nasync function loadDataAsync() {\n  const csvData = await CSV.fromFileAsync\u003cMyData\u003e('large_dataset.csv', {\n    // CSVReadOptions apply here, e.g., headerMap, customCasts, schema\n  });\n  console.log(`Loaded ${csvData.count()} records.`);\n  return csvData;\n}\n\n// Asynchronously write to a file\nasync function saveDataAsync(csvInstance: CSV\u003cMyData\u003e) {\n  await csvInstance.writeToFileAsync('output_dataset.csv');\n  console.log('Data written asynchronously.');\n}\n```\n\n#### Async Iteration and Batching\n```typescript\nasync function processDataInBatches(csvInstance: CSV\u003cMyData\u003e) {\n  // Process each row with an async callback\n  await csvInstance.forEachAsync(async (row, index) =\u003e {\n    // await someAsyncDbUpdate(row);\n    console.log(`Processed row ${index + 1} asynchronously.`);\n  }, { batchSize: 100, batchConcurrency: 5 }); // 100 items per batch, 5 batches concurrently\n\n  // Transform data with an async mapping function\n  const enrichedData = await csvInstance.mapAsync(async (row) =\u003e {\n    // const details = await fetchExtraDetails(row.id);\n    // return { ...row, ...details };\n    return row; // Placeholder\n  }, { batchSize: 50, batchConcurrency: 10 });\n  \n  console.log(`Enriched ${enrichedData.length} records.`);\n}\n```\n\n#### Async Generators for Large Files\nIdeal for memory-efficient processing of very large files.\n\n```typescript\nimport { csvGenerator, csvBatchGenerator, writeCSVFromGenerator, CSVStreamOptions } from '@doeixd/csv-utils';\ninterface LogEntry { timestamp: string; level: string; message: string; }\n\nconst streamOptions: CSVStreamOptions\u003cLogEntry\u003e = {\n  csvOptions: { columns: true, trim: true },\n  // headerMap: { /* ... 
*/ }, // Optional header mapping\n  // transform: (row) =\u003e ({ ...row, parsedAt: new Date() }) // Optional row transformation\n};\n\nasync function analyzeLogs() {\n  // Process row by row\n  let errorCount = 0;\n  for await (const log of csvGenerator\u003cLogEntry\u003e('application.log', streamOptions)) {\n    if (log.level === 'ERROR') errorCount++;\n  }\n  console.log(`Total error logs: ${errorCount}`);\n\n  // Process in batches\n  for await (const batch of csvBatchGenerator\u003cLogEntry\u003e('application.log', { ...streamOptions, batchSize: 1000 })) {\n    // await bulkInsertToDb(batch);\n    console.log(`Processed batch of ${batch.length} logs.`);\n  }\n}\n\n// Example: Transform and write using generators\nasync function transformAndWriteLogs() {\n  async function* transformedLogGenerator() {\n    for await (const log of csvGenerator\u003cLogEntry\u003e('input.log')) {\n      if (log.level === 'INFO') { // Filter and transform\n        yield { ...log, message: log.message.toUpperCase() } as LogEntry;\n      }\n    }\n  }\n  await writeCSVFromGenerator('output_info_logs.csv', transformedLogGenerator());\n}\n```\n\n### Error Handling and Retries\n\n```typescript\nimport CSV, { CSVError } from '@doeixd/csv-utils';\n\ntry {\n  const data = CSV.fromFile('potentially_flaky_network_file.csv', {\n    retry: {\n      maxRetries: 3,        // Attempt up to 3 times after initial failure\n      baseDelay: 500,       // Initial delay 500ms, then 1000ms, 2000ms (exponential backoff)\n      logRetries: true,     // Log retry attempts to console.warn\n    }\n  });\n  // ... 
process data\n} catch (error) {\n  if (error instanceof CSVError) {\n    console.error(`CSV operation failed: ${error.message}`);\n    if (error.cause) {\n      console.error('Underlying cause:', error.cause);\n    }\n  } else {\n    console.error('An unexpected error occurred:', error);\n  }\n}\n```\n\n### Data Analysis and Transformation\n\n#### Merging Datasets\n```typescript\nimport CSV from '@doeixd/csv-utils';\ninterface InventoryItem { sku: string; name: string; price: number; stock: number; }\ninterface SalesDataItem { sku: string; unitsSold: number; }\n\nconst inventory = CSV.fromData\u003cInventoryItem\u003e([\n  { sku: 'A1', name: 'Apple', price: 1.0, stock: 100 },\n  { sku: 'B2', name: 'Banana', price: 0.5, stock: 150 },\n]);\nconst sales = CSV.fromData\u003cSalesDataItem\u003e([\n  { sku: 'A1', unitsSold: 10 },\n  { sku: 'C3', unitsSold: 5 }, // This SKU not in inventory\n]);\n\n// Merge sales data into inventory, updating stock\nconst updatedInventory = inventory.mergeWith(\n  sales,\n  (invItem, saleItem) =\u003e invItem.sku === saleItem.sku, // Equality condition\n  (invItem, saleItem) =\u003e ({ // Merge function for matched items\n    ...invItem,\n    stock: invItem.stock - saleItem.unitsSold,\n  })\n);\n// updatedInventory will have Banana unchanged, Apple with reduced stock.\n// Items only in 'sales' are not included by default with this merge logic.\n```\n\n#### Simple Data Analysis\n```typescript\nimport CSV from '@doeixd/csv-utils';\ninterface Sale { product: string; region: string; amount: number; month: string; }\n\nconst salesData = CSV.fromData\u003cSale\u003e([\n  { product: 'Laptop', region: 'North', amount: 1200, month: 'Jan' },\n  { product: 'Mouse', region: 'North', amount: 25, month: 'Jan' },\n  { product: 'Laptop', region: 'South', amount: 1500, month: 'Feb' },\n  { product: 'Keyboard', region: 'North', amount: 75, month: 'Jan' },\n]);\n\nconst totalRevenue = salesData.aggregate('amount', 'sum'); // Sum of 'amount'\nconst 
averageSale = salesData.aggregate('amount', 'avg');\nconst uniqueRegions = salesData.distinct('region'); // ['North', 'South']\n\n// Pivot table: product sales by region\nconst salesPivot = salesData.pivot('product', 'region', 'amount');\n// salesPivot = {\n//   Laptop: { North: 1200, South: 1500 },\n//   Mouse: { North: 25 },\n//   Keyboard: { North: 75 }\n// }\n```\n\n#### Advanced Transformations (Join, Unpivot, etc.)\n```typescript\nimport CSV from '@doeixd/csv-utils';\n\n// --- Join Example ---\ninterface User { id: number; name: string; cityId: number; }\ninterface City { cityId: number; cityName: string; }\nconst users = CSV.fromData\u003cUser\u003e([ { id: 1, name: 'Alice', cityId: 101 }, { id: 2, name: 'Bob', cityId: 102 } ]);\nconst cities = CSV.fromData\u003cCity\u003e([ { cityId: 101, cityName: 'New York' }, { cityId: 103, cityName: 'Paris' } ]);\n\nconst usersWithCities = users.join(\n  cities,\n  { left: 'cityId', right: 'cityId', type: 'left' }, // Left join on cityId\n  (user, city) =\u003e ({ // Custom select function for the result\n    userId: user!.id,\n    userName: user!.name,\n    cityName: city ? 
city.cityName : 'Unknown',\n  })\n);\n// usersWithCities.toArray() would include Alice with New York, Bob with Unknown city.\n\n// --- Unpivot Example ---\ninterface QuarterlySales { product: string; q1: number; q2: number; }\nconst wideSales = CSV.fromData\u003cQuarterlySales\u003e([ { product: 'Gadget', q1: 100, q2: 150 } ]);\nconst longSales = wideSales.unpivot(\n  ['product'], // ID columns to repeat\n  ['q1', 'q2'],  // Value columns to unpivot\n  'quarter',     // Name for the new 'variable' column\n  'sales'        // Name for the new 'value' column\n);\n// longSales.toArray() = [\n//   { product: 'Gadget', quarter: 'q1', sales: 100 },\n//   { product: 'Gadget', quarter: 'q2', sales: 150 }\n// ]\n\n// Other useful transformations:\nconst sampleData = CSV.fromData([{ a:1, b:\" x \"}, {a:2, b:\" y \"}]);\nconst cleanedData = sampleData\n  .addColumn('c', row =\u003e row.a * 2)         // Add new column 'c'\n  .renameColumn('a', 'alpha')             // Rename 'a' to 'alpha'\n  .castColumnType('alpha', 'string')      // Cast 'alpha' to string\n  .normalizeText('b', 'uppercase')        // Uppercase column 'b'\n  .trimWhitespace(['b'])                  // Trim whitespace from 'b'\n  .fillMissingValues('alpha', 'N/A');     // Fill missing in 'alpha' (if any)\n```\n\n## Standalone Functions Module\n\nFor a more functional programming style, standalone functions are available. 
They operate on arrays of objects and return new arrays or values, mirroring the `CSV` class methods.\n\n### Quick Start with Standalone Functions\n```typescript\nimport { findRowsWhere, updateColumn, sortBy, aggregate } from '@doeixd/csv-utils/standalone';\n// Or import all as a namespace: import csvFn from '@doeixd/csv-utils/standalone';\n\ninterface Product { id: string; name: string; price: number; category: string; }\nconst products: Product[] = [\n  { id: 'P001', name: 'Laptop', price: 899.99, category: 'Electronics' },\n  { id: 'P002', name: 'Headphones', price: 149.99, category: 'Electronics' },\n  { id: 'P003', name: 'T-shirt', price: 19.99, category: 'Clothing' },\n];\n\n// Find expensive electronics\nconst expensiveElectronics = findRowsWhere(\n  products,\n  p =\u003e p.category === 'Electronics' \u0026\u0026 p.price \u003e 500\n);\n\n// Apply discount to all products\nconst discounted = updateColumn(products, 'price', (price: number) =\u003e price * 0.9);\n\n// Sort products by price (descending)\nconst sortedByPrice = sortBy(products, 'price', 'desc');\n\n// Get max price\nconst maxPrice = aggregate(products, 'price', 'max'); // csvFn.aggregate(...)\n```\n\n### Functional Composition\n\nStandalone functions are well-suited for composition libraries like `fp-ts`.\n```typescript\nimport { pipe } from 'fp-ts/function'; // Example with fp-ts\nimport { findRowsWhere, updateColumn, sortBy } from '@doeixd/csv-utils/standalone';\n// (products array defined as above)\n\nconst processProducts = (data: Product[]) =\u003e pipe(\n  data,\n  d =\u003e findRowsWhere(d, p =\u003e p.category === 'Electronics'),\n  d =\u003e updateColumn(d, 'price', (price: number) =\u003e price * 0.9),\n  d =\u003e sortBy(d, 'price', 'asc')\n);\n\nconst processed = processProducts(products);\n```\n\n## API Documentation\n\n### Core Class: CSV\n\nThe central class for CSV manipulation with a fluent interface.\n\n#### Static Methods\n\n| Method                                       | 
Description                                                                    | Return Type                 |\n| :------------------------------------------- | :----------------------------------------------------------------------------- | :-------------------------- |\n| `fromFile\u003cT\u003e(filename, options?)`            | Creates a CSV instance from a file path.                                       | `CSV\u003cT\u003e`                    |\n| `fromData\u003cT\u003e(data)`                          | Creates a CSV instance from an array of objects.                               | `CSV\u003cT\u003e`                    |\n| `fromString\u003cT\u003e(csvString, options?)`         | Creates a CSV instance from a CSV content string.                              | `CSV\u003cT\u003e`                    |\n| `fromStream\u003cT\u003e(stream, options?)`            | Creates a CSV instance from a NodeJS Readable stream.                          | `Promise\u003cCSV\u003cT\u003e\u003e`           |\n| `fromFileAsync\u003cT\u003e(filename, options?)`       | Asynchronously creates a CSV instance from a file path using streams.          | `Promise\u003cCSV\u003cT\u003e\u003e`           |\n| `streamFromFile\u003cSourceRowType\u003e(filename, options?)` | Creates a `CSVStreamProcessor` for fluent, memory-efficient stream operations. | `CSVStreamProcessor\u003cSourceRowType, SourceRowType\u003e` |\n\n_`options` for read methods are typically `CSVReadOptions\u003cT\u003e`._\n\n#### Instance Methods\n\n##### Data Retrieval \u0026 Output\n| Method                                       | Description                                                                 | Return Type                 |\n| :------------------------------------------- | :-------------------------------------------------------------------------- | :-------------------------- |\n| `toArray()`                                  | Returns the internal data as a new array of objects.                        
| `T[]`                       |\n| `toString(options?: CsvStringifyOptions\u003cT\u003e)` | Converts the data to a CSV string. Supports `headerMap` via options.        | `string`                    |\n| `count()`                                    | Returns the number of rows.                                                 | `number`                    |\n| `getBaseRow(defaults?)`                      | Creates a template object based on the CSV's column structure.              | `Partial\u003cT\u003e`                |\n| `createRow(data?)`                           | Creates a new row object conforming to the CSV's structure.                 | `T`                         |\n| `writeToFile(filename, options?)`            | Writes the CSV data to a file.                                              | `void`                      |\n| `writeToFileAsync(filename, options?)`       | Asynchronously writes the CSV data to a file.                               | `Promise\u003cvoid\u003e`             |\n\n##### Validation\n| Method                                       | Description                                                                 | Return Type                 |\n| :------------------------------------------- | :-------------------------------------------------------------------------- | :-------------------------- |\n| `validate\u003cU = T\u003e(schema)`                    | Validates data synchronously against a schema. Throws on async schema.      | `CSV\u003cU\u003e`                    |\n| `validateAsync\u003cU = T\u003e(schema)`               | Validates data asynchronously against a schema.                             | `Promise\u003cCSV\u003cU\u003e\u003e`           |\n| `validationResults` (readonly property)      | Array of `RowValidationResult\u003cT\u003e` if schema validation used 'keep' mode.    
| `RowValidationResult\u003cT\u003e[] \\| undefined` |\n\n\n##### Query Methods\n| Method                               | Description                                                         | Return Type                     |\n| :----------------------------------- | :------------------------------------------------------------------ | :------------------------------ |\n| `findRow(value, column?)`            | Finds the first row where `column` strictly matches `value`.        | `T \\| undefined`                |\n| `findRowByRegex(regex, column?)`     | Finds the first row where `column` matches `regex`.                 | `T \\| undefined`                |\n| `findRows(value, column?)`           | Finds all rows where `column` (as string) includes `value` (as string). | `T[]`                           |\n| `findRowWhere(predicate)`            | Finds the first row matching the `predicate` function.              | `T \\| undefined`                |\n| `findRowsWhere(predicate)`           | Finds all rows matching the `predicate` function.                   | `T[]`                           |\n| `findSimilarRows(str, column)`       | Finds rows with string similarity to `str` in `column`, sorted by distance. | `SimilarityMatch\u003cT\u003e[]`          |\n| `findMostSimilarRow(str, column)`    | Finds the most similar row to `str` in `column`.                    | `SimilarityMatch\u003cT\u003e \\| undefined` |\n\n##### Transformation Methods\n| Method                                                  | Description                                                              | Return Type                                     |\n| :------------------------------------------------------ | :----------------------------------------------------------------------- | :---------------------------------------------- |\n| `update(modifications)`                                 | Updates all rows. `modifications` can be an object or a function.        
| `CSV\u003cT\u003e`                                        |\n| `updateWhere(condition, modifications)`                 | Updates rows matching `condition`.                                       | `CSV\u003cT\u003e`                                        |\n| `updateColumn(column, valueOrFn)`                       | Updates a specific `column` in all rows.                                 | `CSV\u003cT\u003e`                                        |\n| `transform\u003cR\u003e(transformer)`                             | Transforms each row into a new structure `R`.                            | `CSV\u003cR\u003e`                                        |\n| `removeWhere(condition)`                                | Removes rows matching `condition`.                                       | `CSV\u003cT\u003e`                                        |\n| `append(...rows)`                                       | Adds new `rows` to the dataset.                                          | `CSV\u003cT\u003e`                                        |\n| `mergeWith\u003cE\u003e(other, equalityFn, mergeFn)`              | Merges with another dataset `other` (array or `CSV\u003cE\u003e`).                 | `CSV\u003cT\u003e`                                        |\n| `addColumn\u003cNK, NV\u003e(colName, valOrFn)`                   | Adds a new column `colName` of type `NK` with values of type `NV`.       | `CSV\u003cT \u0026 Record\u003cNK, NV\u003e\u003e`                     |\n| `removeColumn\u003cK\u003e(colNames)`                             | Removes one or more `colNames`.                                          | `CSV\u003cOmit\u003cT, K\u003e\u003e`                             |\n| `renameColumn\u003cOK, NK\u003e(oldName, newName)`                | Renames `oldName` (type `OK`) to `newName` (type `NK`).                  
| `CSV\u003cOmit\u003cT, OK\u003e \u0026 Record\u003cNK, T[OK]\u003e\u003e`          |\n| `reorderColumns(orderedNames)`                          | Reorders columns based on `orderedNames`.                                | `CSV\u003cT\u003e`                                        |\n| `castColumnType(colName, targetType)`                   | Casts `colName` to `targetType` ('string', 'number', 'boolean', 'date'). | `CSV\u003cT\u003e` (underlying data type changes)         |\n| `deduplicate(colsToCheck?)`                             | Removes duplicate rows, optionally checking specific `colsToCheck`.      | `CSV\u003cT\u003e`                                        |\n| `split(condition)`                                      | Splits data into two `CSV` instances (`pass`, `fail`) based on `condition`. | `{ pass: CSV\u003cT\u003e; fail: CSV\u003cT\u003e }`                |\n| `join\u003cO, J\u003e(otherCsv, onConfig, selectFn?)`             | Joins with `otherCsv` (`CSV\u003cO\u003e`) based on `onConfig`, produces `CSV\u003cJ\u003e`.   | `CSV\u003cJ\u003e`                                        |\n| `unpivot\u003cI, V, VN, VLN\u003e(idCols, valCols, varN?, valN?)`  | Transforms data from wide to long format.                                | `CSV\u003c...\u003e` (new long-format row type)          |\n| `fillMissingValues\u003cK\u003e(colName, valOrFn)`                | Fills `null`/`undefined` in `colName`.                                   | `CSV\u003cT\u003e`                                        |\n| `normalizeText\u003cK\u003e(colName, normType)`                   | Normalizes text case in `colName` (`lowercase`, `uppercase`, `capitalize`).| `CSV\u003cT\u003e`                                        |\n| `trimWhitespace(columns?)`                              | Trims whitespace from string values in specified (or all) `columns`.     
| `CSV\u003cT\u003e`                                        |\n\n##### Analysis \u0026 Sampling Methods\n| Method                                       | Description                                                              | Return Type        |\n| :------------------------------------------- | :----------------------------------------------------------------------- | :----------------- |\n| `groupBy(column)`                            | Groups rows by values in `column`.                                       | `Record\u003cstring, T[]\u003e` |\n| `sortBy\u003cK\u003e(column, direction?)`              | Sorts rows by `column`.                                                  | `CSV\u003cT\u003e`           |\n| `sortByAsync\u003cK\u003e(column, direction?)`         | Asynchronously sorts rows, potentially using worker threads.             | `Promise\u003cCSV\u003cT\u003e\u003e`  |\n| `aggregate\u003cK\u003e(column, operation?)`           | Calculates 'sum', 'avg', 'min', 'max', 'count' for `column`.             | `number`           |\n| `distinct\u003cK\u003e(column)`                        | Gets unique values from `column`.                                        | `Array\u003cT[K]\u003e`      |\n| `pivot(rowCol, colCol, valCol)`              | Creates a pivot table.                                                   | `Record\u003cstring, Record\u003cstring, unknown\u003e\u003e` |\n| `sample(count?)`                             | Gets `count` random rows.                                                | `CSV\u003cT\u003e`           |\n| `head(count?)` / `take(count?)`              | Gets the first `count` rows.                                             | `CSV\u003cT\u003e`           |\n| `tail(count?)`                               | Gets the last `count` rows.                                              
| `CSV\u003cT\u003e`           |\n\n##### Iteration Methods\n| Method                                       | Description                                                              | Return Type        |\n| :------------------------------------------- | :----------------------------------------------------------------------- | :----------------- |\n| `forEach(callback)`                          | Executes `callback` for each row.                                        | `void`             |\n| `forEachAsync(callback, options?)`           | Asynchronously executes `callback` for each row, with batching.          | `Promise\u003cvoid\u003e`    |\n| `map\u003cR\u003e(callback)`                           | Creates a new array by applying `callback` to each row.                  | `R[]`              |\n| `mapAsync\u003cR\u003e(callback, options?)`            | Asynchronously creates a new array, with batching.                       | `Promise\u003cR[]\u003e`     |\n| `reduce\u003cR\u003e(callback, initialValue)`          | Reduces rows to a single value.                                          | `R`                |\n| `reduceAsync\u003cR\u003e(callback, initialValue, options?)` | Asynchronously reduces rows, with optimized batching/parallel strategies. | `Promise\u003cR\u003e`       |\n\n### Utility Objects\n\n#### CSVUtils\nStandalone utility functions.\n\n| Function                                       | Description                                                              |\n| :--------------------------------------------- | :----------------------------------------------------------------------- |\n| `mergeRows(arrA, arrB, eqFn, mergeFn)`         | Merges two arrays of objects based on custom logic.                      |\n| `clone(obj)`                                   | Deep clones an object (using `JSON.parse(JSON.stringify(obj))`).         |\n| `isValidCSV(str)`                              | Performs a quick check if a string seems to be valid CSV.     
           |\n| `writeCSV(filename, data, options?)`           | Writes an array of objects `data` to a CSV `filename`.                   |\n| `writeCSVAsync(filename, data, options?)`      | Asynchronously writes `data` to `filename`.                              |\n| `createTransformer\u003cT, R\u003e(transformFn)`         | Creates a NodeJS `Transform` stream for row-by-row transformation.       |\n| `processInWorker\u003cT, R\u003e(operation, data)`       | Executes a serializable `operation` with `data` in a worker thread.      |\n| `processInParallel\u003cT, R\u003e(items, op, opts?)`    | Processes `items` in parallel using worker threads. Not for order-dependent ops like sort. |\n\n#### CSVArrayUtils\nUtilities for converting between arrays and objects, often used with header maps.\n\n| Function                                       | Description                                                              |\n| :--------------------------------------------- | :----------------------------------------------------------------------- |\n| `arrayToObjArray\u003cT\u003e(data, headerMap, headerRow?)` | Transforms an array of arrays/objects `data` to an array of `T` objects using `headerMap`. |\n| `objArrayToArray\u003cT\u003e(data, headerMap, headers?, includeHeaders?)` | Transforms an array of `T` objects `data` to an array of arrays using `headerMap`. |\n| `groupByField\u003cT\u003e(data, field)`                 | Groups an array of `T` objects `data` by the value of `field` (can be a dot-path). |\n\n### Generator Functions\nFor memory-efficient processing of large CSV files.\n\n| Function                                       | Description                                                              |\n| :--------------------------------------------- | :----------------------------------------------------------------------- |\n| `csvGenerator\u003cT\u003e(filename, options?)`          | Asynchronously yields rows of type `T` one by one from `filename`.       
|\n| `csvBatchGenerator\u003cT\u003e(filename, options?)`     | Asynchronously yields batches (arrays of `T`) from `filename`.           |\n| `writeCSVFromGenerator\u003cT\u003e(filename, generator, options?)` | Writes data from an async `generator` of `T` rows to `filename`.       |\n\n_`options` for generator functions are `CSVStreamOptions\u003cT\u003e`._\n\n### Key Types and Interfaces\n\n#### CSVError\nCustom error class for all library-specific errors.\n- `message: string` - Error description.\n- `cause?: unknown` - The original error, if any, that led to this `CSVError`.\n\n#### Options Interfaces\n\n-   **`CSVReadOptions\u003cT\u003e`**: Configures CSV reading operations.\n    -   `fsOptions?`: NodeJS file system options.\n    -   `csvOptions?`: Options for `csv-parse` (e.g., `delimiter`, `quote`, `skip_empty_lines`). Default: `{ columns: true }`.\n    -   `transform?: (content: string) =\u003e string`: Pre-parsing transform for raw file content.\n    -   `headerMap?: HeaderMap\u003cT\u003e`: Configuration for mapping CSV columns to object properties (see [Header Mapping](#header-mapping)).\n    -   `retry?: RetryOptions`: Configuration for retrying failed read operations.\n    -   `validateData?: boolean`: Basic structural validation of parsed data.\n    -   `schema?: CSVSchemaConfig\u003cT\u003e`: Configuration for data validation against schemas (see [Schema Validation](#schema-validation)).\n    -   `saveAdditionalHeader?: boolean | number`: Extracts initial lines as a preamble (see [Preamble Handling](#preamble-handling)).\n    -   `additionalHeaderParseOptions?`: `csv-parse` options specifically for parsing the preamble.\n    -   `customCasts?`: Configuration for advanced type casting (see [Custom Type Casting](#custom-type-casting)).\n        -   `definitions?: CustomCastDefinition`: Global named casters.\n        -   `columnCasts?: ColumnCastConfig\u003cT\u003e`: Per-column casting rules.\n        -   `onCastError?: 'error' | 'null' | 'original'`: 
Behavior on casting failure.\n\n-   **`CSVWriteOptions\u003cT\u003e`**: Configures CSV writing operations.\n    -   `additionalHeader?: string`: String to prepend to the CSV output (e.g., comments, metadata).\n    -   `stringifyOptions?`: Options for `csv-stringify` (e.g., `header`, `delimiter`, `quoted`). Default: `{ header: true }`.\n    -   `streaming?: boolean`: Whether to use streaming for writing large datasets.\n    -   `headerMap?: HeaderMap\u003cT\u003e`: Configuration for mapping object properties to CSV columns.\n    -   `streamingThreshold?: number`: Row count threshold to enable streaming (default: 1000).\n    -   `retry?: RetryOptions`: Configuration for retrying failed write operations.\n\n-   **`CSVStreamOptions\u003cT\u003e`**: Configures generator-based stream processing.\n    -   `csvOptions?`: Options for `csv-parse`. Default: `{ columns: true }`.\n    -   `transform?: (row: any) =\u003e T`: Function to transform each parsed row.\n    -   `batchSize?: number`: Number of rows per batch for `csvBatchGenerator` (default: 100).\n    -   `headerMap?: HeaderMap\u003cT\u003e`: Header mapping configuration.\n    -   `retry?: RetryOptions`: Retry configuration (applies if underlying operations support it).\n\n-   **`RetryOptions`**: Configures retry behavior.\n    -   `maxRetries?: number`: Max retry attempts (default: 3).\n    -   `baseDelay?: number`: Initial delay in ms (default: 100), uses exponential backoff.\n    -   `logRetries?: boolean`: Log retries to `console.warn` (default: false).\n\n#### Casting Related Types\n-   **`Caster\u003cTargetType\u003e`**: Defines a custom type caster.\n    -   `test: (value: string, context: CastingContext) =\u003e boolean`: Returns `true` if this caster should handle the `value`.\n    -   `parse: (value: string, context: CastingContext) =\u003e TargetType`: Parses the `value` to `TargetType`. 
Throws on error.\n-   **`CustomCastDefinition`**: A map of type names (e.g., 'string', 'number', 'date') to `Caster` objects for global definitions.\n-   **`ColumnCastConfig\u003cT\u003e`**: Per-column casting rules, mapping column names to caster keys (from `definitions`) or direct `Caster` objects, or an array of these to try in order.\n-   **`CastingContext`**: Provides context (column name, line number, etc.) to caster functions.\n\n#### Schema Related Types\n-   **`CSVSchemaConfig\u003cT\u003e`**: Configures schema-based validation.\n    -   `rowSchema?: StandardSchemaV1`: Schema applied to each entire row object (e.g., a Zod schema).\n    -   `columnSchemas?: { [K in keyof T]?: StandardSchemaV1 } | { [col: string]: StandardSchemaV1 }`: Schemas applied to individual column values before row validation.\n    -   `validationMode?: 'error' | 'filter' | 'keep'`: Action on validation failure (default: 'error').\n    -   `useAsync?: boolean`: Set to `true` if schemas involve asynchronous validation logic (default: `false` for sync methods, `true` for async methods if schema present).\n-   **`RowValidationResult\u003cT\u003e`**: Contains results of validating a single row.\n    -   `originalRow: Record\u003cstring, any\u003e`: The row before validation.\n    -   `validatedRow?: T`: The row after successful validation and type coercion by schema.\n    -   `valid: boolean`: Overall validity of the row.\n    -   `rowIssues?: StandardSchemaV1.Issue[]`: Issues from `rowSchema` validation.\n    -   `columnIssues?: Record\u003cstring, StandardSchemaV1.Issue[]\u003e`: Issues from `columnSchemas` validation.\n-   **`StandardSchemaV1`**: Interface for schema objects compatible with the Standard Schema specification (useful for integrating with Zod, Yup, etc., or custom validation).\n\n#### Other Types\n-   **`HeaderMap\u003cT\u003e`**: An object defining mapping rules between CSV headers (or array indices) and object property paths. 
Can include `CsvToArrayConfig` or `ObjectArrayToCsvConfig` for array mappings.\n-   **`CsvToArrayConfig`**: Special `HeaderMap` entry to map multiple CSV columns to a single array property.\n-   **`ObjectArrayToCsvConfig`**: Special `HeaderMap` entry to map an array property to multiple CSV columns.\n-   **`SimilarityMatch\u003cT\u003e`**: Result of `findSimilarRows`, containing the `row: T` and Levenshtein `dist: number`.\n-   **`ValueTransformFn`** (called `MergeFn` elsewhere in this README): `(currentObject: Partial\u003cT\u003e, targetPath: string, sourceValue: any, sourceKeyOrIndex: string | number, allSourceData: any) =\u003e any`. A function type used within `createHeaderMapFns` to customize how values are transformed while mapping CSV source data onto the target object structure. Other sections of this README show it with the simplified signature `(obj: Partial\u003cT\u003e, key: string, value: any) =\u003e any`, which covers the common use case.\n\n## Memory-Efficient Stream Processing with `CSVStreamProcessor`\n\nFor very large CSV files that don't fit into memory, `CSVStreamProcessor` provides a fluent, chainable API for stream-based transformations.\n\n### Creating a Stream Processor\n```typescript\nimport CSV from '@doeixd/csv-utils';\ninterface OrderData { /* ... define your expected row structure ... */ }\n\n// Create a stream processor from a file\nconst processor = CSV.streamFromFile\u003cOrderData\u003e('very_large_orders.csv', {\n  // CSVReadOptions can be provided, e.g., csvOptions for parsing, headerMap\n  csvOptions: { delimiter: ';', trim: true },\n  headerMap: { 'Order ID': 'id', 'Customer Name': 'customer' /* ... */ }\n});\n```\n**Note on Preamble:** `CSV.streamFromFile` does **not** handle `saveAdditionalHeader` or `additionalHeaderParseOptions` from `CSVReadOptions`. 
It starts processing directly from the data rows as configured by `csvOptions.from_line` (or line 1 if not set).\n\n### Fluent Stream Transformations\nChain operations like `filter`, `map`, `addColumn` just like the main `CSV` class. Each returns a new `CSVStreamProcessor` instance.\n\n```typescript\nconst processedStream = processor\n  .filter(order =\u003e order.status === 'COMPLETED' \u0026\u0026 parseFloat(order.totalValue) \u003e 1000)\n  .map(order =\u003e ({\n    orderId: order.id,\n    customerName: order.customer,\n    value: parseFloat(order.totalValue),\n    processedDate: new Date()\n  }))\n  .addColumn('isHighValue', order =\u003e order.value \u003e 5000);\n```\n\nThe `CSVStreamProcessor` uses an internal fixed-size circular buffer and automatic backpressure management to control memory usage, making it suitable for processing files of virtually any size.\n\n### Executing the Stream Pipeline\n\nThe pipeline is executed when a terminal operation is called:\n\n1.  **Async Iteration (`for await...of`)**: Most common and memory-efficient way.\n    ```typescript\n    for await (const processedOrder of processedStream) {\n      // console.log(processedOrder.orderId, processedOrder.customerName);\n      // await saveToDatabase(processedOrder);\n    }\n    ```\n\n2.  
**`run()` with a Preparatory Method**: Configure a terminal action, then execute.\n    -   **Collect into `CSV` instance**:\n        ```typescript\n        // Loads all results into memory - use with caution on huge files!\n        const collectedCsv: CSV\u003cProcessedOrderType\u003e = await processedStream.prepareCollect().run() as CSV\u003cProcessedOrderType\u003e;\n        ```\n    -   **Write to File**:\n        ```typescript\n        await processedStream.prepareToFile('processed_large_orders.csv', {\n          // CSVWriteOptions, e.g., stringifyOptions\n          stringifyOptions: { header: true, bom: true }\n        }).run();\n        ```\n    -   **Execute Callback for Each Row**:\n        ```typescript\n        await processedStream.prepareForEach(async (row) =\u003e {\n          // await sendNotification(row);\n        }).run();\n        ```\n    -   **Pipe to another Writable Stream**:\n        ```typescript\n        import fs from 'node:fs';\n        const myWritable = fs.createWriteStream('output.log');\n        await processedStream.preparePipeTo(myWritable).run();\n        ```\n\n3.  **`pipe()` method**: Directly pipe to a Writable stream (terminal operation).\n    ```typescript\n    import fs from 'node:fs';\n    const anotherWritable = fs.createWriteStream('output_direct_pipe.txt');\n    processedStream.pipe(anotherWritable); // Returns 'anotherWritable'\n    // Listen for 'finish' or 'error' on anotherWritable\n    anotherWritable.on('finish', () =\u003e console.log('Direct pipe finished.'));\n    ```\n\n## Troubleshooting\n\n### Default Options\n\nBy default, all CSV reading methods (`fromString`, `fromFile`, and `fromStream`) set the following options for the underlying `csv-parse` library if not otherwise specified:\n\n- `columns: true` - CSV data is parsed into objects with column headers as keys. 
This is essential for most object-based operations in the library.\n\nYou can override these defaults by passing your own options in the `csvOptions` property:\n\n```typescript\n// Override the default columns setting\nconst rawData = CSV.fromString(csvContent, {\n  csvOptions: { columns: false, delimiter: ';' } // Each row becomes an array of strings\n});\n\n// Use all the defaults (columns: true is applied automatically)\nconst data = CSV.fromFile('data.csv');\n```\n\n### Important Note: Mutability and Query Results\n\nThis library targets Node.js and is not meant to be used in the browser, because it relies on Node.js APIs.\n\nMany of the query methods in this library, such as `findRowWhere` and `findRowsWhere`, **return direct references to the objects within the CSV data** rather than new copies. This design choice improves performance and enables efficient in-place modifications, but it also introduces a potential pitfall.\n\n**Benefits of Mutability:**\n\n*   **Performance:** Avoids the overhead of creating new objects for each query result, which can be significant for large datasets.\n*   **In-Place Modification:** Lets you modify the data within the CSV instance directly, without additional assignment or update operations. This can simplify certain data manipulation workflows.\n\n**The \"Foot Gun\": Potential Pitfalls:**\n\n*   **Unintended Side Effects:** If you modify an object returned by a query method, you are directly changing the underlying data within the CSV instance. 
This can lead to unexpected side effects if other parts of your code are relying on the original state of the data.\n*   **Unexpected Results:** Subsequent queries or operations might be affected by these in-place modifications.\n\n**Example Illustrating the Issue:**\n\n```typescript\nimport CSV from '@doeixd/csv-utils';\n\ninterface User { id: number; name: string; active: boolean; }\n\nconst csv = CSV.fromData\u003cUser\u003e([\n  { id: 1, name: 'Alice', active: true },\n  { id: 2, name: 'Bob', active: false },\n  { id: 3, name: 'Carol', active: true }\n]);\n\n// Find the first inactive user\nconst inactiveUser = csv.findRowWhere(user =\u003e !user.active);\n\n// Directly modify the object returned by findRowWhere\nif (inactiveUser) {\n  inactiveUser.active = true; // **DANGER: Modifies the underlying CSV data!**\n}\n\n// Now the CSV instance has been modified!\nconst activeUsers = csv.findRowsWhere(user =\u003e user.active);\nconsole.log(activeUsers.length); // 3 (Bob is now considered active)\n```\n\n**How to Avoid Pitfalls (Best Practices):**\n\n*   **Clone Before Modifying:** To prevent unintended side effects, always clone the object returned by query methods before making any modifications. Use `CSVUtils.clone` for a deep copy:\n\n    ```typescript\n    import CSV, { CSVUtils } from '@doeixd/csv-utils';\n\n    const inactiveUser = csv.findRowWhere(user =\u003e !user.active);\n\n    if (inactiveUser) {\n      const clonedUser = CSVUtils.clone(inactiveUser); // Create a deep copy\n      clonedUser.active = true; // Modify the clone, not the original\n      // ... 
do something with clonedUser, but don't re-insert it into the CSV\n    }\n\n    // Original CSV instance remains unchanged\n    const activeUsers = csv.findRowsWhere(user =\u003e user.active);\n    console.log(activeUsers.length); // 2 (Bob is still considered inactive)\n    ```\n\n*   **Use `updateWhere` for Bulk Updates:** If you need to update multiple rows that match a condition, use the `updateWhere` method. It creates new objects instead of mutating the original data:\n\n    ```typescript\n    import CSV from '@doeixd/csv-utils';\n\n    const updatedCsv = csv.updateWhere(\n      user =\u003e !user.active,\n      { active: true }  // Produces new objects; existing ones are not mutated\n    );\n\n    // The original CSV instance remains unchanged\n    const stillInactive = csv.findRowWhere(user =\u003e !user.active); // Still finds Bob\n    // updatedCsv, by contrast, contains *new* objects\n    const updatedActive = updatedCsv.findRowsWhere(user =\u003e user.active); // Alice, Carol, and a new Bob object\n    ```\n\nKeep this mutability in mind and follow these practices to get the performance benefits of direct references without unexpected side effects.\n\n### Common Issues\n\n-   **Inconsistent Row Lengths / Malformed CSV**:\n    -   Errors such as `Error: Row length mismatch at line 42...` or `Invalid Record Length`.\n    -   **Solution**: Check your CSV for unescaped quotes, incorrect delimiters, or missing fields. Ensure `csvOptions.delimiter` matches your file. For debugging, you can use `CSVReadOptions.transform` to log raw content. 
If structural errors are expected and you want to tolerate them (at the risk of data issues), you can pass underlying `csv-parse` options such as `relax_column_count: true` in `csvOptions`; note that this library is strict by default.\n\n-   **Type Casting Failures**:\n    -   Errors such as `Custom cast failed for column \"price\"...`, or values that are not of the expected type.\n    -   **Solution**: Review your `customCasts` definitions. Ensure `test` functions are specific enough and `parse` functions handle edge cases. Use `onCastError: 'null'` or `'original'` to suppress errors so you can inspect the problematic values.\n\n-   **Performance with Large Files**:\n    -   Slow processing or high memory usage.\n    -   **Solution**:\n        -   For reading/transforming: Use `CSV.streamFromFile()` to get a `CSVStreamProcessor` and process data via `for await...of` or `prepareForEach().run()`.\n        -   For reading with little or no transformation: Use `csvGenerator()` or `csvBatchGenerator()`.\n        -   For writing: by default, `CSV.writeToFile()` switches to streaming for large datasets (controlled by `streamingThreshold`). 
`CSV.writeToFileAsync()` and `writeCSVFromGenerator()` are also good options.\n        -   For CPU-bound tasks on arrays of data: `CSVUtils.processInParallel()`.\n\n-   **Header Mapping Not Working as Expected**:\n    -   Properties are `undefined` or not mapped correctly.\n    -   **Solution**:\n        -   Ensure `HeaderMap` keys exactly match the CSV headers (matching is case-sensitive by default, unless `csvOptions.columns` is a function that normalizes them).\n        -   Verify that the object paths in `HeaderMap` values are correct.\n        -   For `targetArrayToCsv`, the key in `HeaderMap` must be the name of the array property in your source objects.\n        -   For `csvToTargetArray`, the `sourceCsvColumnPattern` or `sourceCsvColumns` must correctly identify the columns in the CSV file.\n\n-   **Schema Validation Errors**:\n    -   `CSVError: CSV validation failed...`\n    -   **Solution**: Check your schema definitions (e.g., Zod schemas, `StandardSchemaV1` implementations). If using `validationMode: 'keep'`, inspect `csvInstance.validationResults` for detailed per-row/per-column error messages. Ensure data types match what the schema expects (e.g., use `customCasts` to convert strings to numbers/dates before schema validation if needed).\n\n-   **Preamble Not Captured**:\n    -   `csvInstance.additionalHeader` is empty.\n    -   **Solution**: Ensure `saveAdditionalHeader: true` (or a number) is set in `CSVReadOptions`. When using `saveAdditionalHeader: true`, `csvOptions.from_line` must be greater than 1, because the preamble consists of the lines *before* `from_line`.\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request. Ensure that your contributions include relevant tests and documentation updates.\n\n## License\n\nThis project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdoeixd%2Fcsv-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdoeixd%2Fcsv-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdoeixd%2Fcsv-utils/lists"}