
[![npm version](https://badge.fury.io/js/web-csv-toolbox.svg)](https://badge.fury.io/js/web-csv-toolbox)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)
![node version](https://img.shields.io/node/v/web-csv-toolbox)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox?ref=badge_shield)

![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/kamiazya/web-csv-toolbox)
![npm](https://img.shields.io/npm/dm/web-csv-toolbox)
[![codecov](https://codecov.io/gh/kamiazya/web-csv-toolbox/graph/badge.svg?token=8RbDcXHTFl)](https://codecov.io/gh/kamiazya/web-csv-toolbox)

# `🌐 web-csv-toolbox 🧰`

A CSV Toolbox utilizing Web Standard APIs.

🔗

[![GitHub](https://img.shields.io/badge/-GitHub-181717?logo=GitHub&style=flat)](https://github.com/kamiazya/web-csv-toolbox)
[![npm](https://img.shields.io/badge/-npm-CB3837?logo=npm&style=flat)](https://www.npmjs.com/package/web-csv-toolbox)
[![API Reference](https://img.shields.io/badge/-API%20Reference-3178C6?logo=TypeScript&style=flat&logoColor=fff)](https://kamiazya.github.io/web-csv-toolbox/)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/kamiazya/web-csv-toolbox)
[![Sponsor](https://img.shields.io/badge/-GitHub%20Sponsor-fff?logo=GitHub%20Sponsors&style=flat)](https://github.com/sponsors/kamiazya)
[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/kamiazya/web-csv-toolbox)

[![format: Biome](https://img.shields.io/badge/format%20with-Biome-F7B911?logo=biome&style=flat)](https://biomejs.dev/)
[![test: Vitest](https://img.shields.io/badge/tested%20with-Vitest-6E9F18?logo=vitest&style=flat)](https://vitest.dev/)
[![build: Vite](https://img.shields.io/badge/build%20with-Vite-646CFF?logo=vite&style=flat)](https://vitejs.dev/)

GMO OSS support

---

## Key Concepts ✨

- 🌐 **Web Standards first.**
  - Utilizing Web Standards APIs, such as the [Web Streams API](https://developer.mozilla.org/docs/Web/API/Streams_API).
- ❤️ **TypeScript friendly & User friendly.**
  - Fully typed and documented.
- 0️⃣ **Zero dependencies.**
  - Using only Web Standards APIs.
- 💪 **Property-based testing.**
  - Using [fast-check](https://fast-check.dev/) and [vitest](https://vitest.dev).
- ✅ **Cross-platform.**
  - Works on browsers, Node.js, and Deno.

## Key Features 📗

- 🌊 **Efficient CSV Parsing with Streams**
  - 💻 Leveraging the [WHATWG Streams API](https://streams.spec.whatwg.org/) and other Web APIs for seamless and efficient data processing.
- 🛑 **AbortSignal and Timeout Support**: Ensure your CSV processing is cancellable, including support for automatic timeouts.
  - ✋ Integrate with [`AbortController`](https://developer.mozilla.org/docs/Web/API/AbortController) to manually cancel operations as needed.
  - ⏳ Use [`AbortSignal.timeout`](https://developer.mozilla.org/docs/Web/API/AbortSignal/timeout_static) to automatically cancel operations that exceed a specified time limit.
- 🛡️ **Memory Safety Protection**: Built-in limits prevent memory exhaustion attacks.
  - 🔒 Configurable maximum buffer size (default: 10M characters) to prevent DoS attacks via unbounded input; throws `RangeError` when the buffer exceeds the limit.
  - 📊 Configurable maximum field count (default: 100,000 fields/record) to prevent excessive-column attacks; throws `RangeError` when the field count exceeds the limit.
  - 💾 Configurable maximum binary size (default: 100 MB) for BufferSource inputs; throws `RangeError` when the binary size exceeds the limit.
- 🎨 **Flexible Source Support**
  - 🧩 Parse CSVs directly from `string`s, `ReadableStream`s, `Response` objects, `Blob`/`File` objects, or `Request` objects.
- ⚙️ **Advanced Parsing Options**: Customize your experience with various delimiters and quotation marks.
  - 🔄 Defaults to `,` and `"` respectively.
- 💾 **Specialized Binary CSV Parsing**: Leverage stream-based processing for versatility and strength.
  - 🔄 Flexible BOM handling.
  - 🗜️ Supports various compression formats.
  - 🔀 Charset specification for diverse encodings.
- 🚀 **High-Performance Parsing with WebAssembly** (_Experimental_)
  - ⚠️ WASM automatic initialization (base64-embedded) is experimental and may change in future versions.
- 📦 **Lightweight and Zero Dependencies**: No external dependencies, only Web Standards APIs.
- 📚 **Fully Typed and Documented**: Fully typed and documented with [TypeDoc](https://typedoc.org/).

## Installation 📥

### With a Package Manager 📦

Install the package with your preferred package manager:

```sh
# Install with npm
$ npm install web-csv-toolbox
# Or Yarn
$ yarn add web-csv-toolbox
# Or pnpm
$ pnpm add web-csv-toolbox
```

### From CDN (unpkg.com) 🌐

```html
<script type="module">
import { parse } from 'https://unpkg.com/web-csv-toolbox';

const csv = `name,age
Alice,42
Bob,69`;

for await (const record of parse(csv)) {
  console.log(record);
}
</script>
```

#### Deno 🦕

Deno can import the package directly from npm:

```js
import { parse } from "npm:web-csv-toolbox";
```
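
For example, iterating records in a Deno script (the CSV content here is illustrative; `parse` is the same high-level API shown throughout this README):

```ts
import { parse } from "npm:web-csv-toolbox";

for await (const record of parse("name,age\nAlice,42")) {
  console.log(record); // { name: "Alice", age: "42" }
}
```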

## Entry Points 🚪

This library provides two entry points to suit different needs:

For a deeper comparison and migration guidance, see:

- [docs/explanation/main-vs-slim.md](./docs/explanation/main-vs-slim.md)

### `web-csv-toolbox` (Default - Full Features)

**Best for**: Most users who want automatic WASM initialization and all features

```typescript
import { loadWASM, parseStringToArraySyncWASM } from 'web-csv-toolbox';

// Optional but recommended: preload to reduce first-parse latency
await loadWASM();
const records = parseStringToArraySyncWASM(csv);
```

**Characteristics:**
- ✅ Full features including synchronous WASM APIs
- ✅ Automatic WASM initialization on first use (not at import time)
- 💡 Call `loadWASM()` at startup to reduce first-parse latency (optional)
- ⚠️ **Experimental**: WASM auto-init embeds WASM as base64 and may change in the future
- ⚠️ Larger bundle size (WASM embedded in main bundle)

### `web-csv-toolbox/slim` (Slim Entry - Smaller Bundle)

**Best for**: Bundle size-sensitive applications and production optimization

```typescript
import { loadWASM, parseStringToArraySyncWASM } from 'web-csv-toolbox/slim';

// Manual initialization required
await loadWASM();
const records = parseStringToArraySyncWASM(csv);
```

**Characteristics:**
- ✅ Smaller main bundle (WASM not embedded)
- ✅ External WASM loading for better caching
- ✅ Explicit control over initialization timing
- ❌ Requires manual `loadWASM()` call before using WASM features

**Comparison:**

| Aspect | Main | Slim |
|--------|------|------|
| **Initialization** | Automatic | Manual (`loadWASM()` required) |
| **Bundle Size** | Larger (WASM embedded) | Smaller (WASM external) |
| **Caching** | Single bundle | WASM cached separately |
| **Use Case** | Convenience, prototyping | Production, bundle optimization |

> **Note**: Both entry points export the same full API (feature parity). The only difference is WASM initialization strategy and bundle size.

## Usage 📘

> **Note for Bundler Users**: When using Worker-based execution strategies (e.g., `EnginePresets.responsive()`, `EnginePresets.responsiveFast()`) with bundlers like Vite or Webpack, you must explicitly specify the `workerURL` option. See the [Bundler Integration Guide](./docs/how-to-guides/using-with-bundlers.md) for configuration details.

### Parsing CSV files from strings

```js
import { parse } from 'web-csv-toolbox';

const csv = `name,age
Alice,42
Bob,69`;

for await (const record of parse(csv)) {
  console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
```

### Parsing CSV files from `ReadableStream`s

```js
import { parse } from 'web-csv-toolbox';

const csv = `name,age
Alice,42
Bob,69`;

const stream = new ReadableStream({
  start(controller) {
    controller.enqueue(csv);
    controller.close();
  },
});

for await (const record of parse(stream)) {
  console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
```

### Parsing CSV files from `Response` objects

```js
import { parse } from 'web-csv-toolbox';

const response = await fetch('https://example.com/data.csv');

for await (const record of parse(response)) {
  console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
```

### Parsing CSV files from `Blob` or `File` objects

```js
import { parse } from 'web-csv-toolbox';

// From file input
const fileInput = document.querySelector('input[type="file"]');
fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0];

  for await (const record of parse(file)) {
    console.log(record);
  }
  // Prints:
  // { name: 'Alice', age: '42' }
  // { name: 'Bob', age: '69' }
});
```

### Parsing CSV files from `Request` objects (Server-side)

```js
import { parse } from 'web-csv-toolbox';

// Cloudflare Workers / Service Workers
export default {
  async fetch(request) {
    if (request.method === 'POST') {
      for await (const record of parse(request)) {
        console.log(record);
      }
      // Prints:
      // { name: 'Alice', age: '42' }
      // { name: 'Bob', age: '69' }

      return new Response('OK', { status: 200 });
    }
  }
};
```

### Parsing CSV files with different delimiters and quotation characters

```js
import { parse } from 'web-csv-toolbox';

const csv = `name\tage
Alice\t42
Bob\t69`;

for await (const record of parse(csv, { delimiter: '\t' })) {
  console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
```

### Parsing CSV files with headers

```js
import { parse } from 'web-csv-toolbox';

const csv = `Alice,42
Bob,69`;

for await (const record of parse(csv, { header: ['name', 'age'] })) {
  console.log(record);
}
// Prints:
// { name: 'Alice', age: '42' }
// { name: 'Bob', age: '69' }
```

### Working with Headerless CSV Files

Some CSV files don't include a header row. You can provide custom headers manually:

```typescript
import { parse } from 'web-csv-toolbox';

// Example: Sensor data without headers
const sensorData = `25.5,60,1024
26.1,58,1020
24.8,62,1025`;

// Provide headers explicitly
for await (const record of parse(sensorData, {
  header: ['temperature', 'humidity', 'pressure']
})) {
  console.log(`Temp: ${record.temperature}°C, Humidity: ${record.humidity}%, Pressure: ${record.pressure} hPa`);
}
// Output:
// Temp: 25.5°C, Humidity: 60%, Pressure: 1024 hPa
// Temp: 26.1°C, Humidity: 58%, Pressure: 1020 hPa
// Temp: 24.8°C, Humidity: 62%, Pressure: 1025 hPa
```

### `AbortSignal` / `AbortController` Support

Support for [`AbortSignal`](https://developer.mozilla.org/docs/Web/API/AbortSignal) / [`AbortController`](https://developer.mozilla.org/docs/Web/API/AbortController), enabling you to cancel ongoing asynchronous CSV processing tasks.

This feature is useful for scenarios where processing needs to be halted, such as when a user navigates away from the page or other conditions that require stopping the task early.

#### Example Use Case: Abort with user action

```js
import { parse } from 'web-csv-toolbox';

const controller = new AbortController();
const csv = "name,age\nAlice,30\nBob,25";

try {
  // Pass the AbortSignal to the parse function
  for await (const record of parse(csv, { signal: controller.signal })) {
    console.log(record);
  }
} catch (error) {
  if (error instanceof DOMException && error.name === 'AbortError') {
    // The CSV processing was aborted by the user
    console.log('CSV processing was aborted by the user.');
  } else {
    // An error occurred during CSV processing
    console.error('An error occurred:', error);
  }
}

// Some abort logic, like a cancel button
document.getElementById('cancel-button')
  .addEventListener('click', () => {
    controller.abort();
  });
```

#### Example Use Case: Abort with timeout

```js
import { parse } from 'web-csv-toolbox';

// Set up a timeout of 5 seconds (5000 milliseconds)
const signal = AbortSignal.timeout(5000);

const csv = "name,age\nAlice,30\nBob,25";

try {
  // Pass the AbortSignal to the parse function
  const result = await parse.toArray(csv, { signal });
  console.log(result);
} catch (error) {
  if (error instanceof DOMException && error.name === 'TimeoutError') {
    // Handle the case where the processing was aborted due to timeout
    console.log('CSV processing was aborted due to timeout.');
  } else {
    // Handle other errors
    console.error('An error occurred during CSV processing:', error);
  }
}
```

## Supported Runtimes 💻

### Works on Node.js

| Versions | Status |
| -------- | ------ |
| 20.x | ✅ |
| 22.x | ✅ |
| 24.x | ✅ |

> Note: For Node environments, the WASM loader uses `import.meta.resolve`. Node.js 20.6+ is recommended. On older Node versions, pass an explicit URL/Buffer to `loadWASM()`.
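
On such older runtimes, a minimal sketch of explicit initialization; the `.wasm` path below is hypothetical, so adjust it to your installed package layout:

```ts
import { readFile } from "node:fs/promises";
import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox";

// Hypothetical artifact path: point this at the .wasm file in your install.
const wasmBuffer = await readFile(
  "node_modules/web-csv-toolbox/dist/web_csv_toolbox_wasm_bg.wasm",
);

await loadWASM(wasmBuffer); // explicit Buffer instead of import.meta.resolve
console.log(parseStringToArraySyncWASM("a,b\n1,2"));
```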

### Works on Browser

| OS | Chrome | Firefox | Default Browser |
| ------- | ------ | ------- | --------------- |
| Windows | ✅ | ✅ | ✅ (Edge) |
| macOS | ✅ | ✅ | ⬜ (Safari *) |
| Linux | ✅ | ✅ | - |

> **\* Safari**: Basic functionality is expected to work, but it is not yet automatically tested in our CI environment.

### Others

- Deno: JavaScript execution is verified in CI. [![Deno CI](https://github.com/kamiazya/web-csv-toolbox/actions/workflows/deno.yaml/badge.svg)](https://github.com/kamiazya/web-csv-toolbox/actions/workflows/deno.yaml)

### Platform-Specific Usage Guide 📚

For detailed examples and best practices for your specific runtime environment, see:

**[Platform-Specific Usage Guide](./docs/how-to-guides/platform-usage/)**

This guide covers:
- 🌐 **Browser**: File input, drag-and-drop, Clipboard API, FormData, Fetch API
- 🟢 **Node.js**: Buffer, fs.ReadStream, HTTP requests, stdin/stdout
- 🦕 **Deno**: Deno.readFile, Deno.open, fetch API
- ⚡ **Edge**: Cloudflare Workers, Deno Deploy, Vercel Edge Functions
- 🐰 **Bun**: File API, HTTP server

## APIs 🧑‍💻

### High-level APIs 🚀

These APIs are designed for **Simplicity and Ease of Use**,
providing an intuitive and straightforward experience for users.

- **`function parse(input[, options]): AsyncIterableIterator`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parse-1.html)
  - Parses various CSV input formats into an asynchronous iterable of records.
- **`function parse.toArray(input[, options]): Promise`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parse.toArray.html)
  - Parses CSV input into an array of records, ideal for smaller data sets.

The `input` parameter can be:
- a `string`
- a [ReadableStream](https://developer.mozilla.org/docs/Web/API/ReadableStream) of `string`s or [Uint8Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array)s
- a [BufferSource](https://webidl.spec.whatwg.org/#BufferSource) ([Uint8Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array), [ArrayBuffer](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer), or other [TypedArray](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/TypedArray))
- a [Response](https://developer.mozilla.org/docs/Web/API/Response) object
- a [Blob](https://developer.mozilla.org/docs/Web/API/Blob) or [File](https://developer.mozilla.org/docs/Web/API/File) object
- a [Request](https://developer.mozilla.org/docs/Web/API/Request) object (server-side)
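
Binary input goes through the same entry point as text. A small sketch parsing a `Uint8Array` (BufferSource):

```ts
import { parse } from 'web-csv-toolbox';

// Encode a CSV string to bytes to simulate binary input.
const bytes = new TextEncoder().encode('name,age\nAlice,42');

for await (const record of parse(bytes)) {
  console.log(record); // { name: 'Alice', age: '42' }
}
```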

### Middle-level APIs 🧱

These APIs are optimized for **Enhanced Performance and Control**,
catering to users who need more detailed and fine-tuned functionality.

- **`function parseString(string[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseString-1.html)
  - Efficient parsing of CSV strings.
- **`function parseBinary(buffer[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseBinary-1.html)
  - Parses CSV binary data from a BufferSource (Uint8Array, ArrayBuffer, or other TypedArray).
- **`function parseResponse(response[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseResponse-1.html)
  - Customized parsing directly from `Response` objects.
- **`function parseRequest(request[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseRequest-1.html)
  - Server-side parsing from `Request` objects (Cloudflare Workers, Service Workers, etc.).
- **`function parseBlob(blob[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseBlob-1.html)
  - Parses CSV data from `Blob` or `File` objects.
- **`function parseFile(file[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseFile-1.html)
  - Parses `File` objects with automatic filename tracking in error messages.
- **`function parseStream(stream[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseStream-1.html)
  - Stream-based parsing for larger or continuous data.
- **`function parseStringStream(stream[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseStringStream-1.html)
  - Combines string-based parsing with stream processing.
- **`function parseBinaryStream(stream[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseBinaryStream-1.html)
  - Parses binary streams with precise control over data types.
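
For example, `parseString` skips input-type detection when the source is already known to be a string. A minimal sketch, assuming it yields records the same way `parse` does:

```ts
import { parseString } from 'web-csv-toolbox';

const csv = 'name,age\nAlice,42';

for await (const record of parseString(csv)) {
  console.log(record); // { name: 'Alice', age: '42' }
}
```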

### Low-level APIs ⚙️

These APIs are built for **Advanced Customization and Pipeline Design**,
ideal for developers looking for in-depth control and flexibility.

The low-level APIs follow a 3-tier architecture:

#### Parser Models (Tier 1: Simplified Composition)

Combines Lexer and Assembler for streamlined usage without sacrificing flexibility.

- **`function createStringCSVParser(options?)`**
  - Factory function for creating format-specific CSV parsers.
  - Returns `FlexibleStringObjectCSVParser` (default) or `FlexibleStringArrayCSVParser` based on the `outputFormat` option.
  - Parses CSV strings by composing `FlexibleStringCSVLexer` and a CSV record assembler.
  - Stateful: the parser maintains internal lexer and assembler instances for streaming.
  - Use with `StringCSVParserStream` for streaming workflows.
  - **Low-level API**: Accepts `CSVProcessingOptions` only (no `engine` option).
  - **Streaming mode**: When using `parse(chunk, { stream: true })`, you must call `parse()` without arguments at the end to flush any remaining data.

```typescript
import { createStringCSVParser } from 'web-csv-toolbox';

// Object format (default)
const objectParser = createStringCSVParser({
  header: ['name', 'age']
});

// Array format
const arrayParser = createStringCSVParser({
  header: ['name', 'age'],
  outputFormat: 'array'
});

// Process chunks
const records1 = objectParser.parse('Alice,30\nBob,', { stream: true });
const records2 = objectParser.parse('25\nCharlie,', { stream: true });

// Flush remaining data (required!)
const records3 = objectParser.parse();
```

- **Direct class usage**:
  - `FlexibleStringObjectCSVParser` - Always outputs object records
  - `FlexibleStringArrayCSVParser` - Always outputs array records

- **`function createBinaryCSVParser(options?)`**
  - Factory function for creating format-specific binary CSV parsers.
  - Returns `FlexibleBinaryObjectCSVParser` (default) or `FlexibleBinaryArrayCSVParser` based on the `outputFormat` option.
  - Parses binary CSV data (BufferSource: Uint8Array, ArrayBuffer, or other TypedArray) by composing `TextDecoder` with the string CSV parser.
  - Uses `TextDecoder` with the `stream: true` option for proper multi-byte character handling across chunk boundaries.
  - Supports various character encodings (utf-8, shift_jis, etc.) via the `charset` option.
  - BOM handling via the `ignoreBOM` option; fatal error mode via the `fatal` option.
  - Use with `BinaryCSVParserStream` for streaming workflows.
  - **Low-level API**: Accepts `BinaryCSVProcessingOptions` only (no `engine` option).
  - **Streaming mode**: When using `parse(chunk, { stream: true })`, you must call `parse()` without arguments at the end to flush the TextDecoder and parser buffers.

```typescript
import { createBinaryCSVParser } from 'web-csv-toolbox';

// Object format (default)
const objectParser = createBinaryCSVParser({
  header: ['name', 'age'],
  charset: 'utf-8'
});

// Array format
const arrayParser = createBinaryCSVParser({
  header: ['name', 'age'],
  outputFormat: 'array',
  charset: 'utf-8'
});

const encoder = new TextEncoder();

// Process chunks
const records1 = objectParser.parse(encoder.encode('Alice,30\nBob,'), { stream: true });
const records2 = objectParser.parse(encoder.encode('25\n'), { stream: true });

// Flush remaining data (required!)
const records3 = objectParser.parse();
```

- **Direct class usage**:
  - `FlexibleBinaryObjectCSVParser` - Always outputs object records
  - `FlexibleBinaryArrayCSVParser` - Always outputs array records

#### Lexer (Tier 2: Stage 1 - Tokenization)

Low-level tokenization with full control over CSV syntax.

- **`function createStringCSVLexer(options?)` / `class FlexibleStringCSVLexer`**
  - Factory helper plus the underlying class for the standalone lexer used across the toolkit.
  - Configure delimiters, quotation, buffer limits, and cancellation per stream.
  - Returns tokens (field values, row delimiters, etc.) for manual processing.

#### Assembler (Tier 2: Stage 2 - Record Assembly)

Converts tokens into structured records with flexible formatting.

- **`function createCSVRecordAssembler(options)`**
  - Factory that returns either an object- or array-format assembler based on `outputFormat`.
  - Applies newer options like `includeHeader` and `columnCountStrategy` consistently across environments.
- **`class FlexibleCSVObjectRecordAssembler` / `class FlexibleCSVArrayRecordAssembler`**
  - Specialized assemblers when you need full control over object vs. tuple output or want to extend behavior.
  - `FlexibleCSVRecordAssembler` remains for backward compatibility but now delegates to these focused implementations.
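
A rough sketch of how the two stages compose. The token-passing method names below are hypothetical, so consult the [API Reference](https://kamiazya.github.io/web-csv-toolbox/) for the real signatures:

```ts
import {
  createStringCSVLexer,
  createCSVRecordAssembler,
} from 'web-csv-toolbox';

const lexer = createStringCSVLexer({ delimiter: ',' });
const assembler = createCSVRecordAssembler({ outputFormat: 'object' });

// Hypothetical flow: tokenize a chunk, then assemble tokens into records.
// The method names below are illustrative, not the documented API.
// const tokens = lexer.lex('name,age\nAlice,42');
// const records = assembler.assemble(tokens);
```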

#### Streaming Transformers (Tier 3: TransformStream Integration)

Web Streams API integration for all processing tiers.

- **`class StringCSVParserStream`**
  - `TransformStream` for streaming string parsing.
  - Wraps parser instances (accepts a parser in its constructor, doesn't construct one internally).
  - Configurable backpressure handling via the `backpressureCheckInterval` option.
  - Supports custom queuing strategies for fine-tuned performance.
- **`class BinaryCSVParserStream`**
  - `TransformStream` for streaming binary parsing.
  - Handles UTF-8 multi-byte characters across chunk boundaries.
  - Integration-ready for the fetch API and file streaming.
  - Backpressure management with configurable check intervals.
- **`createStringCSVLexerTransformer()`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/createStringCSVLexerTransformer.html)
  - Factory function to create a `StringCSVLexerTransformer` with customizable queuing strategies.
- **`createCSVRecordAssemblerTransformer()`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/createCSVRecordAssemblerTransformer.html)
  - Factory function to create a `CSVRecordAssemblerTransformer` with cooperative backpressure support.

These factory functions are the recommended way to create transformer instances. They encapsulate internal lexer/assembler initialization and provide sensible defaults, insulating your code from internal implementation changes. Direct class instantiation (`new StringCSVLexerTransformer(customLexer)`) is only needed when injecting a custom lexer implementation; see [Custom Lexer/Assembler](./docs/how-to-guides/custom-csv-parser.md#low-level-custom-lexerassembler) for advanced use cases.

#### Customizing Queuing Strategies

Both `createStringCSVLexerTransformer()` and `createCSVRecordAssemblerTransformer()` support custom queuing strategies following the Web Streams API pattern. Strategies are passed as function arguments with **data-type-aware size counting** and **configurable backpressure handling**.

**Function signature:**
```typescript
createStringCSVLexerTransformer(options?, streamOptions?, writableStrategy?, readableStrategy?)
createCSVRecordAssemblerTransformer(options?, streamOptions?, writableStrategy?, readableStrategy?)
```

**Default queuing strategies (starting points, not benchmarked):**
```typescript
// StringCSVLexerTransformer defaults
createStringCSVLexerTransformer(
  { delimiter: ',' }, // CSV options
  { backpressureCheckInterval: 100 }, // Check every 100 tokens
  {
    highWaterMark: 65536, // 64KB of characters
    size: (chunk) => chunk.length, // Count by string length
  },
  new CountQueuingStrategy({ highWaterMark: 1024 }) // 1024 tokens
)

// CSVRecordAssemblerTransformer defaults
createCSVRecordAssemblerTransformer(
  { header: ['name', 'age'] }, // Assembler options
  { backpressureCheckInterval: 10 }, // Check every 10 records
  new CountQueuingStrategy({ highWaterMark: 1024 }), // 1024 tokens
  new CountQueuingStrategy({ highWaterMark: 256 }) // 256 records
)
```

**Key Features:**

🎯 **Smart Size Counting:**
- Character-based counting for string inputs (accurate memory tracking)
- Token-based counting between transformers (smooth pipeline flow)
- Record-based counting for output (intuitive and predictable)

⚡ **Cooperative Backpressure:**
- Monitors `controller.desiredSize` during processing
- Yields to event loop when backpressure detected
- Prevents blocking the main thread
- Critical for browser UI responsiveness

🔧 **Tunable Backpressure Check Interval:**
- `backpressureCheckInterval` (in options): How often to check for backpressure (count-based)
- Lower values (5-25): More responsive, slight overhead
- Higher values (100-500): Less overhead, slower response
- Customize based on downstream consumer speed

> ⚠️ **Important**: These defaults are theoretical starting points based on data flow characteristics, **not empirical benchmarks**. Optimal values vary by runtime (browser/Node.js/Deno), file size, memory constraints, and CPU performance. **Profile your specific use case** to find the best values.

**When to customize:**
- 🚀 **High-throughput servers**: Higher `highWaterMark` (128KB+, 2048+ tokens), higher `backpressureCheckInterval` (200-500)
- 📱 **Memory-constrained environments**: Lower `highWaterMark` (16KB, 256 tokens), lower `backpressureCheckInterval` (10-25)
- 🐌 **Slow consumers** (DB writes, API calls): Lower `highWaterMark`, lower `backpressureCheckInterval` for responsive backpressure
- 🏃 **Fast processing**: Higher values to reduce overhead

**Example - High-throughput server:**
```typescript
import {
  createStringCSVLexerTransformer,
  createCSVRecordAssemblerTransformer
} from 'web-csv-toolbox';

const response = await fetch('large-dataset.csv');
await response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(createStringCSVLexerTransformer(
    { delimiter: ',' },
    { backpressureCheckInterval: 200 }, // Less frequent checks
    {
      highWaterMark: 131072, // 128KB
      size: (chunk) => chunk.length,
    },
    new CountQueuingStrategy({ highWaterMark: 2048 }) // 2048 tokens
  ))
  .pipeThrough(createCSVRecordAssemblerTransformer(
    {}, // Use default assembler options
    { backpressureCheckInterval: 20 }, // Less frequent checks
    new CountQueuingStrategy({ highWaterMark: 2048 }), // 2048 tokens
    new CountQueuingStrategy({ highWaterMark: 512 }) // 512 records
  ))
  .pipeTo(yourRecordProcessor);
```

**Example - Slow consumer (API writes):**
```typescript
import {
  createStringCSVLexerTransformer,
  createCSVRecordAssemblerTransformer
} from 'web-csv-toolbox';

await csvStream
  .pipeThrough(createStringCSVLexerTransformer()) // Use defaults
  .pipeThrough(createCSVRecordAssemblerTransformer(
    {}, // Use default assembler options
    { backpressureCheckInterval: 2 }, // Very responsive
    new CountQueuingStrategy({ highWaterMark: 512 }),
    new CountQueuingStrategy({ highWaterMark: 64 })
  ))
  .pipeTo(new WritableStream({
    async write(record) {
      await fetch('/api/save', { method: 'POST', body: JSON.stringify(record) });
    }
  }));
```

**Benchmarking:**
Use the provided benchmark tool to find optimal values for your use case:
```bash
pnpm --filter web-csv-toolbox-benchmark queuing-strategy
```

See `benchmark/queuing-strategy.bench.ts` for implementation details.

### Experimental APIs 🧪

These APIs are experimental and may change in the future.

#### Parsing using WebAssembly for high performance

You can use WebAssembly to parse CSV data for high performance.

⚠️ **Experimental Notice**:
- WASM automatic initialization is experimental and may change in future versions
- Currently embeds WASM as base64 in the main bundle
- Future versions may change the loading strategy for better bundle size optimization

**WASM Limitations:**
- Parsing with WebAssembly is faster than parsing with JavaScript, but loading the WebAssembly module takes time.
- Supports only UTF-8 encoded CSV data.
- The only supported quotation character is `"` (double quote); passing a different character throws an error.
- Record output is always object-shaped; `outputFormat: 'array'` requires the JavaScript engine (`engine: { wasm: false }`).

```ts
import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox";

// load WebAssembly module
await loadWASM();

const csv = "a,b,c\n1,2,3";

// parse CSV string
const result = parseStringToArraySyncWASM(csv);
console.log(result);
// Prints:
// [{ a: "1", b: "2", c: "3" }]
```

- **`function loadWASM(): Promise`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/loadWASM.html)
  - Loads the WebAssembly module.
- **`function parseStringToArraySyncWASM(string[, options]): CSVRecord[]`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseStringToArraySyncWASM.html)
  - Parses CSV strings into an array of records.

## Options Configuration 🛠️

### Common Options ⚙️

| Option | Description | Default | Notes |
| ---------------- | ------------------------------------- | ------------ | ---------------------------------------------------------------------------------- |
| `delimiter` | Character to separate fields | `,` | |
| `quotation` | Character used for quoting fields | `"` | |
| `maxBufferSize` | Maximum internal buffer size (characters) | `10 * 1024 * 1024` | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input). Measured in UTF-16 code units. |
| `maxFieldCount` | Maximum fields allowed per record | `100000` | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input) |
| `header` | Custom headers for the parsed records | First row | If not provided, the first row is used as headers |
| `outputFormat` | Record shape (`'object'` or `'array'`) | `'object'` | `'array'` returns type-safe tuples; not available when running through WASM today |
| `includeHeader` | Emit header row when using array output | `false` | Only valid with `outputFormat: 'array'`; the header becomes the first emitted record |
| `columnCountStrategy` | Handle column-count mismatches when a header is provided | `'keep'` for array format / `'pad'` for object format | Choose between `keep`, `pad`, `strict`, or `truncate` to control how rows align with the header |
| `signal` | AbortSignal to cancel processing | `undefined` | Allows aborting of long-running operations |
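
For example, `delimiter` and `quotation` can be combined to read semicolon-separated, single-quoted data (a sketch using only options from the table above):

```ts
import { parse } from 'web-csv-toolbox';

const csv = "name;age\n'Alice; A.';42";

// Quoted fields may contain the delimiter without being split.
for await (const record of parse(csv, { delimiter: ';', quotation: "'" })) {
  console.log(record); // { name: 'Alice; A.', age: '42' }
}
```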

#### Record Output Formats

High-level and mid-level parsers now let you choose whether records come back as objects (default) or as tuple-like arrays:

```ts
const header = ["name", "age"];

// Object output (default)
for await (const record of parse(csv, { header })) {
  record.name; // string
}

// Array output with named tuples
const rows = await parse.toArray(csv, {
  header,
  outputFormat: "array",
  includeHeader: true,
  columnCountStrategy: "pad",
  engine: { wasm: false }, // Array output currently runs on the JS engine only
});
// rows[0] === ['name', 'age'] (header row)
// rows[1] has type readonly [name: string, age: string]
```

- `outputFormat: 'object'` (default) returns familiar `{ column: value }` objects.
- `outputFormat: 'array'` returns readonly tuples whose indices inherit names from the header for stronger TypeScript inference.
- `includeHeader: true` prepends the header row when you also set `outputFormat: 'array'`.
- `columnCountStrategy` controls how rows with too many or too few columns are treated when a header is present:
  - `keep`: emit rows exactly as they appear (default for array output with inferred headers)
  - `pad`: fill short rows with `undefined` and truncate long rows (default for object output)
  - `strict`: throw if the row length differs from the header
  - `truncate`: discard columns beyond the header length without padding short rows

> ⚠️ Array output is not yet available inside the WebAssembly execution path. If you request `outputFormat: 'array'`, force the JavaScript engine with `engine: { wasm: false }` (or run in an environment where WASM is disabled).
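
A short sketch of `columnCountStrategy: 'strict'`, which should reject any row whose length differs from the provided header (the exact error type is not specified here):

```ts
import { parse } from 'web-csv-toolbox';

const csv = 'Alice'; // one column, but the header declares two

try {
  await parse.toArray(csv, {
    header: ['name', 'age'],
    columnCountStrategy: 'strict',
  });
} catch (error) {
  console.error('Row length differs from header:', error);
}
```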

### Advanced Options (Binary-Specific) 🧬

| Option | Description | Default | Notes |
| --------------------------------- | ------------------------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `charset` | Character encoding for binary CSV inputs | `utf-8` | See [Encoding API Compatibility](https://developer.mozilla.org/en-US/docs/Web/API/Encoding_API/Encodings) for the encoding formats that can be specified. |
| `maxBinarySize` | Maximum binary size for BufferSource inputs (bytes) | `100 * 1024 * 1024` (100MB) | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input) |
| `decompression` | Decompression algorithm for compressed CSV inputs | | See [DecompressionStream Compatibility](https://developer.mozilla.org/en-US/docs/Web/API/DecompressionStream#browser_compatibility). Default support: gzip, deflate. deflate-raw is runtime-dependent and experimental (requires `allowExperimentalCompressions: true` for Response/Request inputs). |
| `ignoreBOM` | Whether to ignore Byte Order Mark (BOM) | `false` | See [TextDecoderOptions.ignoreBOM](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream/ignoreBOM) for more information about the BOM. |
| `fatal` | Throw an error on invalid characters | `false` | See [TextDecoderOptions.fatal](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream/fatal) for more information. |
| `allowExperimentalCompressions` | Allow experimental/future compression formats | `false` | When enabled, passes unknown compression formats to runtime. Use cautiously. See example below. |
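
For example, a sketch of `parseBinary` with a non-UTF-8 charset (Shift_JIS decoding depends on the runtime's Encoding API support):

```ts
import { parseBinary } from 'web-csv-toolbox';

// Bytes of a Shift_JIS-encoded CSV, e.g. read from a file or the network.
declare const sjisBytes: Uint8Array;

for await (const record of parseBinary(sjisBytes, {
  charset: 'shift_jis',
  fatal: true, // throw instead of emitting replacement characters
})) {
  console.log(record);
}
```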

## Performance & Best Practices ⚡

### Memory Characteristics

web-csv-toolbox uses different memory patterns depending on the API you choose:

#### 🌊 Streaming APIs (Memory Efficient)

##### Recommended for large files (> 10MB)

```js
import { parse } from 'web-csv-toolbox';

// ✅ Memory efficient: processes one record at a time
const response = await fetch('https://example.com/large-data.csv');
for await (const record of parse(response)) {
  console.log(record);
  // Memory footprint: ~few KB per iteration
}
```

- **Memory usage**: O(1) - constant per record
- **Suitable for**: Files of any size, browser environments
- **Max file size**: Limited only by available storage/network

#### 📦 Array-Based APIs (Memory Intensive)

##### Recommended for small files (< 1MB)

```js
import { parse } from 'web-csv-toolbox';

// ⚠️ Loads entire result into memory
const csv = await fetch('data.csv').then(r => r.text());
const records = await parse.toArray(csv);
// Memory footprint: entire file + parsed array
```

- **Memory usage**: O(n) - proportional to file size
- **Suitable for**: Small datasets, quick prototyping
- **Recommended max**: ~10MB (browser), ~100MB (Node.js)

### Platform-Specific Considerations

| Platform | Streaming | Array-Based | Notes |
|----------|-----------|-------------|-------|
| **Browser** | Any size | < 10MB | Browser heap limits apply (~100MB-4GB depending on browser) |
| **Node.js** | Any size | < 100MB | Use `--max-old-space-size` flag for larger heaps |
| **Deno** | Any size | < 100MB | Similar to Node.js |

### Performance Tips

#### 1. Use streaming for large files

```js
import { parse } from 'web-csv-toolbox';

const response = await fetch('https://example.com/large-data.csv');

// ✅ Good: Streaming approach (constant memory usage)
for await (const record of parse(response)) {
  // Process each record immediately
  console.log(record);
  // Memory footprint: O(1) - only one record in memory at a time
}

// ❌ Avoid: Loading entire file into memory first
const response2 = await fetch('https://example.com/large-data.csv');
const text = await response2.text(); // Loads entire file into memory
const records = await parse.toArray(text); // Loads all records into memory
for (const record of records) {
  console.log(record);
  // Memory footprint: O(n) - entire file + all records in memory
}
```

#### 2. Enable AbortSignal for timeout protection

```js
import { parse } from 'web-csv-toolbox';

// Set up a timeout of 30 seconds (30000 milliseconds)
const signal = AbortSignal.timeout(30000);

const response = await fetch('https://example.com/large-data.csv');

try {
  for await (const record of parse(response, { signal })) {
    // Process each record
    console.log(record);
  }
} catch (error) {
  if (error instanceof DOMException && error.name === 'TimeoutError') {
    // Handle timeout
    console.log('CSV processing was aborted due to timeout.');
  } else {
    // Handle other errors
    console.error('An error occurred during CSV processing:', error);
  }
}
```

#### 3. Use WebAssembly parser for CPU-intensive workloads (Experimental)

```js
import { parseStringToArraySyncWASM } from 'web-csv-toolbox';

// Compiled WASM code for improved performance (UTF-8 only)
// See CodSpeed benchmarks for actual performance metrics
const records = parseStringToArraySyncWASM(csvString);
```

### Known Limitations

- **Delimiter/Quotation**: Must be a single character (multi-character delimiters not supported)
- **WASM Parser**: UTF-8 encoding only, double-quote (`"`) only
- **Streaming**: Best performance with chunk sizes > 1KB

### Security Considerations

For production use with untrusted input, consider:
- Setting timeouts using `AbortSignal.timeout()` to prevent resource exhaustion
- Using `maxBinarySize` option to limit BufferSource inputs (default: 100MB bytes)
- Using `maxBufferSize` option to limit internal buffer size (default: 10M characters)
- Using `maxFieldCount` option to limit fields per record (default: 100,000)
- Implementing additional file size limits at the application level
- Validating parsed data before use
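
A sketch combining these limits (option names are from this README; the input and values below are placeholders to tune for your workload):

```ts
import { parse } from 'web-csv-toolbox';

declare const untrustedCsv: string; // e.g. the body of an uploaded file

const signal = AbortSignal.timeout(30_000); // hard stop after 30 seconds

try {
  for await (const record of parse(untrustedCsv, {
    signal,
    maxBufferSize: 1024 * 1024, // tighten the 10M-character default
    maxFieldCount: 1_000,       // tighten the 100,000-field default
  })) {
    console.log(record);
  }
} catch (error) {
  if (error instanceof RangeError) {
    console.error('Input exceeded a configured safety limit:', error.message);
  } else {
    throw error;
  }
}
```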

#### Implementing Size Limits for Untrusted Sources

When processing CSV files from untrusted sources (especially compressed files), you can implement size limits using a custom TransformStream:

```js
import { parse } from 'web-csv-toolbox';

// Create a size-limiting TransformStream
class SizeLimitStream extends TransformStream {
  constructor(maxBytes) {
    let bytesRead = 0;
    super({
      transform(chunk, controller) {
        bytesRead += chunk.length;
        if (bytesRead > maxBytes) {
          controller.error(new Error(`Size limit exceeded: ${maxBytes} bytes`));
        } else {
          controller.enqueue(chunk);
        }
      }
    });
  }
}

// Example: Limit decompressed data to 10MB
const response = await fetch('https://untrusted-source.com/data.csv.gz');
const limitedStream = response.body
  .pipeThrough(new DecompressionStream('gzip'))
  .pipeThrough(new SizeLimitStream(10 * 1024 * 1024)); // 10MB limit

try {
  for await (const record of parse(limitedStream)) {
    console.log(record);
  }
} catch (error) {
  if (error.message.includes('Size limit exceeded')) {
    console.error('File too large - possible compression bomb attack');
  }
}
```

**Note**: The library automatically validates Content-Encoding headers when parsing Response objects, rejecting unsupported compression formats.

#### Using Experimental Compression Formats

By default, the library only supports well-tested compression formats: `gzip` and `deflate`. Some runtimes may support additional formats like `deflate-raw` or Brotli, but these are runtime-dependent and not guaranteed. If you need to use these formats, you can enable experimental mode:

```js
import { parse } from 'web-csv-toolbox';

// ✅ Default behavior: only known formats
const response = await fetch('data.csv.gz');
await parse.toArray(response); // Works

// ⚠️ Experimental: allow future formats
const response2 = await fetch('data.csv.br'); // Brotli compression
try {
  await parse.toArray(response2, { allowExperimentalCompressions: true });
  // Works if the runtime supports Brotli
} catch (error) {
  // The runtime will throw if the format is unsupported
  console.error('Runtime does not support this compression format');
}
```

**When to use this:**
- Your runtime supports a newer compression format (e.g., Brotli in modern browsers)
- You want to use the format before this library explicitly supports it
- You trust the compression format source

**Cautions:**
- Error messages will come from the runtime, not this library
- No library-level validation for unknown formats
- You must verify your runtime supports the format

## How to Contribute 💪

### Star ⭐

The easiest way to contribute is to use the library and star the [repository](https://github.com/kamiazya/web-csv-toolbox/).

### Questions 💭

Feel free to ask questions on [GitHub Discussions](https://github.com/kamiazya/web-csv-toolbox/discussions).

### Report bugs / request additional features 💡

Please create an issue at [GitHub Issues](https://github.com/kamiazya/web-csv-toolbox/issues/new/choose).

### Financial Support 💸

Please support [kamiazya](https://github.com/sponsors/kamiazya).

> Even just a dollar is enough motivation to develop 😊

## License ⚖️

This software is released under the MIT License, see [LICENSE](https://github.com/kamiazya/web-csv-toolbox/blob/main/LICENSE).

[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox?ref=badge_large)