https://github.com/fluxnull/accsv
A high-performance lexer/parser for the ACCSV data format.
- Host: GitHub
- URL: https://github.com/fluxnull/accsv
- Owner: fluxnull
- License: mit
- Created: 2025-08-16T03:41:27.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-16T09:10:36.000Z (11 months ago)
- Last Synced: 2025-08-16T09:27:01.415Z (11 months ago)
- Size: 38.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ACCSV: ASCII Control Character Separated Values
A rediscovery of ASCII control characters for robust, collision-proof tabular data storage. Fixes CSV's delimiter hell with `US` (0x1F) for fields and `RS` (0x1E) for records. Simple, fast, no escaping needed.
## Manifesto: Looking Back to Move Forward
For 50+ years, CSV/TSV have sucked because delimiter collisions lead to quoting/escaping nightmares. ASCII's control chars like US/RS/GS were built for exactly this: they're non-printable, so they don't collide with ordinary text. We forgot 'em; ACCSV brings 'em back for speed and simplicity.
This ain't a new invention; it's overlooked genius from 1963. Challenge: What other simple fixes have we ignored?
## File Specifications
### ACCSV Data File (`.accsv`)
Pure record stream.
| Property | Specification |
|----------|--------------|
| **Field Separator** | `US` (0x1F) |
| **Record Terminator** | `RS` (0x1E) |
| **Header Flag** | Optional `SUB` (0x1A) at byte 0 indicates first record is header. |
| **Cosmetic Newline** | Optional LF/CRLF after RS (ignored by parsers). |
| **Encoding** | Agnostic; app handles byte interpretation. |
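
The rules in the table above are small enough to sketch as a single scanning loop. The following is an illustrative sketch only, not the library's implementation: `sketch_scan`, `sketch_field_cb`, and `sketch_count_field` are hypothetical names, and the sketch assumes the whole file is already in memory and that bytes after the last `RS` do not form a record.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_US  0x1F  /* unit separator: ends a field */
#define SKETCH_RS  0x1E  /* record separator: terminates a record */
#define SKETCH_SUB 0x1A  /* optional header flag at byte 0 */

/* Hypothetical callback type: receives one field of one record. */
typedef void (*sketch_field_cb)(size_t record, size_t field,
                                const char *start, size_t length, void *user);

/* Example callback: counts fields; `user` points at a size_t counter. */
static void sketch_count_field(size_t record, size_t field,
                               const char *start, size_t length, void *user)
{
    (void)record; (void)field; (void)start; (void)length;
    (*(size_t *)user)++;
}

/* Scans `data`, reports every field through `cb` (may be NULL), and
 * returns the number of RS-terminated records. `*has_header` is set to 1
 * when the buffer starts with SUB, meaning record 0 is the header. */
static size_t sketch_scan(const char *data, size_t len, int *has_header,
                          sketch_field_cb cb, void *user)
{
    assert((data != NULL || len == 0) && has_header != NULL);
    size_t pos = 0, record = 0, field = 0;

    *has_header = (len > 0 && data[0] == SKETCH_SUB);
    if (*has_header)
        pos = 1;  /* the flag byte is not part of any field */

    size_t field_start = pos;
    while (pos < len) {
        char c = data[pos];
        if (c != SKETCH_US && c != SKETCH_RS) {
            pos++;  /* ordinary data byte, no escaping to undo */
            continue;
        }
        if (cb)
            cb(record, field, data + field_start, pos - field_start, user);
        pos++;
        if (c == SKETCH_US) {
            field++;
        } else {
            record++;
            field = 0;
            /* Skip the optional cosmetic CRLF or LF after RS. */
            if (pos + 1 < len && data[pos] == '\r' && data[pos + 1] == '\n')
                pos += 2;
            else if (pos < len && data[pos] == '\n')
                pos++;
        }
        field_start = pos;
    }
    return record;
}
```

Because the fields are reported as pointer-plus-length views into the input, nothing is copied or unescaped, which is the main practical payoff of using separators that never need quoting.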
### Manifest/Index File (`.accsv.midx`)
Hybrid: Readable metadata + binary index.
- **[Meta]** section: Key-value pairs (e.g., `Path = file.accsv`, `Algorithm = SHA256`, `Digest = hash`).
- **[IDX]** section: Binary follows.
Binary structure (little-endian):
| Part | Size | Type | Description |
|------|------|------|-------------|
| Magic | 8 | char[8] | `ACSVIDX1` |
| Version | 2 | uint16_t | e.g., 0x0100 |
| Reserved | 6 | uint8_t[6] | Zeros |
| Record Count | 8 | uint64_t | Total records |
| Offsets | N*8 | uint64_t[] | Byte offsets per record |
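
As a rough illustration of the layout in the table above, the sketch below decodes the 24-byte header (8 + 2 + 6 + 8) and the offset array from a byte buffer. It is a sketch under assumptions, not the library's loader: `SketchIdx`, `sketch_read_le`, `sketch_idx_parse`, and `sketch_idx_offset` are hypothetical names, the caller is assumed to have already located the first byte of the binary part after the `[IDX]` marker, and the reserved bytes are skipped without validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_IDX_HEADER_SIZE 24  /* 8 magic + 2 version + 6 reserved + 8 count */

/* Hypothetical in-memory view of the binary index. */
typedef struct {
    uint16_t version;
    uint64_t record_count;
    const unsigned char *offsets;  /* record_count little-endian uint64 values */
} SketchIdx;

/* Decodes a little-endian unsigned integer of n bytes (n <= 8),
 * independent of the host's byte order. */
static uint64_t sketch_read_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 if the buffer is truncated or the magic is wrong.
 * `buf` must point at the first byte of the binary part. */
static int sketch_idx_parse(const unsigned char *buf, size_t len, SketchIdx *out)
{
    assert(buf != NULL && out != NULL);
    if (len < SKETCH_IDX_HEADER_SIZE)
        return -1;
    if (memcmp(buf, "ACSVIDX1", 8) != 0)
        return -1;
    out->version = (uint16_t)sketch_read_le(buf + 8, 2);
    /* Bytes 10..15 are reserved and skipped here. */
    out->record_count = sketch_read_le(buf + 16, 8);
    /* Divide instead of multiplying so a huge count cannot overflow. */
    if (out->record_count > (len - SKETCH_IDX_HEADER_SIZE) / 8)
        return -1;
    out->offsets = buf + SKETCH_IDX_HEADER_SIZE;
    return 0;
}

/* Byte offset of record i in the data file. */
static uint64_t sketch_idx_offset(const SketchIdx *idx, uint64_t i)
{
    assert(idx != NULL && i < idx->record_count);
    return sketch_read_le(idx->offsets + (size_t)(8 * i), 8);
}
```

Reading the integers byte by byte rather than casting the buffer to `uint64_t*` keeps the sketch correct on big-endian hosts and avoids unaligned access.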
## Command-Line Tool: `accsv`
High-performance CLI for parsing and indexing, with git-style subcommands.
Usage: `accsv [options]`
Commands:
- `index `: Create `.accsv.midx`.
- `count [file.accsv]`: Record count (fast with index).
- `view [file.accsv]`: Human-readable output (tabs/newlines).
- `slice [end]`: Extract range (-ah adds header).
Full help: `accsv --help`
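
To give a feel for what a `view`-style conversion involves, here is a minimal sketch. It is not the tool's actual code: `sketch_view` is a hypothetical function, and the mapping of `US` to a tab and `RS` to a newline is an assumption based on the "tabs/newlines" description above. How the real tool treats fields that themselves contain tabs or newlines is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Writes a tab/newline rendering of the ACCSV bytes in `in` to `out`
 * (capacity `cap`). Returns the number of bytes written, or (size_t)-1
 * if `out` is too small. The output is never longer than the input. */
static size_t sketch_view(const char *in, size_t len, char *out, size_t cap)
{
    assert((in != NULL || len == 0) && out != NULL);
    size_t i = 0, n = 0;

    if (len > 0 && in[0] == 0x1A)
        i = 1;  /* skip the optional SUB header flag */

    while (i < len) {
        char c = in[i++];
        if (c == 0x1F) {
            c = '\t';  /* US: field separator becomes a tab (assumed) */
        } else if (c == 0x1E) {
            c = '\n';  /* RS: record terminator becomes a newline (assumed) */
            /* Drop the cosmetic CRLF or LF so lines are not doubled. */
            if (i + 1 < len && in[i] == '\r' && in[i + 1] == '\n')
                i += 2;
            else if (i < len && in[i] == '\n')
                i++;
        }
        if (n == cap)
            return (size_t)-1;
        out[n++] = c;
    }
    return n;
}
```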
## Core C Library API (`accsv.h`)
Public interface:
```c
#ifndef ACCSV_H
#define ACCSV_H
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint64_t */
#include <stdio.h>   /* FILE */
typedef struct { const char* start; size_t length; } AccsvFieldView;
typedef struct { AccsvFieldView* fields; size_t field_count; } AccsvRecordView;
typedef struct AccsvParser AccsvParser;
typedef struct AccsvIndex AccsvIndex;
AccsvParser* accsv_parser_new(FILE* stream);
const AccsvRecordView* accsv_parser_next_record(AccsvParser* parser);
void accsv_parser_free(AccsvParser* parser);
int accsv_parser_has_header(const AccsvParser* parser);
AccsvIndex* accsv_index_load(const char* midx_path);
void accsv_index_free(AccsvIndex* index);
uint64_t accsv_index_get_record_count(const AccsvIndex* index);
int accsv_parser_seek(AccsvParser* parser, const AccsvIndex* index, uint64_t record_number);
int accsv_parser_in_error_state(const AccsvParser* parser);
const char* accsv_parser_get_error(const AccsvParser* parser);
#endif