{"id":21842431,"url":"https://github.com/erik1066/rapid-csv","last_synced_at":"2026-04-16T01:32:59.405Z","repository":{"id":253027800,"uuid":"840754868","full_name":"erik1066/rapid-csv","owner":"erik1066","description":"A .NET library for fast and efficient parsing, validation, and transformation of CSV files.","archived":false,"fork":false,"pushed_at":"2025-12-21T20:10:35.000Z","size":484,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-23T08:45:13.540Z","etag":null,"topics":["csv","csv-validation","csv-validator"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erik1066.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-10T15:29:30.000Z","updated_at":"2025-12-21T20:10:39.000Z","dependencies_parsed_at":"2024-08-27T03:53:13.560Z","dependency_job_id":"946f157a-e5e8-4ecb-b040-235c775e3207","html_url":"https://github.com/erik1066/rapid-csv","commit_stats":null,"previous_names":["erik1066/fast-csv","erik1066/rapid-csv"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/erik1066/rapid-csv","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erik1066%2Frapid-csv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erik1066%2Frapid-csv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erik1066%2Frapid-csv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/er
ik1066%2Frapid-csv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/erik1066","download_url":"https://codeload.github.com/erik1066/rapid-csv/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erik1066%2Frapid-csv/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31867710,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"ssl_error","status_checked_at":"2026-04-15T15:24:39.138Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-validation","csv-validator"],"created_at":"2024-11-27T22:11:57.999Z","updated_at":"2026-04-16T01:32:59.390Z","avatar_url":"https://github.com/erik1066.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fast CSV Validator and Transformer\n\n[![NuGet version (RapidCsv)](https://img.shields.io/nuget/v/RapidCsv?style=flat-square)](https://www.nuget.org/packages/RapidCsv/)\n\nA .NET library for fast and efficient validation and transformation of CSV files. \n\nStructural CSV validation rules adhere to [RFC 4180](https://www.rfc-editor.org/rfc/rfc4180). \n\nAdditional content validation rules can be configured by supplying an *optional* JSON [validation profile](validator-config-schema.json). 
A validation profile lets you specify column names, data types, and column rules (e.g. whether data for that column are required, what the min/max length should be, and so on). \n\n## Performance\n\nRFC 4180 validation on a 40-column, 100,000-row CSV file takes 235 ms and allocates a total of 100 MB of memory on an old Intel laptop CPU from the 2010s. See [benchmark results](./Benchmarks.md) for more.\n\nYou can run the benchmarks with the dedicated benchmarking project by navigating to `tests/RapidCsv.Benchmarks` and running:\n\n```bash\ndotnet run -c Release\n```\n\n## Basic Usage - Validate a CSV file against [RFC 4180](https://www.rfc-editor.org/rfc/rfc4180)\n\n1. Add a reference to RapidCsv in your `.csproj` file:\n\n```xml\n\u003cProject Sdk=\"Microsoft.NET.Sdk\"\u003e\n\n  \u003cItemGroup\u003e\n    \u003cPackageReference Include=\"RapidCsv\" Version=\"0.0.2\" /\u003e\n  \u003c/ItemGroup\u003e\n\n  \u003cPropertyGroup\u003e\n    \u003cOutputType\u003eExe\u003c/OutputType\u003e\n    \u003cTargetFramework\u003enet10.0\u003c/TargetFramework\u003e\n    \u003cImplicitUsings\u003eenable\u003c/ImplicitUsings\u003e\n    \u003cNullable\u003eenable\u003c/Nullable\u003e\n  \u003c/PropertyGroup\u003e\n\n\u003c/Project\u003e\n```\n\n2. Add a `using RapidCsv;` directive at the top of your class file.\n\n3. 
Create a `CsvValidator` object and call its `Validate` method, passing both a stream and a `ValidationOptions` object into that method.\n\n```cs\nusing RapidCsv;\n\nstring csvContent = @\"NAME,AGE,DOB\nJohn,23,1/1/2012\nMary,34,1/1/1990\nJane,25,1/1/2010\nHana,55,1/1/1970\";\n\nCsvValidator validator = new CsvValidator();\nvar options = new ValidationOptions()\n{\n    Separator = ',',\n    HasHeaderRow = true\n};\n\nStream content = GenerateStreamFromString(csvContent);\nValidationResult result = validator.Validate(content: content, options: options);\n\nConsole.WriteLine($\"Valid File = {result.IsValid}\");\n\n// Helper that wraps a string in an in-memory stream rewound to position 0\nstatic Stream GenerateStreamFromString(string s)\n{\n    var stream = new MemoryStream();\n    var writer = new StreamWriter(stream);\n    writer.Write(s);\n    writer.Flush();\n    stream.Position = 0;\n    return stream;\n}\n```\n\n## Examples\n\nThe [examples](/examples/) folder contains example code that demonstrates how to use RapidCsv.\n\n### Example #1: RFC 4180 validation in a .NET Console App\n\nLet's look at the `RapidCsv.ConsoleDemo` project. This app shows you how to validate a CSV file against just the RFC 4180 specification.\n\n1. Navigate to [examples/demo-console/](examples/demo-console/) in a terminal of your choice. \n1. Enter the following into the terminal:\n\n```bash\ndotnet run\n```\n\n3. Observe the following output:\n\n```\nValid File = True\n Data Rows         = 4\n Elapsed time (ms) = 3ms\n Columns           = 3\n Error count       = 0\n Warning count     = 0\n Headers = \n  Column 1 = NAME\n  Column 2 = AGE\n  Column 3 = DOB\n```\n\nThat's all there is to it.\n\n\u003e This console app includes a hard-coded CSV file in `program.cs` to make it as simple as possible to run the example. 
A CSV input file is therefore not required.\n\n### Example #2: Profile-driven content validation\n\nSee [demo-console-content-validation](./examples/demo-console-content-validation/) for working, runnable code based on the example below.\n\nLet's say you want more than RFC 4180 validation. Perhaps you have a CSV file like the one below:\n\n```\nNAME,AGE,DOB,PHONE,STATUS\nJohn,23,1/1/2012,555-555-5555,actv\nMary,34,1/1/1990,555-555-5555,inac\nJane,25,1/1/2010,555-555-5555,actv\nHana,55,1/1/1970,555-555-555X,unkn\n```\n\nAnd let's suppose we want to validate this CSV file against the following rules:\n\n1. `NAME` must be 0-25 characters long\n1. `AGE` must be an integer\n1. `DOB` must use `m/d/yyyy` format\n1. `PHONE` must be a valid 10-digit US phone number\n1. `STATUS` must be one of two values, `actv` or `inac`; all other values are invalid\n\nWe can create an optional validation profile in JSON that implements these rules:\n\n```json\n{\n    \"$schema\": \"rapid-csv/validator-config-schema.json\",\n    \"name\": \"Acme Bookstore Customer Records\",\n    \"description\": \"Validation profile for the CSV records of our Acme bookstore customers\",\n    \"filename\": \"abc123.csv\",\n    \"separator\": \",\",\n    \"has_header\": true,\n    \"columns\": [\n        {\n            \"name\": \"NAME\",\n            \"description\": \"The customer's name\",\n            \"ordinal\": 1,\n            \"type\": \"string\",\n            \"max\": 25,\n            \"min\": 0,\n            \"required\": false,\n            \"null_or_empty\": true,\n            \"format\": null,\n            \"regex\": null\n        },\n        {\n            \"name\": \"AGE\",\n            \"description\": \"The customer's age\",\n            \"ordinal\": 2,\n            \"type\": \"integer\",\n            \"max\": 125,\n            \"min\": 7,\n            \"required\": false,\n            \"null_or_empty\": true,\n            \"format\": null,\n            \"regex\": null\n        },\n        {\n  
          \"name\": \"DOB\",\n            \"description\": \"The customer's date of birth\",\n            \"ordinal\": 3,\n            \"type\": \"string\",\n            \"required\": false,\n            \"null_or_empty\": true,\n            \"format\": \"m/d/yyyy\",\n            \"regex\": null\n        },\n        {\n            \"name\": \"PHONE\",\n            \"description\": \"The customer's phone number\",\n            \"ordinal\": 4,\n            \"type\": \"string\",\n            \"required\": false,\n            \"null_or_empty\": true,\n            \"format\": null,\n            \"regex\": \"^(\\\\+\\\\d{1,2}\\\\s)?\\\\(?\\\\d{3}\\\\)?[\\\\s.-]\\\\d{3}[\\\\s.-]\\\\d{4}$\"\n        },\n        {\n            \"name\": \"STATUS\",\n            \"description\": \"Customer status\",\n            \"ordinal\": 5,\n            \"type\": \"enum\",\n            \"values\": [ \"actv\", \"inac\" ],\n            \"required\": false,\n            \"null_or_empty\": true,\n            \"format\": null,\n            \"regex\": null\n        }\n    ]\n}\n```\n\nNote the use of `min` and `max` for `NAME`, the `integer` type for `AGE`, the `format` property in the `DOB` column definition, the `regex` for the `PHONE` column, and the `enum` with `values` in the `STATUS` column. These are how we define the five rules outlined earlier.\n\nUsing the profile is straightforward:\n\n```csharp\n// Create the validator object\nCsvValidator validator = new CsvValidator();\n\n// Create the validation options\n// validationProfile holds the raw JSON of the profile above, read into memory beforehand\nvar options = new ValidationOptions()\n{\n    Separator = ',',\n    HasHeaderRow = true,\n    ValidationProfile = validationProfile\n};\n\nStream content = GenerateStreamFromString(csvContent);\n\n// Validate the file using the validator, and return the result to the caller\nValidationResult result = validator.Validate(content: content, options: options);\n```\n\nIn other words, we read the raw JSON into memory and assign it to the `ValidationProfile` property of the `ValidationOptions` object. 
The validator will then use the profile to execute these content checks. \n\n\u003e `ValidationProfile` is optional. If it is left empty, the validator runs only the basic RFC 4180 checks and applies no content validation rules.\n\nThese profile-driven content checks can add significant overhead when running the validator at scale. Apply them only when the use case requires real-time content validation.\n\n\n## Architecture and Design Decisions\n\nRapidCsv is meant to be used in situations where one needs speed and memory efficiency _at scale_. For instance, if you're required to process CSV files in near real-time at high volume, where validation results are viewable by clients almost instantly after file submission, then this is a library worth considering. \n\nThat use case is why the library was built, and it drives the design decisions described below.\n\n### High performance and memory efficiency\n\nThe use of `ReadOnlySpan\u003cT\u003e` in the library is intentional. A simpler way of dealing with CSV files might be to use `string.Split(',')`, but splitting a string copies its contents into new memory (the array of string fragments that the `Split()` method generates). This increases memory use, the extra allocations make the code slightly slower, and the garbage collector has more work to do cleaning up the duplicated memory.\n\nBy using `ReadOnlySpan\u003cT\u003e`, a lower-level API in .NET, we can get a view into a subset of the string instead of creating copies. The trade-off is that spans are harder to work with and make the code harder to read and maintain. \n\nA state machine-like algorithm is needed to parse each line in a CSV file. 
The algorithm goes character-by-character over the `ReadOnlySpan\u003cchar\u003e` and must keep track of state, such as whether it is inside a quoted field, in order to know how to interpret the current character. At the same time, it must validate what it finds.\n\n### No limits on file size\n\nRapidCsv operates on streams. The whole CSV file does not need to be read at once, unlike with some competing libraries, and the fast performance means even larger files (e.g. 100k rows) can be validated in under 1 second.\n\n### Human-readable error messages\n\nReadable and understandable error messages are critical. Detected errors are reported in plain language that, within reason, even users with little technical background should be able to understand.\n\n### Ease of use by developers\n\nThe library is meant to be simple for developers to use. It's one function call in one class:\n\n```cs\nCsvValidator validator = new CsvValidator();\n\nvar options = new ValidationOptions()\n{\n    Separator = ',',\n    HasHeaderRow = true\n};\n\nValidationResult result = validator.Validate(content: content, options: options);\n```\n\nIn the code snippet above, we create a `CsvValidator` instance, build some very basic `options`, and then call the validator's `Validate` method with them. Without more advanced options this will validate the file against the RFC 4180 specification.\n\nThe `content` in this case is of type `Stream`. 
You can then do useful things with the `result` object you get back, such as iterating over all the errors/warnings or reading a boolean flag to see if the file is valid or invalid.\n\nThe `Validate` method can do more advanced things as well, such as accepting a JSON content validation configuration. This goes beyond RFC 4180 and checks field content against your supplied regular expressions, data type specifications, min/max values, and other rules. Supplying such a configuration is optional.\n\n### Few to no dependencies\n\nThe software supply chain is hard to secure today. RapidCsv currently uses no dependencies. \n\n### Configurable content validation rules\n\nDo you need to go beyond RFC 4180 rules for your real-time CSV validation needs? The [validation rules](./validator-config-schema.json) allow you to specify some basic content validation checks, such as min/max length, regular expression checks, formatting checks, and data types. Violations of these rules are reported with error type _Content_ to distinguish them from RFC 4180 errors, which are reported with error type _Structural_. ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferik1066%2Frapid-csv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferik1066%2Frapid-csv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferik1066%2Frapid-csv/lists"}