{"id":13730352,"url":"https://github.com/vincentlaucsb/csv-parser","last_synced_at":"2026-02-16T09:12:32.211Z","repository":{"id":40258804,"uuid":"104390893","full_name":"vincentlaucsb/csv-parser","owner":"vincentlaucsb","description":"A high-performance, fully-featured CSV parser and serializer for modern C++.","archived":false,"fork":false,"pushed_at":"2025-01-30T06:36:07.000Z","size":10625,"stargazers_count":964,"open_issues_count":37,"forks_count":165,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-04-13T13:19:26.370Z","etag":null,"topics":["c-plus-plus","c-plus-plus-11","c-plus-plus-14","c-plus-plus-17","csv","csv-parser","csv-reader","json","parser","statistics","tab-separated"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vincentlaucsb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-21T19:38:27.000Z","updated_at":"2025-04-06T14:08:23.000Z","dependencies_parsed_at":"2024-01-31T09:03:23.745Z","dependency_job_id":"2580e131-9268-4c93-b331-616cb56b01e0","html_url":"https://github.com/vincentlaucsb/csv-parser","commit_stats":null,"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentlaucsb%2Fcsv-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentlaucsb%2Fcsv-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentlaucsb%2Fcsv-parser/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vincentlaucsb%2Fcsv-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vincentlaucsb","download_url":"https://codeload.github.com/vincentlaucsb/csv-parser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717435,"owners_count":21150406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","c-plus-plus-11","c-plus-plus-14","c-plus-plus-17","csv","csv-parser","csv-reader","json","parser","statistics","tab-separated"],"created_at":"2024-08-03T02:01:13.691Z","updated_at":"2026-02-16T09:12:32.204Z","avatar_url":"https://github.com/vincentlaucsb.png","language":"C++","readme":"# Vince's CSV Parser\n[![CMake on Windows](https://github.com/vincentlaucsb/csv-parser/actions/workflows/cmake-multi-platform.yml/badge.svg)](https://github.com/vincentlaucsb/csv-parser/actions/workflows/cmake-multi-platform.yml) [![Memory and Thread Sanitizers](https://github.com/vincentlaucsb/csv-parser/actions/workflows/sanitizers.yml/badge.svg)](https://github.com/vincentlaucsb/csv-parser/actions/workflows/sanitizers.yml)\n\n- [Vince's CSV Parser](#vinces-csv-parser)\n  - [Motivation](#motivation)\n    - [Performance and Memory Requirements](#performance-and-memory-requirements)\n      - [Show me the numbers](#show-me-the-numbers)\n    - [Robust Yet Flexible](#robust-yet-flexible)\n      - [RFC 4180 and Beyond](#rfc-4180-and-beyond)\n      - [Encoding](#encoding)\n    - [Well Tested](#well-tested)\n  - [Documentation](#documentation)\n  - 
[Sponsors](#sponsors)\n  - [Integration](#integration)\n    - [C++ Version](#c-version)\n    - [Single Header](#single-header)\n    - [CMake Instructions](#cmake-instructions)\n      - [Avoid cloning with FetchContent](#avoid-cloning-with-fetchcontent)\n  - [Features \\\u0026 Examples](#features--examples)\n    - [Reading an Arbitrarily Large File (with Iterators)](#reading-an-arbitrarily-large-file-with-iterators)\n      - [Memory-Mapped Files vs. Streams](#memory-mapped-files-vs-streams)\n    - [Indexing by Column Names](#indexing-by-column-names)\n    - [Numeric Conversions](#numeric-conversions)\n    - [Converting to JSON](#converting-to-json)\n    - [Specifying the CSV Format](#specifying-the-csv-format)\n      - [Trimming Whitespace](#trimming-whitespace)\n      - [Handling Variable Numbers of Columns](#handling-variable-numbers-of-columns)\n      - [Setting Column Names](#setting-column-names)\n    - [Parsing an In-Memory String](#parsing-an-in-memory-string)\n    - [Writing CSV Files](#writing-csv-files)\n\n## Motivation\nThere's plenty of other CSV parsers in the wild, but I had a hard time finding what I wanted. Inspired by Python's `csv` module, I wanted a library with **simple, intuitive syntax**. Furthermore, I wanted support for special use cases such as calculating statistics on very large files. Thus, this library was created with the following goals in mind.\n\n### Performance and Memory Requirements\nA high-performance CSV parser allows you to take advantage of the deluge of large datasets available. 
By using overlapped threads, memory-mapped IO, and \nminimal memory allocation, this parser can quickly tackle large CSV files--even if they are larger than RAM.\n\nIn fact, [according to Visual Studio's profiler](https://github.com/vincentlaucsb/csv-parser/wiki/Microsoft-Visual-Studio-CPU-Profiling-Results) this\nCSV parser **spends almost 90% of its CPU cycles actually reading your data** as opposed to getting hung up in hard disk I/O or pushing around memory.\n\n#### Show me the numbers\nOn my computer (12th Gen Intel(R) Core(TM) i5-12400 @ 2.50 GHz/Western Digital Blue 5400RPM HDD), this parser can read\n * the [69.9 MB 2015_StateDepartment.csv](https://github.com/vincentlaucsb/csv-data/tree/master/real_data) in 0.19 seconds (360 MBps)\n * a [1.4 GB Craigslist Used Vehicles Dataset](https://www.kaggle.com/austinreese/craigslist-carstrucks-data/version/7) in 1.18 seconds (1.2 GBps)\n * a [2.9 GB Car Accidents Dataset](https://www.kaggle.com/sobhanmoosavi/us-accidents) in 8.49 seconds (352 MBps)\n\n### Robust Yet Flexible\n#### RFC 4180 and Beyond\nThis CSV parser is much more than a fancy string splitter, and parses all files following [RFC 4180](https://www.rfc-editor.org/rfc/rfc4180.txt).\n\nHowever, in reality we know that RFC 4180 is just a suggestion, and there's many \"flavors\" of CSV such as tab-delimited files. 
Thus, this library has:\n * Automatic delimiter guessing\n * Ability to ignore comments in leading rows and elsewhere\n * Ability to handle rows of different lengths\n * Ability to handle arbitrary line endings (as long as they are some combination of carriage return and newline)\n\nBy default, rows of variable length are silently ignored, although you may elect to keep them or throw an error.\n\n#### Encoding\nThis CSV parser is encoding-agnostic and will handle ANSI and UTF-8 encoded files.\nIt does not try to decode UTF-8, except for detecting and stripping UTF-8 byte order marks.\n\n### Well Tested\nThis CSV parser has:\n * An extensive Catch2 test suite\n * Address, thread safety, and undefined behavior checks with ASan, TSan, and Valgrind (see [GitHub Actions](https://github.com/vincentlaucsb/csv-parser/actions))\n\nIf you still manage to find a bug, do not hesitate to report it.\n\n## Documentation\n\nIn addition to the [Features \u0026 Examples](#features--examples) below, a [fully-fledged online documentation](https://vincela.com/csv/) contains more examples, details, interesting features, and instructions for less common use cases.\n\n## Sponsors\nIf you use this library for work, please [become a sponsor](https://github.com/sponsors/vincentlaucsb). Your donation\nwill fund continued maintenance and development of the project.\n\n## Integration\n\nThis library was developed with Microsoft Visual Studio and is compatible with \u003eg++ 7.5 and clang.\nAll of the code required to build this library, aside from the C++ standard library, is contained under `include/`.\n\n### C++ Version\nWhile C++17 is recommended, C++11 is the minimum version required. 
This library makes extensive use of string views, and uses\n[Martin Moene's string view library](https://github.com/martinmoene/string-view-lite) if `std::string_view` is not available.\n\n### Single Header\nThis library is available as a single `.hpp` file under [`single_include/csv.hpp`](single_include/csv.hpp).\n\n### CMake Instructions\nIf you're including this in another CMake project, you can simply clone this repo into your project directory, \nand add the following to your CMakeLists.txt:\n\n```\n# Optional: Defaults to C++ 17\n# set(CSV_CXX_STANDARD 11)\nadd_subdirectory(csv-parser)\n\n# ...\n\nadd_executable(\u003cyour program\u003e ...)\ntarget_link_libraries(\u003cyour program\u003e csv)\n\n```\n\n#### Avoid cloning with FetchContent\nDon't want to clone? No problem. There's also [a simple example documenting how to use CMake's FetchContent module to integrate this library](https://github.com/vincentlaucsb/csv-parser/wiki/Example:-Using-csv%E2%80%90parser-with-CMake-and-FetchContent).\n\n\n## Features \u0026 Examples\n### Reading an Arbitrarily Large File (with Iterators)\nWith this library, you can easily stream over a large file without reading its entirety into memory.\n\n**C++ Style**\n```cpp\n# include \"csv.hpp\"\n\nusing namespace csv;\n\n...\n\nCSVReader reader(\"very_big_file.csv\");\n\nfor (CSVRow\u0026 row: reader) { // Input iterator\n    for (CSVField\u0026 field: row) {\n        // By default, get\u003c\u003e() produces a std::string.\n        // A more efficient get\u003cstring_view\u003e() is also available, where the resulting\n        // string_view is valid as long as the parent CSVRow is alive\n        std::cout \u003c\u003c field.get\u003c\u003e() \u003c\u003c ...\n    }\n}\n\n...\n```\n\n**Old-Fashioned C Style Loop**\n```cpp\n...\n\nCSVReader reader(\"very_big_file.csv\");\nCSVRow row;\n \nwhile (reader.read_row(row)) {\n    // Do stuff with row here\n}\n\n...\n```\n\n#### Memory-Mapped Files vs. 
Streams\nBy default, passing in a file path string to the constructor of `CSVReader`\ncauses memory-mapped IO to be used. In general, this option is the most\nperformant.\n\nHowever, `std::ifstream` may also be used as well as in-memory sources via `std::stringstream`.\n\n**Note**: Currently CSV guessing only works for memory-mapped files. The CSV dialect\nmust be manually defined for other sources.\n\n**⚠️ IMPORTANT - Iterator Type and Memory Safety**:  \n`CSVReader::iterator` is an **input iterator** (`std::input_iterator_tag`), NOT a forward iterator.\nThis design enables streaming large CSV files (50+ GB) without loading them entirely into memory.\n\n**Why Forward Iterator Algorithms Don't Work**:\n- As the iterator advances, underlying data chunks are automatically freed to bound memory usage\n- Algorithms like `std::max_element` require ForwardIterator semantics (multi-pass, hold multiple positions)\n- Using such algorithms directly on `CSVReader::iterator` will cause **heap-use-after-free** when the\n  algorithm tries to access iterators pointing to already-freed data chunks\n- While it may appear to work with small files that fit in a single chunk, it WILL fail with larger files\n\n**✅ Correct Approach for ForwardIterator Algorithms**:\n```cpp\n// Copy rows to vector first (enables multi-pass iteration)\nCSVReader reader(\"large_file.csv\");\nstd::vector\u003cCSVRow\u003e rows(reader.begin(), reader.end());\n\n// Now safely use any algorithm requiring ForwardIterator\nauto max_row = std::max_element(rows.begin(), rows.end(), \n    [](const CSVRow\u0026 a, const CSVRow\u0026 b) { \n        return a[\"salary\"].get\u003cdouble\u003e() \u003c b[\"salary\"].get\u003cdouble\u003e(); \n    });\n```\n\n\n```cpp\nCSVFormat format;\n// custom formatting options go here\n\nCSVReader mmap(\"some_file.csv\", format);\n\nstd::ifstream infile(\"some_file.csv\", std::ios::binary);\nCSVReader ifstream_reader(infile, format);\n\nstd::stringstream my_csv;\nCSVReader 
sstream_reader(my_csv, format);\n```\n\n### Indexing by Column Names\nRetrieving values using a column name string is a cheap, constant time operation.\n\n```cpp\n# include \"csv.hpp\"\n\nusing namespace csv;\n\n...\n\nCSVReader reader(\"very_big_file.csv\");\ndouble sum = 0;\n\nfor (auto\u0026 row: reader) {\n    // Note: Can also use index of column with [] operator\n    sum += row[\"Total Salary\"].get\u003cdouble\u003e();\n}\n\n...\n```\n\n### Numeric Conversions\nIf your CSV has lots of numeric values, you can also have this parser (lazily)\nconvert them to the proper data type.\n\n * Type checking is performed on conversions to prevent undefined behavior and integer overflow\n   * Negative numbers cannot be blindly converted to unsigned integer types\n * `get\u003cfloat\u003e()`, `get\u003cdouble\u003e()`, and `get\u003clong double\u003e()` are capable of parsing numbers written in scientific notation.\n * **Note:** Conversions to floating point types are not currently checked for loss of precision.\n\n```cpp\n# include \"csv.hpp\"\n\nusing namespace csv;\n\n...\n\nCSVReader reader(\"very_big_file.csv\");\n\nfor (auto\u0026 row: reader) {\n    if (row[\"timestamp\"].is_int()) {\n        // Can use get\u003c\u003e() with any integer type, but negative\n        // numbers cannot be converted to unsigned types\n        row[\"timestamp\"].get\u003cint\u003e();\n        \n        // You can also attempt to parse hex values\n        long long value;\n        if (row[\"hexValue\"].try_parse_hex(value)) {\n            std::cout \u003c\u003c \"Hex value is \" \u003c\u003c value \u003c\u003c std::endl;\n        }\n\n        // Or specify a different integer type\n        int smallValue;\n        if (row[\"smallHex\"].try_parse_hex\u003cint\u003e(smallValue)) {\n            std::cout \u003c\u003c \"Small hex value is \" \u003c\u003c smallValue \u003c\u003c std::endl;\n        }\n\n        // Non-imperial decimal numbers can be handled this way\n        long double 
decimalValue;\n        if (row[\"decimalNumber\"].try_parse_decimal(decimalValue, ',')) {\n            std::cout \u003c\u003c \"Decimal value is \" \u003c\u003c decimalValue \u003c\u003c std::endl;\n        }\n\n        // ..\n    }\n}\n\n```\n\n### Converting to JSON\nYou can serialize individual rows as JSON objects, where the keys are column names, or as \nJSON arrays (which don't contain column names). The outputted JSON contains properly escaped\nstrings with minimal whitespace and no quoting for numeric values. How these JSON fragments are \nassembled into a larger JSON document is an exercise left for the user.\n\n```cpp\n# include \u003csstream\u003e\n# include \"csv.hpp\"\n\nusing namespace csv;\n\n...\n\nCSVReader reader(\"very_big_file.csv\");\nstd::stringstream my_json;\n\nfor (auto\u0026 row: reader) {\n    my_json \u003c\u003c row.to_json() \u003c\u003c std::endl;\n    my_json \u003c\u003c row.to_json_array() \u003c\u003c std::endl;\n\n    // You can pass in a vector of column names to\n    // slice or rearrange the outputted JSON\n    my_json \u003c\u003c row.to_json({ \"A\", \"B\", \"C\" }) \u003c\u003c std::endl;\n    my_json \u003c\u003c row.to_json_array({ \"C\", \"B\", \"A\" }) \u003c\u003c std::endl;\n}\n\n```\n\n### Specifying the CSV Format\nAlthough the CSV parser has a decent guessing mechanism, in some cases it is preferable to specify the exact parameters of a file.\n\n```cpp\n# include \"csv.hpp\"\n# include ...\n\nusing namespace csv;\n\nCSVFormat format;\nformat.delimiter('\\t')\n      .quote('~')\n      .header_row(2);   // Header is on 3rd row (zero-indexed)\n      // .no_header();  // Parse CSVs without a header row\n      // .quote(false); // Turn off quoting \n\n// Alternatively, we can use format.delimiter({ '\\t', ',', ... 
})\n// to tell the CSV guesser which delimiters to try out\n\nCSVReader reader(\"wierd_csv_dialect.csv\", format);\n\nfor (auto\u0026 row: reader) {\n    // Do stuff with rows here\n}\n\n```\n\n#### Trimming Whitespace\nThis parser can efficiently trim off leading and trailing whitespace. Of course,\nmake sure you don't include your intended delimiter or newlines in the list of characters\nto trim.\n\n```cpp\nCSVFormat format;\nformat.trim({ ' ', '\\t'  });\n```\n\n#### Handling Variable Numbers of Columns\nSometimes, the rows in a CSV are not all of the same length. Whether this was intentional or not,\nthis library is built to handle all use cases.\n\n```cpp\nCSVFormat format;\n\n// Default: Silently ignoring rows with missing or extraneous columns\nformat.variable_columns(false); // Short-hand\nformat.variable_columns(VariableColumnPolicy::IGNORE_ROW);\n\n// Case 2: Keeping variable-length rows\nformat.variable_columns(true); // Short-hand\nformat.variable_columns(VariableColumnPolicy::KEEP);\n\n// Case 3: Throwing an error if variable-length rows are encountered\nformat.variable_columns(VariableColumnPolicy::THROW);\n```\n\n#### Setting Column Names\nIf a CSV file does not have column names, you can specify your own:\n\n```cpp\nstd::vector\u003cstd::string\u003e col_names = { ... };\nCSVFormat format;\nformat.column_names(col_names);\n```\n\n### Parsing an In-Memory String\n\n```cpp\n# include \"csv.hpp\"\n\nusing namespace csv;\n\n...\n\n// Method 1: Using parse()\nstd::string csv_string = \"Actor,Character\\r\\n\"\n    \"Will Ferrell,Ricky Bobby\\r\\n\"\n    \"John C. Reilly,Cal Naughton Jr.\\r\\n\"\n    \"Sacha Baron Cohen,Jean Giard\\r\\n\";\n\nauto rows = parse(csv_string);\nfor (auto\u0026 r: rows) {\n    // Do stuff with row here\n}\n    \n// Method 2: Using _csv operator\nauto rows = \"Actor,Character\\r\\n\"\n    \"Will Ferrell,Ricky Bobby\\r\\n\"\n    \"John C. 
Reilly,Cal Naughton Jr.\\r\\n\"\n    \"Sacha Baron Cohen,Jean Giard\\r\\n\"_csv;\n\nfor (auto\u0026 r: rows) {\n    // Do stuff with row here\n}\n\n```\n\n### Writing CSV Files\n\n```cpp\n# include \"csv.hpp\"\n# include ...\n\nusing namespace csv;\nusing namespace std;\n\n...\n\nstringstream ss; // Can also use ofstream, etc.\n\nauto writer = make_csv_writer(ss);\n// auto writer = make_tsv_writer(ss);               // For tab-separated files\n// DelimWriter\u003cstringstream, '|', '\"'\u003e writer(ss);  // Your own custom format\n// set_decimal_places(2);                           // How many places after the decimal will be written for floats\n\nwriter \u003c\u003c vector\u003cstring\u003e({ \"A\", \"B\", \"C\" })\n    \u003c\u003c deque\u003cstring\u003e({ \"I'm\", \"too\", \"tired\" })\n    \u003c\u003c list\u003cstring\u003e({ \"to\", \"write\", \"documentation.\" });\n\nwriter \u003c\u003c array\u003cstring, 3\u003e({ \"The quick brown\", \"fox\", \"jumps over the lazy dog\" });\nwriter \u003c\u003c make_tuple(1, 2.0, \"Three\");\n...\n```\n\nYou can pass in arbitrary types into `DelimWriter` by defining a conversion function\nfor that type to `std::string`.\n","funding_links":["https://github.com/sponsors/vincentlaucsb"],"categories":["CSV","C++","Uncategorized","Data Formats"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvincentlaucsb%2Fcsv-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvincentlaucsb%2Fcsv-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvincentlaucsb%2Fcsv-parser/lists"}