{"id":20704105,"url":"https://github.com/solidsnack/tsv","last_synced_at":"2025-04-23T00:47:42.950Z","repository":{"id":13178736,"uuid":"15862033","full_name":"solidsnack/tsv","owner":"solidsnack","description":"A simple, line-oriented tabular data format","archived":false,"fork":false,"pushed_at":"2018-01-20T10:04:01.000Z","size":103,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-23T00:47:35.521Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/solidsnack.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-01-13T08:17:59.000Z","updated_at":"2024-01-03T14:11:03.000Z","dependencies_parsed_at":"2022-09-19T04:31:28.856Z","dependency_job_id":null,"html_url":"https://github.com/solidsnack/tsv","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solidsnack%2Ftsv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solidsnack%2Ftsv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solidsnack%2Ftsv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solidsnack%2Ftsv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/solidsnack","download_url":"https://codeload.github.com/solidsnack/tsv/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250348876,"owners_count":2
1415910,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-17T01:10:59.993Z","updated_at":"2025-04-23T00:47:42.933Z","avatar_url":"https://github.com/solidsnack.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. image:: https://travis-ci.org/solidsnack/tsv.svg?branch=master\n    :target: https://travis-ci.org/solidsnack/tsv\n\n`Linear TSV`__ is a line-oriented, portable tabular data format. Tabular data\n-- rows of tuples, each of the same length -- is commonly stored as CSV and is\nthe lingua franca of spreadsheets, databases and analysis tools.\n\n__ http://dataprotocols.org/linear-tsv/\n\nCSV is almost but not quite line-oriented, because newlines are quoted, not\nescaped. In the TSV format presented here, escape codes are used for newlines\nand tabs in field data, allowing naive filtering with line-oriented shell\ntools like ``sort``, ``fgrep`` and ``cut`` to work as expected. In all of its\ndetails, the format derives from the ``TEXT`` serialization mode of Postgres\nand MySQL.\n\n----------\nPython API\n----------\n\n.. 
code:: python\n\n    from collections import namedtuple\n    import sys\n\n    import tsv\n\n\n    # Simplest access mode: parse a text stream (strings are okay, too) to a\n    # generator of lists of strings.\n    lists = tsv.un(sys.stdin)\n\n\n    # Parse each row as a particular class derived with namedtuple()\n    class Stats(namedtuple('Stats', ['state', 'city', 'population', 'area'])):\n        pass\n\n    tuples = tsv.un(sys.stdin, Stats)\n\n\n    # Format a collection of rows, getting back a generator of strings, one\n    # per row. Any parseable type is okay.\n    strings = tsv.to(lists)\n    strings = tsv.to(tuples)\n\n    # Write the rows to a handle:\n    strings = tsv.to(tuples, sys.stdout)\n\n\n    # CSV-compatible API (reader/writer)\n    with open('eggs.tsv', 'w') as tsvfile:\n        spamwriter = tsv.writer(tsvfile)\n        spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])\n        spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])\n\n    with open('eggs.tsv') as tsvfile:\n        spamreader = tsv.reader(tsvfile)\n        for row in spamreader:\n            print(', '.join(row))\n\n\n    # CSV-compatible API (DictReader/DictWriter)\n    with open('names.tsv', 'w') as tsvfile:\n        fieldnames = ['first_name', 'last_name']\n        writer = tsv.DictWriter(tsvfile, fieldnames=fieldnames)\n\n        writer.writeheader()\n        writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})\n        writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})\n        writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})\n\n    with open('names.tsv') as tsvfile:\n        reader = tsv.DictReader(tsvfile)\n        for row in reader:\n            print(row['first_name'], row['last_name'])\n\n------------------\nFormat Description\n------------------\n\nIn this format, all records are separated by ASCII newlines (``0x0a``) and\nfields within a record are separated with ASCII tab (``0x09``). 
It is permitted\nbut discouraged to separate records with ``\\r\\n``.\n\nTo include newlines, tabs, carriage returns and backslashes in field data, the\nfollowing escape sequences must be used:\n\n* ``\\n`` for newline,\n\n* ``\\t`` for tab,\n\n* ``\\r`` for carriage return,\n\n* ``\\\\`` for backslash.\n\nTo indicate missing data for a field, the character sequence ``\\N`` (bytes\n``0x5c`` and ``0x4e``) is used. Note that the ``N`` is capitalized. This\ncharacter sequence is exactly that used by SQL databases to indicate SQL\n``NULL`` in their tab-separated output mode.\n\n~~~~~~~~~~~~~~~~~~~~~~~~~\nA Word About Header Lines\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThere are no header lines specified by this format. One objection to them is\nthat they break the naive concatenation of files. Another is that they are\nantithetical to stream processing. Yet another is that one generally wants more\nthan column names -- one wants at least column types. Better to do nothing\nthan too little.\n\n----------\nMotivation\n----------\n\nIn advocating a shift to a line-oriented, tab-separated serialization format,\nwe are endorsing an existing format: the default serialization format of both\nPostgres and MySQL. 
We propose to standardize a subset of the format common to\nboth database systems.\n\nA truly line-oriented format for tabular data, where newline, carriage return\nand the separator are always represented by escape sequences, offers many\npractical advantages, among them:\n\n* The parsers are simple and fast.\n\n* First-pass filtering and sorting for line-oriented formats is easy to\n  implement in high-level languages, like Python and Java.\n\n* Analysis and transformation of line-oriented data with command line tools is\n  simple, dependable and often surprisingly efficient.\n\n* By requiring escape sequences when newlines and tabs are in field text, the\n  format allows parsers to naively and efficiently split data on raw byte\n  values: ``0x09`` for fields and ``0x0a`` for records.\n\nCSV is almost right and it's worth talking about the disadvantages of CSV that\nmotivate the author to promote another tabular data format:\n\n* In some locales, ``,`` is the decimal separator, whereas the ASCII tab never\n  collides with the decimal separator. More generally, the tab is not a\n  centuries-old glyph that one encounters in natural language.\n\n* CSV is not truly line-oriented -- newlines are quoted, not escaped. A single\n  record can span multiple physical lines. In consequence, line-oriented\n  processing almost works until it doesn't, and then simple tricks -- sorting\n  on the first column to optimize insertion order or batching records into\n  groups of a few thousand to get better insert performance -- require\n  relatively complicated code to get right.\n\n* CSV's quoting style requires one to mingle field data parsing and record\n  splitting. Taking every third record still requires one to parse the prior\n  two, since a newline inside quotes is not a record separator.\n\n* CSV is ambiguous in many small areas -- the presence or absence of a header\n  line, the choice of quote character (single or double?) 
and even the choice\n  of separator character are all axes of variability.\n\n----------------------------\nSample Parsers \u0026 Serializers\n----------------------------\n\nA few sample parsers are included in the distribution.\n\nBash\n  ``tsv.bash \u003c cities10.tsv``\n\nPython\n  ``example.py \u003c cities10.tsv``\n\n-------\nGrammar\n-------\n\nThis grammar is presented in the W3C EBNF format.\n\n.. code:: bnf\n\n    TSV        ::= Row (NL Row)*\n\n    /* This form may be read but not written by conforming implementations. */\n    TSVInput   ::= Row (CR? NL Row)*\n\n    Row        ::= Field (Tab Field)*\n    Field      ::= (Escape|NoOpEscape|PlainChar)*\n\n    Char       ::= [http://www.w3.org/TR/xml#NT-Char]\n    PlainChar  ::= Char - (NL|Tab|CR|'\\')\n    NL         ::= #x0A\n    CR         ::= #x0D\n    Tab        ::= #x09\n\n    Escape     ::= '\\n' | '\\r' | '\\t' | '\\\\'\n    NoOpEscape ::= '\\' (Char - ('n'|'r'|'t'|'\\'))\n\nA diagram of the grammar can be generated online with the\n`Bottlecaps Railroad Diagram generator`__.\n\n__ http://bottlecaps.de/rr/ui\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolidsnack%2Ftsv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsolidsnack%2Ftsv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolidsnack%2Ftsv/lists"}