{"id":22837443,"url":"https://github.com/hubtou/adsv","last_synced_at":"2026-04-17T00:33:14.814Z","repository":{"id":65422722,"uuid":"592044538","full_name":"HubTou/adsv","owner":"HubTou","description":"Analyze delimiter-separated values files","archived":false,"fork":false,"pushed_at":"2024-03-21T18:18:11.000Z","size":48,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T21:48:46.684Z","etag":null,"topics":["command-line-tool","csv","csv-converter","csv-format","csv-parser","csv-parsing","csv-reader","csv-reading","data","data-analysis","data-engineering","data-mining","learning-python","pnu-project","python","servier","shell","tools","unix","utility"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HubTou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-01-22T18:51:43.000Z","updated_at":"2024-03-12T13:26:02.000Z","dependencies_parsed_at":"2024-03-21T19:46:12.745Z","dependency_job_id":null,"html_url":"https://github.com/HubTou/adsv","commit_stats":{"total_commits":7,"total_committers":1,"mean_commits":7.0,"dds":0.0,"last_synced_commit":"3f0081f902db3bc93e7aabd78855971e6f03b3d8"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HubTou%2Fadsv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HubTou%2Fadsv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hub
Tou%2Fadsv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HubTou%2Fadsv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HubTou","download_url":"https://codeload.github.com/HubTou/adsv/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246413119,"owners_count":20773050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","csv","csv-converter","csv-format","csv-parser","csv-parsing","csv-reader","csv-reading","data","data-analysis","data-engineering","data-mining","learning-python","pnu-project","python","servier","shell","tools","unix","utility"],"created_at":"2024-12-12T23:16:53.106Z","updated_at":"2026-04-17T00:33:14.776Z","avatar_url":"https://github.com/HubTou.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Installation\nOnce you have installed [Python](https://www.python.org/downloads/) and its package manager [pip](https://pip.pypa.io/en/stable/installation/),\ndepending on whether you want only this tool, the full set of PNU tools, or PNU plus a selection of additional third-party tools, use one of these commands:\n\npip install [pnu-adsv](https://pypi.org/project/pnu-adsv/)\n\u003cbr\u003e\npip install [PNU](https://pypi.org/project/PNU/)\n\u003cbr\u003e\npip install [pytnix](https://pypi.org/project/pytnix/)\n\n# ADSV(1)\n\n## NAME\nadsv - Analyze delimiter-separated values files\n\n[![Servier 
Inspired](https://raw.githubusercontent.com/servierhub/.github/main/badges/inspired.svg)](https://github.com/ServierHub/)\n\n## SYNOPSIS\n**adsv**\n\\[-d|--delimiter CHAR\\]\n\\[-e|--encoding STRING\\]\n\\[-f|--fields LIST\\]\n\\[-F|--flatten\\]\n\\[-h|--hide INT\\]\n\\[-m|--min INT\\]\n\\[-M|--max INT\\]\n\\[-t|--top INT\\]\n\\[--debug\\]\n\\[--help|-?\\]\n\\[--version\\]\n\\[--\\]\nfilename\n\\[...\\]\n\n## DESCRIPTION\nThe **adsv** utility analyzes [delimiter-separated values](https://en.wikipedia.org/wiki/Delimiter-separated_values) files, such as [Comma-Separated Values .csv](https://en.wikipedia.org/wiki/Comma-separated_values) or [Tab-Separated Values .tsv](https://en.wikipedia.org/wiki/Tab-separated_values) files, and either prints information about their structure and the data in each of their fields, or prints a selection of fields in the order requested.\n\nThe information gathered is:\n* for the file:\n  * the character set encoding\n  * the [CSV dialect](https://specs.frictionlessdata.io/csv-dialect/) (characters used for delimiting, quoting, escaping or terminating lines. 
Plus whether double quoting is used)\n  * whether a header line is present\n  * the number of lines and fields\n* for each field:\n  * its number and header\n  * the number of distinct values\n  * the value type (strings, integers, floating-point numbers, complex numbers, dates and times (whatever their format))\n  * the values by descending count\n  * the values range by ascending order using the detected type (useful for numbers and dates)\n\nWhen analyzing a DSV dataset, this gives a quick, automated way to get an overview of the contents and to explore any oddities...\n\nThere are options:\n* to control and limit what is printed (*-h|--hide*, *-m|--min*, *-M|--max* and *-t|--top*),\n* to avoid (or correct) the detection of the character set encoding and delimiter (*-d|--delimiter*, *-e|--encoding*):\n  * the character set detection can take a long time with big files, so if you know that the file is in \"Windows-1252\" or \"utf-8\" encoding, it's quicker to specify it...\n\nIf you use the *-f|--fields* option, the file analysis is skipped and the selected fields are printed instead, in the order requested, using the detected delimiting, quoting and escaping characters.\n\nIf you encounter multi-line fields and want to \"flatten\" them to single lines, you can use the *-F|--flatten* option.\n\n### OPTIONS\nOptions | Use\n------- | ---\n-d\\|--delimiter CHAR|Specify delimiter to be CHAR\n-e\\|--encoding STRING|Specify charset encoding to be STRING (because detecting encoding can take a long time!)\n-f\\|--fields LIST|Extract the values of the LISTed fields in the given order (ex: 6,2-4,1 with fields numbered from 1)\n-F\\|--flatten|Flatten multi-line fields to a single line\n-h\\|--hide INT|Hide the display of distinct values above INT % (default is 20%)\n-m\\|--min INT|Only display distinct values whose count \u003e= INT (default is to display all distinct values)\n-M\\|--max INT|Only display INT lines of distinct values (default is to display all 
distinct values, within the hide limit)\n-t\\|--top INT|Only display the top/bottom INT lines of values (default is to display the 5 bottom and top lines)\n--debug|Enable debug mode\n--help\\|-?|Print usage and a short help message and exit\n--version|Print version and exit\n--|Options processing terminator\n\n## ENVIRONMENT\nThe ADSV_DEBUG environment variable can also be set to any value to enable debug mode.\n\n## EXIT STATUS\nThe **adsv** utility exits 0 on success, and \u003e0 if an error occurs.\n\n## SEE ALSO\n[cut(1)](https://www.freebsd.org/cgi/man.cgi?query=cut),\n[file(1)](https://www.freebsd.org/cgi/man.cgi?query=file)\n\n## STANDARDS\nThe **adsv** utility is not a standard UNIX command.\n\nThis implementation tries to follow the [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guide for [Python](https://www.python.org/) code.\n\nThe DSV dialects that can be handled are those compatible with [RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files](https://www.rfc-editor.org/rfc/rfc4180).\n\n## PORTABILITY\nTested OK under Windows.\n\n## HISTORY\nThis implementation was made for the [PNU project](https://github.com/HubTou/PNU).\n\nI do this kind of analysis with each dataset I have to work with.\nLast time I did that, I decided that it was about time to fully automate the process, especially as I was working with fields containing multi-line values...\n\n## LICENSE\nIt is available under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause).\n\n## AUTHORS\n[Hubert Tournier](https://github.com/HubTou)\n\n## CAVEATS\nUsing \"Sep=X\" as the first line in order to set the X character as a delimiter is not supported.\n\nThere is no support either for commented lines inside the data (for example, in */etc/passwd* files under Unix), but they are not part of any recognized DSV dialect 
anyway.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhubtou%2Fadsv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhubtou%2Fadsv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhubtou%2Fadsv/lists"}