{"id":14483407,"url":"https://github.com/medialab/xan","last_synced_at":"2026-04-09T17:16:57.691Z","repository":{"id":65436208,"uuid":"140488417","full_name":"medialab/xan","owner":"medialab","description":"The CSV magician","archived":false,"fork":false,"pushed_at":"2025-05-09T17:16:17.000Z","size":8167,"stargazers_count":2565,"open_issues_count":63,"forks_count":44,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-05-13T00:15:36.889Z","etag":null,"topics":["cli","csv","rust","tsv"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"BurntSushi/xsv","license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/medialab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json"}},"created_at":"2018-07-10T21:21:08.000Z","updated_at":"2025-05-12T23:28:28.000Z","dependencies_parsed_at":"2024-11-21T10:20:53.245Z","dependency_job_id":"a6b22f88-2faa-48c4-a3c7-c9b8222b46d4","html_url":"https://github.com/medialab/xan","commit_stats":null,"previous_names":["medialab/xan"],"tags_count":123,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medialab%2Fxan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medialab%2Fxan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medialab%2Fxan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/medialab%2Fxan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/medialab","download_url":"http
s://codeload.github.com/medialab/xan/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253843225,"owners_count":21972874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","csv","rust","tsv"],"created_at":"2024-09-03T00:01:44.107Z","updated_at":"2026-04-09T17:16:57.678Z","avatar_url":"https://github.com/medialab.png","language":"Rust","funding_links":[],"categories":["Rust","Rust程序设计","💻 Apps"],"sub_categories":["资源传输下载","🚀 Productivity and Utilities"],"readme":"[![Build Status](https://github.com/medialab/xan/workflows/Tests/badge.svg)](https://github.com/medialab/xan/actions) [![DOI](https://zenodo.org/badge/140488417.svg)](https://doi.org/10.5281/zenodo.15310200)\n\n# `xan`, the CSV magician\n\n`xan` is a command-line tool that can be used to process CSV files directly from the shell.\n\nIt has been written in Rust to be as fast as possible and to use as little memory as possible, so it can easily handle large CSV files (gigabytes). 
It leverages a novel [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) CSV [parser](https://docs.rs/simd-csv) and is also able to parallelize some computations (through multithreading) to make some tasks complete as fast as your computer can allow.\n\nIt can easily preview, filter, slice, aggregate, sort and join CSV files, and exposes a large collection of composable commands that can be chained together to perform a wide variety of typical tasks.\n\n`xan` also offers its own expression language so you can perform complex tasks that cannot be done by relying on the simplest commands. This minimalistic language has been tailored for CSV data and is *way* faster than evaluating typical dynamically-typed languages such as Python, Lua, JavaScript etc.\n\nNote that this tool was originally a fork of [BurntSushi](https://github.com/BurntSushi)'s [`xsv`](https://github.com/BurntSushi/xsv), but has since been almost entirely rewritten to fit [SciencesPo's médialab](https://github.com/medialab) use-cases, rooted in web data collection and analysis geared towards the social sciences (you might think CSV is outdated by now, but read our [love letter](./docs/LOVE_LETTER.md) to the format before judging too quickly).\n\n`xan` therefore goes beyond typical data manipulation and exposes utilities related to lexicometry, graph theory and even scraping.\n\nBeyond CSV data, `xan` is able to process a large variety of CSV-adjacent data formats from many different disciplines such as web archiving (`.cdx`) or bioinformatics (`.vcf`, `.gtf`, `.sam`, `.bed` etc.). `xan` is also able to convert to \u0026 from many data formats such as JSON, Excel files, NumPy arrays etc. using [`xan to`](./docs/cmd/to.md) and [`xan from`](./docs/cmd/from.md). 
See [this](#supported-file-formats) section for more detail.\n\nFinally, `xan` can be used to display CSV files in the terminal, for easy exploration, and can even be used to draw basic data visualisations:\n\n|*view command*|*flatten command*|\n|:---:|:---:|\n|![view](./docs/img/grid/view.png)|![flatten](./docs/img/grid/flatten.png)|\n|__*categorical histogram*__|__*scatterplot*__|\n|![categ-hist](./docs/img/grid/categ-hist.png)|![correlation](./docs/img/grid/correlation.png)|\n|__*categorical scatterplot*__|__*histograms*__|\n|![scatter](./docs/img/grid/scatter.png)|![hist](./docs/img/grid/hist.png)|\n|__*parallel processing*__|__*time series*__|\n|![parallel](./docs/img/grid/parallel.png)|![series](./docs/img/grid/series.png)|\n|__*small multiples (facet grid)*__|__*grouped view*__|\n|![small-multiples](./docs/img/grid/small-multiples.png)|![view-grid](./docs/img/grid/view-grid.png)|\n|__*correlation matrix heatmap*__|__*heatmap*__|\n|![corr-heatmap](./docs/img/grid/corr-heatmap.png)|![heatmap](./docs/img/grid/heatmap.png)|\n\n## Summary\n\n* [How to install](#how-to-install)\n  * [Cargo](#cargo)\n  * [Scoop (Windows)](#scoop-windows)\n  * [Homebrew (macOS)](#homebrew-macos)\n  * [Arch Linux](#arch-linux)\n  * [NetBSD](#netbsd)\n  * [Nix](#nix)\n  * [Pixi](#pixi-linux-macos-windows)\n  * [Conda Forge](#conda-forge)\n  * [Pre-built binaries](#pre-built-binaries)\n  * [Installing completions](#installing-completions)\n* [Quick tour](#quick-tour)\n* [Learning](#learning)\n* [Available commands](#available-commands)\n* [General flags and IO model](#general-flags-and-io-model)\n  * [Getting help](#getting-help)\n  * [Regarding input \u0026 output formats](#regarding-input--output-formats)\n  * [Working with headless CSV file](#working-with-headless-csv-file)\n  * [Regarding stdin](#regarding-stdin)\n  * [Regarding stdout](#regarding-stdout)\n  * [Supported file formats](#supported-file-formats)\n  * [Compressed files](#compressed-files)\n  * [Regarding 
color](#regarding-color)\n* [Expression language reference](#expression-language-reference)\n* [News](#news)\n* [How to cite?](#how-to-cite)\n* [Frequently Asked Questions](#frequently-asked-questions)\n\n## How to install\n\n### Cargo\n\n`xan` can be installed using cargo (it usually comes with [Rust](https://www.rust-lang.org/tools/install)):\n\n```bash\ncargo install xan --locked\n```\n\n*Optional features*\n\nSome features are not built by default because they cost too much in compilation time or in executable size. Here is a list of those optional features:\n\n* `parquet`: enables `xan from -f parquet`\n\nYou can specify which optional features you want thusly:\n\n```bash\n# To enable all optional features\ncargo install xan --locked --all-features\n# To enable only specific features\ncargo install xan --locked --features parquet\n```\n\nYou can also tweak the build flags to make sure the Rust compiler is able to leverage all your CPU's features:\n\n```bash\nCARGO_BUILD_RUSTFLAGS='-C target-cpu=native' cargo install xan --locked\n```\n\nYou can also install the latest dev version thusly:\n\n```bash\ncargo install --git https://github.com/medialab/xan --locked\n```\n\n### Scoop (Windows)\n\n`xan` can be installed using [Scoop](https://scoop.sh/) on Windows:\n\n```powershell\nscoop bucket add extras\nscoop install xan\n```\n\n### Homebrew (macOS)\n\n`xan` can be installed with [Homebrew](https://brew.sh/) on macOS thusly:\n\n```bash\nbrew install xan\n```\n\n### Arch Linux\n\nYou can install `xan` from the [extra repository](https://archlinux.org/packages/extra/x86_64/xan/) using `pacman`:\n\n```bash\nsudo pacman -S xan\n```\n\n### NetBSD\n\nA package is available from the official repositories. To install `xan`, simply run:\n\n```bash\npkgin install xan\n```\n\n### Nix\n\n`xan` is packaged for Nix, and is available in Nixpkgs as of the 25.05 release. 
To\ninstall it, you may add it to your `environment.systemPackages` as `pkgs.xan` or\nuse `nix-shell` to enter an ephemeral shell.\n\n```bash\nnix-shell -p xan\n```\n\n### Pixi (Linux, macOS, Windows)\n\n`xan` can be installed on Linux, macOS, and Windows using the [Pixi](https://pixi.sh/latest/) package manager:\n\n```bash\npixi global install xan\n```\n\n### Conda Forge\n\n`xan` can be installed through [conda-forge](https://conda-forge.org/) thusly:\n\n```bash\nconda install conda-forge::xan\n```\n\n### Pre-built binaries\n\nPre-built binaries can be found attached to every GitHub [release](https://github.com/medialab/xan/releases/latest).\n\nCurrently supported targets include:\n\n- `x86_64-apple-darwin`\n- `x86_64-unknown-linux-gnu`\n- `x86_64-unknown-linux-musl`\n- `x86_64-pc-windows-msvc`\n\n- `aarch64-apple-darwin`\n- `aarch64-unknown-linux-gnu`\n\n`ppc64le` targets are not built by the CI yet, but prebuilt binaries can still be found in the `conda-forge` package's [files](https://anaconda.org/conda-forge/xan/files) if you need them.\n\nFeel free to open a PR to improve the CI by adding relevant targets.\n\n### Installing completions\n\nNote that `xan` also exposes handy automatic completions for command and header/column names that you can install through the `xan completions` command.\n\nRun the following command to learn how to install those completions:\n\n```bash\nxan completions -h\n\n# With zsh you might also need to add this to your initialization to make\n# sure Bash compatibility is loaded:\nautoload -Uz bashcompinit \u0026\u0026 bashcompinit\n```\n\n## Quick tour\n\nLet's learn about the most commonly used `xan` commands by exploring a corpus of French media outlets:\n\n### Downloading the corpus\n\n```bash\ncurl -LO https://github.com/medialab/corpora/raw/master/polarisation/medias.csv\n```\n\n### Displaying the file's headers\n\n```bash\nxan headers medias.csv\n```\n\n```\n0   webentity_id\n1   name\n2   prefixes\n3   home_page\n4   
start_pages\n5   indegree\n6   hyphe_creation_timestamp\n7   hyphe_last_modification_timestamp\n8   outreach\n9   foundation_year\n10  batch\n11  edito\n12  parody\n13  origin\n14  digital_native\n15  mediacloud_ids\n16  wheel_category\n17  wheel_subcategory\n18  has_paywall\n19  inactive\n```\n\n### Counting the number of rows\n\n```bash\nxan count medias.csv\n```\n\n```\n478\n```\n\n### Previewing the file in the terminal\n\n```bash\nxan view medias.csv\n```\n\n```\nDisplaying 5/20 cols from 10 first rows of medias.csv\n┌───┬───────────────┬───────────────┬────────────┬───┬─────────────┬──────────┐\n│ - │ name          │ prefixes      │ home_page  │ … │ has_paywall │ inactive │\n├───┼───────────────┼───────────────┼────────────┼───┼─────────────┼──────────┤\n│ 0 │ Acrimed.org   │ http://acrim… │ http://ww… │ … │ false       │ \u003cempty\u003e  │\n│ 1 │ 24matins.fr   │ http://24mat… │ https://w… │ … │ false       │ \u003cempty\u003e  │\n│ 2 │ Actumag.info  │ http://actum… │ https://a… │ … │ false       │ \u003cempty\u003e  │\n│ 3 │ 2012un-Nouve… │ http://2012u… │ http://ww… │ … │ false       │ \u003cempty\u003e  │\n│ 4 │ 24heuresactu… │ http://24heu… │ http://24… │ … │ false       │ \u003cempty\u003e  │\n│ 5 │ AgoraVox      │ http://agora… │ http://ww… │ … │ false       │ \u003cempty\u003e  │\n│ 6 │ Al-Kanz.org   │ http://al-ka… │ https://w… │ … │ false       │ \u003cempty\u003e  │\n│ 7 │ Alalumieredu… │ http://alalu… │ http://al… │ … │ false       │ \u003cempty\u003e  │\n│ 8 │ Allodocteurs… │ http://allod… │ https://w… │ … │ false       │ \u003cempty\u003e  │\n│ 9 │ Alterinfo.net │ http://alter… │ http://ww… │ … │ \u003cempty\u003e     │ true     │\n│ … │ …             │ …             │ …          │ … │ …           │ …        │\n└───┴───────────────┴───────────────┴────────────┴───┴─────────────┴──────────┘\n```\n\nOn unix, don't hesitate to use the `-p` flag to automagically forward the full output to an appropriate pager and skim through all the 
columns.\n\n### Reading a flattened representation of the first row\n\n```bash\n# NOTE: drop -c to avoid truncating the values\nxan flatten -c medias.csv\n```\n\n```\nRow n°0\n───────────────────────────────────────────────────────────────────────────────\nwebentity_id                      1\nname                              Acrimed.org\nprefixes                          http://acrimed.org|http://acrimed69.blogspot…\nhome_page                         http://www.acrimed.org\nstart_pages                       http://acrimed.org|http://acrimed69.blogspot…\nindegree                          61\nhyphe_creation_timestamp          1560347020330\nhyphe_last_modification_timestamp 1560526005389\noutreach                          nationale\nfoundation_year                   2002\nbatch                             1\nedito                             media\nparody                            false\norigin                            france\ndigital_native                    true\nmediacloud_ids                    258269\nwheel_category                    Opinion Journalism\nwheel_subcategory                 Left Wing\nhas_paywall                       false\ninactive                          \u003cempty\u003e\n\nRow n°1\n───────────────────────────────────────────────────────────────────────────────\nwebentity_id                      2\n...\n```\n\n### Searching for rows\n\n```bash\nxan search -s outreach internationale medias.csv | xan view\n```\n\n```\nDisplaying 4/20 cols from 10 first rows of \u003cstdin\u003e\n┌───┬──────────────┬────────────────────┬───┬─────────────┬──────────┐\n│ - │ webentity_id │ name               │ … │ has_paywall │ inactive │\n├───┼──────────────┼────────────────────┼───┼─────────────┼──────────┤\n│ 0 │ 25           │ Businessinsider.fr │ … │ false       │ \u003cempty\u003e  │\n│ 1 │ 59           │ Europe-Israel.org  │ … │ false       │ \u003cempty\u003e  │\n│ 2 │ 66           │ France 24          │ … │ false       │ \u003cempty\u003e  │\n│ 3 │ 
220          │ RFI                │ … │ false       │ \u003cempty\u003e  │\n│ 4 │ 231          │ fr.Sott.net        │ … │ false       │ \u003cempty\u003e  │\n│ 5 │ 246          │ Voltairenet.org    │ … │ true        │ \u003cempty\u003e  │\n│ 6 │ 254          │ Afp.com /fr        │ … │ false       │ \u003cempty\u003e  │\n│ 7 │ 265          │ Euronews FR        │ … │ false       │ \u003cempty\u003e  │\n│ 8 │ 333          │ Arte.tv            │ … │ false       │ \u003cempty\u003e  │\n│ 9 │ 341          │ I24News.tv         │ … │ false       │ \u003cempty\u003e  │\n│ … │ …            │ …                  │ … │ …           │ …        │\n└───┴──────────────┴────────────────────┴───┴─────────────┴──────────┘\n```\n\n### Selecting some columns\n\n```bash\nxan select foundation_year,name medias.csv | xan view\n```\n\n```\nDisplaying 2 cols from 10 first rows of \u003cstdin\u003e\n┌───┬─────────────────┬───────────────────────────────────────┐\n│ - │ foundation_year │ name                                  │\n├───┼─────────────────┼───────────────────────────────────────┤\n│ 0 │ 2002            │ Acrimed.org                           │\n│ 1 │ 2006            │ 24matins.fr                           │\n│ 2 │ 2013            │ Actumag.info                          │\n│ 3 │ 2012            │ 2012un-Nouveau-Paradigme.com          │\n│ 4 │ 2010            │ 24heuresactu.com                      │\n│ 5 │ 2005            │ AgoraVox                              │\n│ 6 │ 2008            │ Al-Kanz.org                           │\n│ 7 │ 2012            │ Alalumieredunouveaumonde.blogspot.com │\n│ 8 │ 2005            │ Allodocteurs.fr                       │\n│ 9 │ 2005            │ Alterinfo.net                         │\n│ … │ …               │ …                                     │\n└───┴─────────────────┴───────────────────────────────────────┘\n```\n\n### Sorting the file\n\n```bash\nxan sort -s foundation_year medias.csv | xan view -s name,foundation_year\n```\n\n```\nDisplaying 2 
cols from 10 first rows of \u003cstdin\u003e\n┌───┬────────────────────────────────────┬─────────────────┐\n│ - │ name                               │ foundation_year │\n├───┼────────────────────────────────────┼─────────────────┤\n│ 0 │ Le Monde Numérique (Ouest France)  │ \u003cempty\u003e         │\n│ 1 │ Le Figaro                          │ 1826            │\n│ 2 │ Le journal de Saône-et-Loire       │ 1826            │\n│ 3 │ L'Indépendant                      │ 1846            │\n│ 4 │ Le Progrès                         │ 1859            │\n│ 5 │ La Dépêche du Midi                 │ 1870            │\n│ 6 │ Le Pélerin                         │ 1873            │\n│ 7 │ Dernières Nouvelles d'Alsace (DNA) │ 1877            │\n│ 8 │ La Croix                           │ 1883            │\n│ 9 │ Le Chasseur Francais               │ 1885            │\n│ … │ …                                  │ …               │\n└───┴────────────────────────────────────┴─────────────────┘\n```\n\n### Deduplicating the file on some column\n\n```bash\n# Some medias of our corpus have the same ids on mediacloud.org\nxan dedup -s mediacloud_ids medias.csv | xan count \u0026\u0026 xan count medias.csv\n```\n\n```\n457\n478\n```\n\nDeduplicating can also be done while sorting:\n\n```bash\nxan sort -s mediacloud_ids -u medias.csv\n```\n\n### Computing frequency tables\n\n```bash\nxan frequency -s edito medias.csv | xan view\n```\n\n```\nDisplaying 3 cols from 5 rows of \u003cstdin\u003e\n┌───┬───────┬────────────┬───────┐\n│ - │ field │ value      │ count │\n├───┼───────┼────────────┼───────┤\n│ 0 │ edito │ media      │ 423   │\n│ 1 │ edito │ individu   │ 30    │\n│ 2 │ edito │ plateforme │ 14    │\n│ 3 │ edito │ agrégateur │ 10    │\n│ 4 │ edito │ agence     │ 1     │\n└───┴───────┴────────────┴───────┘\n```\n\n### Printing a histogram\n\n```bash\nxan frequency -s edito medias.csv | xan hist\n```\n\n```\nHistogram for edito (bars: 5, sum: 478, max: 423):\n\nmedia      |423  
88.49%|━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━|\nindividu   | 30   6.28%|━━━╸                                                  |\nplateforme | 14   2.93%|━╸                                                    |\nagrégateur | 10   2.09%|━╸                                                    |\nagence     |  1   0.21%|╸                                                     |\n```\n\n### Computing descriptive statistics\n\n```bash\nxan stats -s indegree,edito medias.csv | xan transpose | xan view -I\n```\n\n```\nDisplaying 2 cols from 14 rows of \u003cstdin\u003e\n┌─────────────┬───────────────────┬────────────┐\n│ field       │ indegree          │ edito      │\n├─────────────┼───────────────────┼────────────┤\n│ count       │ 463               │ 478        │\n│ count_empty │ 15                │ 0          │\n│ type        │ int               │ string     │\n│ types       │ int|empty         │ string     │\n│ sum         │ 25987             │ \u003cempty\u003e    │\n│ mean        │ 56.12742980561554 │ \u003cempty\u003e    │\n│ variance    │ 4234.530197929737 │ \u003cempty\u003e    │\n│ stddev      │ 65.07326792108829 │ \u003cempty\u003e    │\n│ min         │ 0                 │ \u003cempty\u003e    │\n│ max         │ 424               │ \u003cempty\u003e    │\n│ lex_first   │ 0                 │ agence     │\n│ lex_last    │ 99                │ plateforme │\n│ min_length  │ 0                 │ 5          │\n│ max_length  │ 3                 │ 11         │\n└─────────────┴───────────────────┴────────────┘\n```\n\n### Evaluating an expression to filter a file\n\n```bash\nxan filter 'batch \u003e 1' medias.csv | xan count\n```\n\n```\n130\n```\n\nTo access the expression language's [cheatsheet](./docs/moonblade/cheatsheet.md), run `xan help cheatsheet`. 
To display the full list of available [functions](./docs/moonblade/functions.md), run `xan help functions`.\n\n### Evaluating an expression to create a new column based on other ones\n\n```bash\nxan map 'fmt(\"{} ({})\", name, foundation_year) as key' medias.csv | xan select key | xan slice -l 10\n```\n\n```\nkey\nAcrimed.org (2002)\n24matins.fr (2006)\nActumag.info (2013)\n2012un-Nouveau-Paradigme.com (2012)\n24heuresactu.com (2010)\nAgoraVox (2005)\nAl-Kanz.org (2008)\nAlalumieredunouveaumonde.blogspot.com (2012)\nAllodocteurs.fr (2005)\nAlterinfo.net (2005)\n```\n\nTo access the expression language's [cheatsheet](./docs/moonblade/cheatsheet.md), run `xan help cheatsheet`. To display the full list of available [functions](./docs/moonblade/functions.md), run `xan help functions`.\n\n### Transforming a column by evaluating an expression\n\n```bash\nxan transform name 'split(name, \".\") | first | upper' medias.csv | xan select name | xan slice -l 10\n```\n\n```\nname\nACRIMED\n24MATINS\nACTUMAG\n2012UN-NOUVEAU-PARADIGME\n24HEURESACTU\nAGORAVOX\nAL-KANZ\nALALUMIEREDUNOUVEAUMONDE\nALLODOCTEURS\nALTERINFO\n```\n\nTo access the expression language's [cheatsheet](./docs/moonblade/cheatsheet.md), run `xan help cheatsheet`. To display the full list of available [functions](./docs/moonblade/functions.md), run `xan help functions`.\n\n### Performing custom aggregation\n\n```bash\nxan agg 'sum(indegree) as total_indegree, mean(indegree) as mean_indegree' medias.csv | xan view -I\n```\n\n```\nDisplaying 1 col from 1 rows of \u003cstdin\u003e\n┌────────────────┬───────────────────┐\n│ total_indegree │ mean_indegree     │\n├────────────────┼───────────────────┤\n│ 25987          │ 56.12742980561554 │\n└────────────────┴───────────────────┘\n```\n\nTo access the expression language's [cheatsheet](./docs/moonblade/cheatsheet.md), run `xan help cheatsheet`. To display the full list of available [functions](./docs/moonblade/functions.md), run `xan help functions`. 
Finally, to display the list of available [aggregation functions](./docs/moonblade/aggs.md), run `xan help aggs`.\n\n### Grouping rows and performing per-group aggregation\n\n```bash\nxan groupby edito 'sum(indegree) as indegree' medias.csv | xan view -I\n```\n\n```\nDisplaying 1 col from 5 rows of \u003cstdin\u003e\n┌────────────┬──────────┐\n│ edito      │ indegree │\n├────────────┼──────────┤\n│ agence     │ 50       │\n│ agrégateur │ 459      │\n│ plateforme │ 658      │\n│ media      │ 24161    │\n│ individu   │ 659      │\n└────────────┴──────────┘\n```\n\nTo access the expression language's [cheatsheet](./docs/moonblade/cheatsheet.md), run `xan help cheatsheet`. To display the full list of available [functions](./docs/moonblade/functions.md), run `xan help functions`. Finally, to display the list of available [aggregation functions](./docs/moonblade/aggs.md), run `xan help aggs`.\n\n## Learning\n\nIf you speak French, [here](https://ceres.sorbonne-universite.fr/test_outil_xan/) is a quick rundown of the tool by our friends from [CERES](https://ceres.sorbonne-universite.fr/).\n\n*Documented use-cases*\n\n* [Merging frequency tables, three ways](./docs/cookbook/frequency_tables.md)\n* [Parsing and visualizing dates with xan](./docs/cookbook/dates.md)\n* [Joining files by URL prefixes](./docs/cookbook/urls.md)\n\nFor a sense of what can be achieved with `xan`, see this page summarizing a variety of complex but detailed pipelines that real people have used in real life to solve their problems with the tool: [PIPELINES](./docs/PIPELINES.md).\n\n## Available commands\n\n- [**help**](./docs/cmd/help.md): Get help regarding the expression language\n\n*Explore \u0026 visualize*\n\n- [**count (c)**](./docs/cmd/count.md): Count rows in file\n- [**headers (h)**](./docs/cmd/headers.md): Show header names\n- [**view (v)**](./docs/cmd/view.md): Preview a CSV file in a human-friendly way\n- [**flatten**](./docs/cmd/flatten.md): Display a flattened version of each 
row of a file\n- [**hist**](./docs/cmd/hist.md): Print a histogram with rows of CSV file as bars\n- [**plot**](./docs/cmd/plot.md): Draw a scatter plot or line chart\n- [**heatmap**](./docs/cmd/heatmap.md): Draw a heatmap of a CSV matrix\n- [**progress**](./docs/cmd/progress.md): Display a progress bar while reading CSV data\n\n*Search \u0026 filter*\n\n- [**search**](./docs/cmd/search.md): Search for (or replace) patterns in CSV data\n- [**grep**](./docs/cmd/grep.md): Coarse but fast filtering of CSV data\n- [**filter**](./docs/cmd/filter.md): Only keep some CSV rows based on an evaluated expression\n- [**head**](./docs/cmd/head.md): First rows of CSV file\n- [**tail**](./docs/cmd/tail.md): Last rows of CSV file\n- [**slice**](./docs/cmd/slice.md): Slice rows of CSV file\n- [**top**](./docs/cmd/top.md): Find top rows of a CSV file according to some column\n- [**sample**](./docs/cmd/sample.md): Randomly sample CSV data\n- [**bisect**](./docs/cmd/bisect.md): Binary search on sorted CSV data\n\n*Sort \u0026 deduplicate*\n\n- [**sort**](./docs/cmd/sort.md): Sort CSV data\n- [**dedup**](./docs/cmd/dedup.md): Deduplicate a CSV file\n- [**shuffle**](./docs/cmd/shuffle.md): Shuffle CSV data\n\n*Aggregate*\n\n- [**frequency (freq)**](./docs/cmd/frequency.md): Show frequency tables\n- [**groupby**](./docs/cmd/groupby.md): Aggregate data by groups of a CSV file\n- [**stats**](./docs/cmd/stats.md): Compute basic statistics\n- [**agg**](./docs/cmd/agg.md): Aggregate data from CSV file\n- [**bins**](./docs/cmd/bins.md): Dispatch numeric columns into bins\n- [**window**](./docs/cmd/window.md): Compute window aggregations (cumsum, rolling mean, lag etc.)\n\n*Combine multiple CSV files*\n\n- [**cat**](./docs/cmd/cat.md): Concatenate by row or column\n- [**join**](./docs/cmd/join.md): Join CSV files\n- [**fuzzy-join**](./docs/cmd/fuzzy-join.md): Join a CSV file with another containing patterns (e.g. 
regexes)\n- [**merge**](./docs/cmd/merge.md): Merge multiple similar, already sorted CSV files\n\n*Add, transform, drop and move columns*\n\n- [**select**](./docs/cmd/select.md): Select columns from a CSV file\n- [**drop**](./docs/cmd/drop.md): Drop columns from a CSV file\n- [**map**](./docs/cmd/map.md): Create a new column by evaluating an expression on each CSV row\n- [**transform**](./docs/cmd/transform.md): Transform a column by evaluating an expression on each CSV row\n- [**enum**](./docs/cmd/enum.md): Enumerate CSV file by prepending an index column\n- [**flatmap**](./docs/cmd/flatmap.md): Emit one row per value yielded by an expression evaluated for each CSV row\n- [**fill**](./docs/cmd/fill.md): Fill empty cells\n- [**complete**](./docs/cmd/complete.md): Add missing rows in a column of contiguous values\n- [**blank**](./docs/cmd/blank.md): Blank down contiguous identical cell values\n- [**separate**](./docs/cmd/separate.md): Split a single column into multiple ones\n\n*Format, convert \u0026 recombobulate*\n\n- [**behead**](./docs/cmd/behead.md): Drop header from CSV file\n- [**rename**](./docs/cmd/rename.md): Rename columns of a CSV file\n- [**input**](./docs/cmd/input.md): Read unusually formatted CSV data\n- [**fixlengths**](./docs/cmd/fixlengths.md): Make all rows have the same length\n- [**fmt**](./docs/cmd/fmt.md): Format CSV output (change field delimiter)\n- [**explode**](./docs/cmd/explode.md): Explode rows based on some column separator\n- [**implode**](./docs/cmd/implode.md): Collapse consecutive identical rows based on a diverging column\n- [**from**](./docs/cmd/from.md): Convert a variety of formats to CSV\n- [**to**](./docs/cmd/to.md): Convert a CSV file to a variety of data formats\n- [**scrape**](./docs/cmd/scrape.md): Scrape HTML into CSV data\n- [**reverse**](./docs/cmd/reverse.md): Reverse rows of CSV data\n\n*Transpose \u0026 pivot*\n\n- [**transpose (t)**](./docs/cmd/transpose.md): Transpose CSV file\n- [**pivot**](./docs/cmd/pivot.md): 
Split distinct values of a column into their own columns\n- [**unpivot**](./docs/cmd/unpivot.md): Stack multiple columns into fewer\n\n*Split a CSV file into multiple*\n\n- [**split**](./docs/cmd/split.md): Split CSV data into chunks\n- [**partition**](./docs/cmd/partition.md): Partition CSV data based on a column value\n\n*Parallelization*\n\n- [**parallel (p)**](./docs/cmd/parallel.md): Map-reduce-like parallel computation\n\n*Generate CSV files*\n\n- [**range**](./docs/cmd/range.md): Create a CSV file from a numerical range\n\n*Lexicometry \u0026 fuzzy matching*\n\n- [**tokenize**](./docs/cmd/tokenize.md): Tokenize a text column\n- [**vocab**](./docs/cmd/vocab.md): Build a vocabulary over tokenized documents\n\n*Matrix \u0026 network-related commands*\n\n- [**matrix**](./docs/cmd/matrix.md): Convert CSV data to matrix data\n- [**network**](./docs/cmd/network.md): Convert CSV data to network data\n\n*Debug*\n\n- [**eval**](./docs/cmd/eval.md): Evaluate/debug a single expression\n\n## General flags and IO model\n\n### Getting help\n\nIf you ever feel lost, each command has a `-h/--help` flag that will print the related documentation.\n\nIf you need help with the expression language, check out the `help` command itself:\n\n```bash\n# Help about help ;)\nxan help --help\n```\n\n### Regarding input \u0026 output formats\n\nAll `xan` commands expect a \"standard\" CSV file, i.e. comma-delimited, with proper double-quote escaping. That said, `xan` is also perfectly able to infer the delimiter from typical file extensions such as `.tsv`, `.tab`, `.psv`, `.ssv` or `.scsv`.\n\nIf you need to process a file with a custom delimiter, you can either use the `xan input` command or use the `-d/--delimiter` flag available with all commands.\n\nIf you need to output a custom CSV dialect (e.g. 
using `;` delimiters), feel free to use the `xan fmt` command.\n\nIf your CSV file has a varying number of columns per row, use the `xan fixlengths` command before piping into other commands, as `xan` expects well-behaved CSV data where all rows have the same number of columns.\n\nFinally, even though most `xan` commands don't need to decode the file's bytes, some still do. In this case, `xan` expects correctly formatted UTF-8 text. If you need to process other encodings such as `latin1`, convert them beforehand using `iconv` or similar utilities.\n\n### Working with headless CSV file\n\nEven if it is good practice to name your columns, some CSV files simply don't have headers. Most commands are able to deal with those files if you pass the `-n/--no-headers` flag.\n\nNote that this flag always relates to the input, not the output. If for some reason you want to drop a CSV output's header row, use the `xan behead` command.\n\n### Regarding stdin\n\nBy default, all commands will try to read from stdin when the file path is not specified. This makes piping easy and comfortable as it follows typical Unix conventions. Some commands may have multiple inputs (`xan join`, for instance), in which case stdin can usually be specified using the `-` character:\n\n```bash\n# First file given to join will be read from stdin\ncat file1.csv | xan join col1 - col2 file2.csv\n```\n\nNote that the command will also warn you when stdin cannot be read, in case you forgot to indicate the file's path.\n\n### Regarding stdout\n\nBy default, all commands will print their output to stdout (note that this output is usually buffered for performance reasons).\n\nIn addition, all commands expose a `-o/--output` flag that can be used to specify where to write the output. This can be useful if you do not want to or cannot use `\u003e` (typically in some Windows shells). Note that `-` as an output path also means writing to stdout. 
This can sometimes be useful when scripting.\n\n### Supported file formats\n\n`xan` is able to process a large variety of CSV-adjacent data formats out of the box:\n\n- `.csv` files will be understood as comma-separated\n- `.tsv` \u0026 `.tab` files will be understood as tab-separated\n- `.scsv` \u0026 `.ssv` files will be understood as semicolon-separated\n- `.psv` files will be understood as pipe-separated\n- `.cdx` files (an index file [format](https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/) related to web archives) will be understood as space-separated and will have their magic bytes dropped\n- `.ndjson` \u0026 `.jsonl` files will be understood as tab-separated, headless, null-byte-quoted, so you can easily use them with `xan` commands (e.g. parsing or wrangling JSON data using the expression language to aggregate, even in parallel). If you need a more thorough conversion of newline-delimited JSON data, check out the `xan from -f ndjson` command instead.\n- `.vcf` files ([Variant Call Format](https://en.wikipedia.org/wiki/Variant_Call_Format)) from bioinformatics are supported out of the box. They will be stripped of their header data and considered as tab-delimited.\n- `.gtf` \u0026 `.gff2` files ([Gene Transfer Format](https://en.wikipedia.org/wiki/Gene_transfer_format)) from bioinformatics are supported out of the box. They will be stripped of their header data and considered as headless \u0026 tab-delimited.\n- `.sam` files ([Sequence Alignment Map](https://en.wikipedia.org/wiki/SAM_(file_format))) from bioinformatics are supported out of the box. They will be stripped of their header data and considered as headless \u0026 tab-delimited.\n- `.bed` files ([Browser Extensible Data](https://en.wikipedia.org/wiki/BED_(file_format))) from bioinformatics are supported out of the box. 
They will be stripped of their header data and considered as headless \u0026 tab-delimited.\n\nNote that more exotic delimiters can always be handled using the ubiquitous `-d/--delimiter` flag.\n\nSome additional formats (e.g. `.gff`, `.gff3`) are also supported but must first be normalized using the `xan input` command, because their cells must be trimmed or because they have comment lines to be skipped.\n\nNote also that UTF-8 BOMs are always stripped from the data when processed.\n\n### Compressed files\n\n`xan` is able to read gzipped files (having a `.gz` extension). It is also able to leverage `.gzi` indices (usually created through [`bgzip`](https://www.htslib.org/doc/bgzip.html)) when seeking is necessary (constant-time reversing, parallelization, etc.).\n\n`xan` is also able to read files compressed with [`Zstandard`](https://github.com/facebook/zstd) (having a `.zst` extension).\n\n### Regarding color\n\nSome `xan` commands print ANSI colors in the terminal by default, such as `view`, `flatten`, etc.\n\nAll those commands have a standard `--color=(auto|always|never)` flag to tweak the coloring behavior if you need it (note that, by default, colors are not printed when commands are piped).\n\nThey also respect typical environment variables related to ANSI coloring, such as `NO_COLOR`, `CLICOLOR` \u0026 `CLICOLOR_FORCE`, as documented [here](https://bixense.com/clicolors/).\n\n## Expression language reference\n\n- [Cheatsheet](./docs/moonblade/cheatsheet.md)\n- [Comprehensive list of functions \u0026 operators](./docs/moonblade/functions.md)\n- [Comprehensive list of aggregation functions](./docs/moonblade/aggs.md)\n- [Comprehensive list of window aggregation functions](./docs/moonblade/window.md)\n- [Scraping DSL](./docs/moonblade/scraping.md)\n\n## News\n\nFor news about the tool's evolution, feel free to read:\n\n1. the [changelog](CHANGELOG.md)\n2. the [xan zines](./docs/XANZINE.md)\n3. 
the [roadmap](https://github.com/medialab/xan/discussions/910)\n\nSee also these blog posts about the tool:\n\n- [Cursed engineering: jumping randomly through CSV files without hurting yourself](./docs/blog/csv_base_jumping.md)\n\n## How to cite?\n\n`xan` is published on [Zenodo](https://zenodo.org/) as [10.5281/zenodo.15310200](https://doi.org/10.5281/zenodo.15310200).\n\nYou can cite it as follows:\n\n\u003e Guillaume Plique, Béatrice Mazoyer, Laura Miguel, César Pichon, Anna Charles, \u0026 Julien Pontoire. (2025). xan, the CSV magician. (0.50.0). Zenodo. https://doi.org/10.5281/zenodo.15310200\n\n## Frequently Asked Questions\n\n### How to display a vertical bar chart?\n\nRotate your screen ;\\)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedialab%2Fxan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmedialab%2Fxan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmedialab%2Fxan/lists"}