{"id":17983652,"url":"https://github.com/mchav/dataframe","last_synced_at":"2025-08-16T23:31:22.831Z","repository":{"id":224270622,"uuid":"762867802","full_name":"mchav/dataframe","owner":"mchav","description":"An intuitive, dynamically-typed DataFrame library.","archived":false,"fork":false,"pushed_at":"2024-12-10T23:50:01.000Z","size":38798,"stargazers_count":19,"open_issues_count":8,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-11T00:26:51.703Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mchav.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-24T22:53:26.000Z","updated_at":"2024-12-10T23:50:04.000Z","dependencies_parsed_at":"2024-02-24T23:33:46.474Z","dependency_job_id":"ba7fe158-df66-42e9-8989-f8917e6fd47a","html_url":"https://github.com/mchav/dataframe","commit_stats":null,"previous_names":["mchav/dataframe"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchav%2Fdataframe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchav%2Fdataframe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchav%2Fdataframe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchav%2Fdataframe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mchav","download_url":"https://codeload.github.com/mcha
v/dataframe/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230066588,"owners_count":18167539,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-29T18:17:57.316Z","updated_at":"2025-08-16T23:31:22.822Z","avatar_url":"https://github.com/mchav.png","language":"Haskell","funding_links":[],"categories":["Libraries"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  \u003ca href=\"https://dataframe.readthedocs.io/en/latest/\"\u003e\n    \u003cimg width=\"100\" height=\"100\" src=\"https://raw.githubusercontent.com/mchav/dataframe/master/docs/_static/haskell-logo.svg\" alt=\"dataframe logo\"\u003e\n  \u003c/a\u003e\n\u003c/h1\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://hackage.haskell.org/package/dataframe-0.2.0.2\"\u003e\n    \u003cimg src=\"https://img.shields.io/hackage/v/dataframe\" alt=\"hackage Latest Release\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/mchav/dataframe/actions/workflows/haskel-ci.yml\"\u003e\n    \u003cimg src=\"https://github.com/mchav/dataframe/actions/workflows/haskell-ci.yml/badge.svg\" alt=\"C/I\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://dataframe.readthedocs.io/en/latest/\"\u003eUser guide\u003c/a\u003e\n  |\n  \u003ca href=\"https://discord.gg/XJE5wKT2kb\"\u003eDiscord\u003c/a\u003e\n\u003c/p\u003e\n\n# DataFrame\n\nA fast, safe, and intuitive DataFrame library.\n\n## Why use this DataFrame library?\n\n* Encourages concise, declarative, and 
composable data pipelines.\n* Static typing makes code easier to reason about and catches many bugs at compile time, before your code ever runs.\n* Delivers high performance thanks to Haskell’s optimizing compiler and efficient memory model.\n* Designed for interactivity: expressive syntax, helpful error messages, and sensible defaults.\n* Works seamlessly in both command-line and notebook environments, for exploration and scripting alike.\n\n## Example usage\n\n### Interactive environment\n![Screencast of usage in GHCi](./static/example.gif)\n\nKey features in the example:\n* Intuitive, SQL-like API to get from data to insights.\n* Create typed, completion-ready references to columns in a dataframe using `:exposeColumns`.\n* Type-safe column transformations for faster and safer exploration.\n* Fluid, chaining API that makes code easy to reason about.\n\n### Standalone script example\n```haskell\n-- Useful Haskell extensions.\n{-# LANGUAGE OverloadedStrings #-} -- Allow string literals to be interpreted as any string type.\n{-# LANGUAGE TypeApplications #-} -- Convenience syntax for specifying a type: `sum a b :: Int` vs `sum @Int a b`. 
\n\nimport qualified DataFrame as D -- import for general functionality.\nimport qualified DataFrame.Functions as F -- import for column expressions.\n\nimport DataFrame ((|\u003e)) -- import the chaining operator unqualified.\n\nmain :: IO ()\nmain = do\n    df \u003c- D.readTsv \"./data/chipotle.tsv\"\n    let quantity = F.col \"quantity\" :: D.Expr Int -- A typed reference to a column.\n    print (df\n      |\u003e D.select [\"item_name\", \"quantity\"]\n      |\u003e D.groupBy [\"item_name\"]\n      |\u003e D.aggregate [ (F.sum quantity)     `F.as` \"sum_quantity\"\n                     , (F.mean quantity)    `F.as` \"mean_quantity\"\n                     , (F.maximum quantity) `F.as` \"maximum_quantity\"\n                     ]\n      |\u003e D.sortBy D.Descending [\"sum_quantity\"]\n      |\u003e D.take 10)\n\n```\n\nOutput:\n\n```\n------------------------------------------------------------------------------------------\nindex |          item_name           | sum_quantity |    mean_quanity    | maximum_quanity\n------|------------------------------|--------------|--------------------|----------------\n Int  |             Text             |     Int      |       Double       |       Int      \n------|------------------------------|--------------|--------------------|----------------\n0     | Chicken Bowl                 | 761          | 1.0482093663911847 | 3              \n1     | Chicken Burrito              | 591          | 1.0687160940325497 | 4              \n2     | Chips and Guacamole          | 506          | 1.0563674321503131 | 4              \n3     | Steak Burrito                | 386          | 1.048913043478261  | 3              \n4     | Canned Soft Drink            | 351          | 1.1661129568106312 | 4              \n5     | Chips                        | 230          | 1.0900473933649288 | 3              \n6     | Steak Bowl                   | 221          | 1.04739336492891   | 3              \n7     | Bottled Water                | 211  
        | 1.3024691358024691 | 10             \n8     | Chips and Fresh Tomato Salsa | 130          | 1.1818181818181819 | 15             \n9     | Canned Soda                  | 126          | 1.2115384615384615 | 4 \n```\n\nA full example using many of the API's constructs is in the `./examples` folder.\n\n## Installing\n\n### Jupyter notebook\n* We have a [hosted version of the Jupyter notebook](https://ulwazi-exh9dbh2exbzgbc9.westus-01.azurewebsites.net/lab) on Azure. It runs on Azure's free tier, so it can only support 3 or 4 kernels at a time.\n* To get started quickly, use the Dockerfile in the [ihaskell-dataframe](https://github.com/mchav/ihaskell-dataframe) repository to build and run an image with dataframe integration.\n* For a preview, check out the [California Housing](https://github.com/mchav/dataframe/blob/main/docs/California%20Housing.ipynb) notebook.\n\n### CLI\n* Run the installation script: `curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/install.sh | sh`\n* Download the run script with: `curl --output dataframe \"https://raw.githubusercontent.com/mchav/dataframe/refs/heads/main/scripts/dataframe.sh\"`\n* Make the script executable: `chmod +x dataframe`\n* Add the script's directory to your path: `export PATH=$PATH:$(pwd)`\n* Run the script with: `dataframe`\n\n\n## What is exploratory data analysis?\nWe provide a primer [here](https://github.com/mchav/dataframe/blob/main/docs/exploratory_data_analysis_primer.md) and show how to do some common analyses.\n\n## Coming from other dataframe libraries\nFamiliar with another dataframe library? 
Get started:\n* [Coming from Pandas](https://github.com/mchav/dataframe/blob/main/docs/coming_from_pandas.md)\n* [Coming from Polars](https://github.com/mchav/dataframe/blob/main/docs/coming_from_polars.md)\n* [Coming from dplyr](https://github.com/mchav/dataframe/blob/main/docs/coming_from_dplyr.md)\n\n## Supported input formats\n* CSV\n* Apache Parquet (still buggy and experimental)\n\n## Future work\n* Apache Arrow compatibility\n* Integration with more common data formats (currently only CSV is fully supported)\n* Support windowed plotting (currently only supports ASCII plots)\n* Host the whole library + JupyterLab on Azure with auth and isolation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmchav%2Fdataframe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmchav%2Fdataframe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmchav%2Fdataframe/lists"}