{"id":13513812,"url":"https://github.com/elixir-explorer/explorer","last_synced_at":"2025-05-14T14:07:57.292Z","repository":{"id":36972767,"uuid":"388683644","full_name":"elixir-explorer/explorer","owner":"elixir-explorer","description":"Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir","archived":false,"fork":false,"pushed_at":"2025-04-20T16:11:36.000Z","size":3677,"stargazers_count":1196,"open_issues_count":32,"forks_count":129,"subscribers_count":30,"default_branch":"main","last_synced_at":"2025-05-13T00:41:28.500Z","etag":null,"topics":["data-science","dataframes","elixir","rust"],"latest_commit_sha":null,"homepage":"https://hexdocs.pm/explorer","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elixir-explorer.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-07-23T05:05:39.000Z","updated_at":"2025-05-10T16:23:09.000Z","dependencies_parsed_at":"2024-01-22T22:32:17.670Z","dependency_job_id":"428a28db-ef6d-4cc6-b2ab-7c9cb10b5e88","html_url":"https://github.com/elixir-explorer/explorer","commit_stats":null,"previous_names":["elixir-explorer/explorer","elixir-nx/explorer"],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-explorer%2Fexplorer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-explorer%2Fexplorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-explorer%2Fexplorer/releases",
"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elixir-explorer%2Fexplorer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elixir-explorer","download_url":"https://codeload.github.com/elixir-explorer/explorer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254160141,"owners_count":22024567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","dataframes","elixir","rust"],"created_at":"2024-08-01T05:00:38.150Z","updated_at":"2025-05-14T14:07:57.251Z","avatar_url":"https://github.com/elixir-explorer.png","language":"Elixir",
"readme":"\u003ch1\u003e\u003cimg src=\"explorer.png\" alt=\"Explorer\"\u003e\u003c/h1\u003e\n\n![CI](https://github.com/elixir-nx/explorer/actions/workflows/ci.yml/badge.svg)\n[![Documentation](http://img.shields.io/badge/hex.pm-docs-green.svg?style=flat)](https://hexdocs.pm/explorer)\n[![Package](https://img.shields.io/hexpm/v/explorer.svg)](https://hex.pm/packages/explorer)\n\n\u003c!-- MDOC --\u003e\n\nExplorer brings series (one-dimensional) and dataframes (two-dimensional) for fast\ndata exploration to Elixir.\n\n## Features and design\n\nExplorer's high-level features are:\n\n- Simply typed series: `:binary`, `:boolean`, `:category`, `:date`, `:datetime`,\n  `:duration`, floats of 32 and 64 bits (`{:f, size}`), integers of 8, 16, 32\n  and 64 bits (`{:s, size}`, `{:u, size}`), `:null`, `:string`, `:time`, `:list`,\n  and `:struct`.\n\n- A powerful but constrained and opinionated API, so you spend less time looking\n  for the right function and more time doing data manipulation.\n\n- Support for CSV, Parquet, NDJSON, and Arrow IPC formats.\n\n- Integration with external databases via [ADBC](https://github.com/elixir-explorer/adbc)\n  and direct connection to file storage services such as S3.\n\n- Pluggable backends, providing a uniform API whether you're working in-memory\n  or (forthcoming) on remote databases or even Spark dataframes.\n\n- The first (and default) backend is based on NIF bindings to the blazing-fast\n  [polars](https://docs.rs/polars) library.\n\nThe API is heavily influenced by [Tidy Data](https://vita.had.co.nz/papers/tidy-data.pdf)\nand borrows much of its design from [dplyr](https://dplyr.tidyverse.org). The philosophy\nis shaped by this passage from `dplyr`'s documentation:\n\n\u003e - By constraining your options, it helps you think about your data manipulation\n\u003e   challenges.\n\u003e\n\u003e - It provides simple “verbs”, functions that correspond to the most common data\n\u003e   manipulation tasks, to help you translate your thoughts into code.\n\u003e\n\u003e - It uses efficient backends, so you spend less time waiting for the computer.\n\nThe aim here isn't to have the fastest dataframe library around (though it certainly\nhelps that [we're building on Polars, one of the fastest](https://h2oai.github.io/db-benchmark/)).\nInstead, we're aiming to bridge the best of many worlds:\n\n- the elegance of `dplyr`\n- the speed of `polars`\n- the joy of Elixir\n\nThat means you can expect the guiding principles to be 'Elixir-ish'. For example,\nyou won't see the underlying data mutated, even if that's the most efficient implementation.\nExplorer functions will always return a new dataframe or series.\n\n## Getting started\n\nInside an Elixir script or [Livebook](https://livebook.dev):\n\n```elixir\nMix.install([\n  {:explorer, \"~\u003e 0.10.0\"}\n])\n```\n\nOr in the `mix.exs` file of your application:\n\n```elixir\ndef deps do\n  [\n    {:explorer, \"~\u003e 0.10.0\"}\n  ]\nend\n```\n\nExplorer will download a precompiled version of its native code upon installation. You can force a local build by setting the environment variable `EXPLORER_BUILD=1` and including `:rustler` as a dependency:\n\n```elixir\n  {:explorer, \"~\u003e 0.10.0\", system_env: %{\"EXPLORER_BUILD\" =\u003e \"1\"}},\n  {:rustler, \"\u003e= 0.0.0\"}\n```\n\nIf necessary, clean up before rebuilding with `mix deps.clean explorer`.\n\n## A glimpse of the API\n\nWe have two ways to represent data with Explorer:\n\n- using a series, which is similar to a list but is guaranteed to contain items\n  of a single data type (or *dtype*, for short). Note that nil values are\n  permitted in series of any dtype.\n\n- using a dataframe, which is just a way to represent one or more series together\n  and work with them as a whole. The only restriction is that all the series have\n  the same size.\n\nA series can be created from a list:\n\n```elixir\nfruits = Explorer.Series.from_list([\"apple\", \"mango\", \"banana\", \"orange\"])\n```\n\nYour newly created series will look like this:\n\n```\n#Explorer.Series\u003c\n  Polars[4]\n  string [\"apple\", \"mango\", \"banana\", \"orange\"]\n\u003e\n```\n\nYou can, for example, sort that series:\n\n```elixir\nExplorer.Series.sort(fruits)\n```\n\nThe result is:\n\n```\n#Explorer.Series\u003c\n  Polars[4]\n  string [\"apple\", \"banana\", \"mango\", \"orange\"]\n\u003e\n```\n\n### Dataframes\n\nDataframes can be created in two ways:\n\n- by reading from files or memory using the\n  [IO functions](https://hexdocs.pm/explorer/Explorer.DataFrame.html#module-io-operations).\n  This is by far the most common way to load dataframes in Explorer.\n  We accept Parquet, IPC, CSV, and NDJSON files.\n\n- by using the `Explorer.DataFrame.new/2` function, which is handy for small experiments.\n  We are going to use this function here.\n\nYou can pass either series or lists to it:\n\n```elixir\nmountains = Explorer.DataFrame.new(name: [\"Everest\", \"K2\", \"Aconcagua\"], elevation: [8848, 8611, 6962])\n```\n\nYour dataframe will look like this:\n\n```\n#Explorer.DataFrame\u003c\n  Polars[3 x 2]\n  name string [\"Everest\", \"K2\", \"Aconcagua\"]\n  elevation s64 [8848, 8611, 6962]\n\u003e\n```\n\nYou can also view a dataframe as a table, using the `Explorer.DataFrame.print/2`\nfunction:\n\n```elixir\nExplorer.DataFrame.print(mountains)\n```\n\nPrints:\n\n```\n+-------------------------------------------+\n| Explorer DataFrame: [rows: 3, columns: 2] |\n+---------------------+---------------------+\n|        name         |      elevation      |\n|      \u003cstring\u003e       |        \u003cs64\u003e        |\n+=====================+=====================+\n| Everest             | 8848                |\n+---------------------+---------------------+\n| K2                  | 8611                |\n+---------------------+---------------------+\n| Aconcagua           | 6962                |\n+---------------------+---------------------+\n```\n\nNow let's filter our dataframe. But first, let's require\nthe `Explorer.DataFrame` module and give it a short name:\n\n```elixir\nrequire Explorer.DataFrame, as: DF\n```\n\nThe \"require\" is needed to load the macro features of that module.\nWe give it a shorter name to simplify our examples.\n\nNow for the filter itself. Let's keep only the mountains whose elevation is above\nthe mean elevation in our dataframe:\n\n```elixir\nDF.filter(mountains, elevation \u003e mean(elevation))\n```\n\nNotice that we can refer to the columns by their names and use functions\nwithout defining them. This is possible thanks to the powerful `Explorer.Query` features,\nand it's the main reason we need to \"require\" the `Explorer.DataFrame` module.\n\nThe result will look like this:\n\n```\n#Explorer.DataFrame\u003c\n  Polars[2 x 2]\n  name string [\"Everest\", \"K2\"]\n  elevation s64 [8848, 8611]\n\u003e\n```\n\nThere is an extensive guide that you can run in Livebook:\n[Ten Minutes to Explorer](https://hexdocs.pm/explorer/exploring_explorer.html)\n\nYou can also check the `Explorer.DataFrame` and `Explorer.Series` docs for further\ndetails.\n\n\u003c!-- MDOC --\u003e\n\n## Contributing\n\nExplorer uses Rust for its default backend implementation. Rust is not\nneeded to use Explorer as a package, but you need Rust tooling installed on your\nmachine if you want to compile from source, which is the case when contributing\nto Explorer.\n\nWe require Rust Nightly, which can be installed with [Rustup](https://rust-lang.github.io/rustup/installation/index.html).\nIf you already have Rustup and a recent version of Cargo installed, the correct version of\nRust will be installed on the first compilation of the project. Otherwise, you can manually\ninstall the correct version:\n\n```sh\nrustup toolchain install nightly-2024-07-26\n```\n\nYou can also use [asdf](https://asdf-vm.com/):\n\n```sh\nasdf install rust nightly-2024-07-26\n```\n\nYou may also need to install [`CMake`](https://cmake.org/) to build the project,\nif it is not already installed.\n\nOnce you have made your changes, run `mix ci` to lint and format both the Elixir\nand the Rust code.\n\nOur integration tests require the [AWS CLI to be installed](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html),\nas well as a container engine such as [Podman](https://podman.io) or [Docker](https://docker.com).\n\nOnce these dependencies are installed, run the `mix localstack.setup` command,\nand then run the cloud integration tests with `mix test --only cloud_integration`.\n\nTo recap, here is the full sequence of commands to run:\n\n```sh\nmix ci\nmix localstack.setup\nmix test --only cloud_integration\n```\n\n## Precompilation\n\nExplorer ships with the NIF code precompiled for the most popular architectures.\nWe support the following:\n\n- `aarch64-apple-darwin` - macOS running on 64-bit ARM CPUs.\n- `aarch64-unknown-linux-gnu` - Linux running on 64-bit ARM CPUs, compiled with GCC.\n- `aarch64-unknown-linux-musl` - Linux running on 64-bit ARM CPUs, compiled with Musl.\n- `x86_64-apple-darwin` - macOS running on 64-bit Intel/AMD CPUs.\n- `x86_64-pc-windows-msvc` - Windows running on 64-bit Intel/AMD CPUs, compiled with Visual C++.\n- `x86_64-pc-windows-gnu` - Windows running on 64-bit Intel/AMD CPUs, compiled with GCC.\n- `x86_64-unknown-linux-gnu` - Linux running on 64-bit Intel/AMD CPUs, compiled with GCC.\n- `x86_64-unknown-linux-musl` - Linux running on 64-bit Intel/AMD CPUs, compiled with Musl.\n- `x86_64-unknown-freebsd` - FreeBSD running on 64-bit Intel/AMD CPUs.\n\nThis means that, on these targets, Explorer works without the need to compile it from source.\n\nThis currently **only works for Hex releases**. For more information on how it works, please\ncheck the [RustlerPrecompiled project](https://hexdocs.pm/rustler_precompiled).\n\n### Legacy CPUs\n\nWe ship some of the precompiled artifacts with modern CPU features enabled by default. If\nyour computer is not compatible with them, you can set the following application configuration,\nwhich is read at compile time, to enable the legacy variants of the artifacts:\n\n```elixir\nconfig :explorer, use_legacy_artifacts: true\n```\n\nIf you see the error message \"Illegal instruction\" after your project compiles, you need to\nenable the legacy artifacts.\n\n### Features disabled\n\nSome features cannot be compiled for some targets because one of their dependencies\ndoes not work there.\n\nThis is the case for **NDJSON** reads and writes, which don't work on the RISC-V target.\nWe also disable AWS S3 reads and writes for the RISC-V target, because one of the dependencies\nof `ObjectStore` does not compile on it.\n\n## Sponsors\n\n\u003ca href=\"https://amplified.ai\"\u003e\u003cimg src=\"sponsors/amplified.png\" width=100 alt=\"Amplified\"\u003e\u003c/a\u003e\n",
"funding_links":[],"categories":["Elixir","Data Ingestion \u0026 ETL","Core Tools"],"sub_categories":["How to Join"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felixir-explorer%2Fexplorer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felixir-explorer%2Fexplorer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felixir-explorer%2Fexplorer/lists"}