{"id":13400171,"url":"https://github.com/vaexio/vaex","last_synced_at":"2026-04-02T01:30:35.231Z","repository":{"id":21215524,"uuid":"24528468","full_name":"vaexio/vaex","owner":"vaexio","description":"Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀","archived":false,"fork":false,"pushed_at":"2024-10-08T16:23:10.000Z","size":139418,"stargazers_count":8378,"open_issues_count":547,"forks_count":599,"subscribers_count":142,"default_branch":"master","last_synced_at":"2025-05-11T11:12:22.969Z","etag":null,"topics":["bigdata","data-science","dataframe","hdf5","machine-learning","machinelearning","memory-mapped-file","pyarrow","python","tabular-data","visualization"],"latest_commit_sha":null,"homepage":"https://vaex.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vaexio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.txt","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-09-27T09:44:42.000Z","updated_at":"2025-05-10T22:53:06.000Z","dependencies_parsed_at":"2023-09-27T03:36:43.884Z","dependency_job_id":"3b1ff7f0-41dd-4d10-94da-33a8dbcf7318","html_url":"https://github.com/vaexio/vaex","commit_stats":{"total_commits":3346,"total_committers":79,"mean_commits":42.35443037974684,"dds":"0.22952779438135085","last_synced_commit":"15245cf4332d4423ac58bd737aee27d911a1b252"},"previous_names":["maartenbreddels/vaex"],"tags_count":370,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/vaexio%2Fvaex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vaexio%2Fvaex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vaexio%2Fvaex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vaexio%2Fvaex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vaexio","download_url":"https://codeload.github.com/vaexio/vaex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253554098,"owners_count":21926614,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigdata","data-science","dataframe","hdf5","machine-learning","machinelearning","memory-mapped-file","pyarrow","python","tabular-data","visualization"],"created_at":"2024-07-30T19:00:49.228Z","updated_at":"2025-12-12T00:50:26.970Z","avatar_url":"https://github.com/vaexio.png","language":"Python","readme":"[![Supported Python Versions](https://img.shields.io/pypi/pyversions/vaex-core)](https://pypi.org/project/vaex-core/)\n[![Documentation](https://readthedocs.org/projects/vaex/badge/?version=latest)](https://docs.vaex.io)\n[![Slack](https://img.shields.io/badge/slack-chat-green.svg)](https://join.slack.com/t/vaexio/shared_invite/zt-shhxzf5i-Cf5n2LtkoYgUjOjbB3bGQQ)\n\n# What is Vaex?\n\nVaex is a high performance Python library for lazy **Out-of-Core DataFrames**\n(similar to Pandas), to visualize and explore big tabular datasets. 
It\ncalculates *statistics* such as mean, sum, count, standard deviation etc, on an\n*N-dimensional grid* for more than **a billion** (`10^9`) samples/rows **per\nsecond**. Visualization is done using **histograms**, **density plots** and **3d\nvolume rendering**, allowing interactive exploration of big data. Vaex uses\nmemory mapping, zero memory copy policy and lazy computations for best\nperformance (no memory wasted).\n\n# Installing\nWith pip:\n```\n$ pip install vaex\n```\nOr conda:\n```\n$ conda install -c conda-forge vaex\n```\n\n[For more details, see the documentation](https://docs.vaex.io/en/latest/installing.html)\n\n# Key features\n## Instant opening of Huge data files (memory mapping)\n[HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) and [Apache Arrow](https://arrow.apache.org/) supported.\n\n![opening1a](https://user-images.githubusercontent.com/1765949/82818563-31c1e200-9e9f-11ea-9ee0-0a8c1994cdc9.png)\n\n\n![opening1b](https://user-images.githubusercontent.com/1765949/82820352-49e73080-9ea2-11ea-9153-d73aa399d329.png)\n\n[Read the documentation on how to efficiently convert your data](https://docs.vaex.io/en/latest/example_io.html) from CSV files, Pandas DataFrames, or other sources.\n\n\nLazy streaming from S3 supported in combination with memory mapping.\n\n![opening1c](https://user-images.githubusercontent.com/1765949/82820516-a21e3280-9ea2-11ea-948b-07df26c4b5d3.png)\n\n\n## Expression system\nDon't waste memory or time with feature engineering, we (lazily) transform your data when needed.\n\n\n![expression](https://user-images.githubusercontent.com/1765949/82818733-70f03300-9e9f-11ea-80b0-ab28e7950b5c.png)\n\n\n\n## Out-of-core DataFrame\nFiltering and evaluating expressions will not waste memory by making copies; the data is kept untouched on disk, and will be streamed only when needed. 
Delay the time before you need a cluster.\n\n\n![occ-animated](https://user-images.githubusercontent.com/1765949/82821111-c6c6da00-9ea3-11ea-9f9e-498de8133cc2.gif)\n\n## Fast groupby / aggregations\nVaex implements parallelized, highly performant `groupby` operations, especially when using categories (\u003e1 billion/second).\n\n\n![groupby](https://user-images.githubusercontent.com/1765949/82818807-97ae6980-9e9f-11ea-8820-41dd4441057a.png)\n\n\n## Fast and efficient join\nVaex doesn't copy/materialize the 'right' table when joining, saving gigabytes of memory. With subsecond joining on a billion rows, it's pretty fast!\n\n![join](https://user-images.githubusercontent.com/1765949/82818840-a268fe80-9e9f-11ea-8ba2-6a6d52c4af88.png)\n\n## More features\n\n * Remote DataFrames (documentation coming soon)\n * Integration into [Jupyter and Voila for interactive notebooks and dashboards](https://vaex.readthedocs.io/en/latest/tutorial_jupyter.html)\n * [Machine Learning without (explicit) pipelines](https://vaex.readthedocs.io/en/latest/tutorial_ml.html)\n\n\n## Contributing\n\nSee [contributing](CONTRIBUTING.md) page.\n\n## Slack\n\nJoin the discussion in our [Slack](https://join.slack.com/t/vaexio/shared_invite/zt-shhxzf5i-Cf5n2LtkoYgUjOjbB3bGQQ) channel!\n\n# Learn more about Vaex\n * Articles\n   * [Beyond Pandas: Spark, Dask, Vaex and other big data technologies battling head to head](https://towardsdatascience.com/beyond-pandas-spark-dask-vaex-and-other-big-data-technologies-battling-head-to-head-a453a1f8cc13) (includes benchmarks)\n   * [7 reasons why I love Vaex for data science](https://towardsdatascience.com/7-reasons-why-i-love-vaex-for-data-science-99008bc8044b) (tips and tricks)\n   * [ML impossible: Train 1 billion samples in 5 minutes on your laptop using Vaex and Scikit-Learn](https://towardsdatascience.com/ml-impossible-train-a-1-billion-sample-model-in-20-minutes-with-vaex-and-scikit-learn-on-your-9e2968e6f385)\n   * [How to analyse 100 GB of data on your 
laptop with Python](https://towardsdatascience.com/how-to-analyse-100s-of-gbs-of-data-on-your-laptop-with-python-f83363dda94)\n   * [Flying high with Vaex: analysis of over 30 years of flight data in Python](https://towardsdatascience.com/https-medium-com-jovan-veljanoski-flying-high-with-vaex-analysis-of-over-30-years-of-flight-data-in-python-b224825a6d56)\n   * [Vaex: A DataFrame with super strings - Speed up your text processing up to a 1000x\n](https://towardsdatascience.com/vaex-a-dataframe-with-super-strings-789b92e8d861)\n   * [Vaex: Out of Core Dataframes for Python and Fast Visualization - 1 billion row datasets on your laptop](https://towardsdatascience.com/vaex-out-of-core-dataframes-for-python-and-fast-visualization-12c102db044a)\n\n * [Follow our tutorials](https://docs.vaex.io/en/latest/tutorials.html)\n * Watch our more recent talks:\n   * [PyData London 2019](https://www.youtube.com/watch?v=2Tt0i823-ec)\n   * [SciPy 2019](https://www.youtube.com/watch?v=ELtjRdPT8is)\n * Contact us for data science solutions, training, or enterprise support at https://vaex.io/\n","funding_links":[],"categories":["Data Manipulation","Basic Components","Software","Python","Data Processing","🐍 Python","数据容器和结构","其他_机器学习与深度学习","Curated List","Computation and Communication Optimisation","HarmonyOS","Uncategorized","📚 فهرست","📋 Contents","**Programming (learning)**","Data Containers \u0026 Dataframes","Libraries"],"sub_categories":["Data Frames","Alternative libraries","Trends","Data Representation","Useful Python Tools for Data Analysis","Data Tools","General-Purpose Machine Learning","Windows Manager","Uncategorized","کتابخانه هاي تحليل داده","🧬 1. Core Frameworks \u0026 Libraries","**Developer\\'s Tools**"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvaexio%2Fvaex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvaexio%2Fvaex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvaexio%2Fvaex/lists"}