Projects in Awesome Lists tagged with dataframe
A curated list of projects in awesome lists tagged with dataframe.
https://github.com/pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
arrow dataframe dataframe-library dataframes out-of-core polars python rust
Last synced: 15 Dec 2025
https://github.com/kanaries/pygwalker
PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis
data-analysis data-exploration dataframe matplotlib pandas plotly tableau tableau-alternative visualization
Last synced: 09 Sep 2025
https://github.com/Kanaries/pygwalker
PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis
data-analysis data-exploration dataframe matplotlib pandas plotly tableau tableau-alternative visualization
Last synced: 26 Mar 2025
https://github.com/modin-project/modin
Modin: Scale your Pandas workflows by changing a single line of code
analytics data-science dataframe datascience distributed modin pandas python sql
Last synced: 11 May 2025
https://github.com/rapidsai/cudf
cuDF - GPU DataFrame Library
arrow cpp cuda cudf dask data-analysis data-science dataframe gpu pandas pydata python rapids
Last synced: 05 Feb 2026
https://github.com/vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
bigdata data-science dataframe hdf5 machine-learning machinelearning memory-mapped-file pyarrow python tabular-data visualization
Last synced: 12 Dec 2025
https://github.com/apache/datafusion
Apache DataFusion SQL Query Engine
arrow big-data dataframe datafusion olap python query-engine rust sql
Last synced: 12 Dec 2025
https://github.com/haifengl/smile
Statistical Machine Intelligence & Learning Engine
classification clustering computer-algebra-system computer-vision data-science dataframe deep-learning genetic-algorithm interpolation linear-algebra llm machine-learning manifold-learning multidimensional-scaling nearest-neighbor-search nlp regression statistics visualization wavelet
Last synced: 08 Jan 2026
https://github.com/twopirllc/pandas-ta
Technical Analysis Indicators - Pandas TA is an easy to use Python 3 Pandas Extension with 150+ Indicators
dataframe finance fundamental-analysis jupyter-notebook pandas pandas-dataframe-extension pandas-extension pandas-ta python3 stock-market technical technical-analysis technical-analysis-indicators technical-analysis-library technical-indicators trading trading-algorithms
Last synced: 12 May 2025
https://github.com/javascriptdata/danfojs
Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
danfojs data-analysis data-analytics data-manipulation data-science dataframe javascript pandas plotting-charts stream-data stream-processing table tensorflow tensors
Last synced: 14 May 2025
https://github.com/lk-geimfari/mimesis
Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.
data dataframe datascience dummy factory factory-boy fake fixtures generator json-generator mimesis mock pandas polars pytest-plugin python schema syntetic synthetic-data testing
Last synced: 28 Dec 2025
https://github.com/jtablesaw/tablesaw
Java dataframe and visualization library
chart data-analysis data-frame data-science data-visualization dataframe high-performance java java-dataframe machine-learning plotly plotting statistical-analysis statistics visualization
Last synced: 12 Jan 2026
https://github.com/databricks/koalas
Koalas: pandas API on Apache Spark
big-data data-science dataframe mlflow pandas pydata spark
Last synced: 13 May 2025
https://github.com/eventual-inc/daft
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
big-data data-engineering data-science dataframe distributed-computing machine-learning python rust
Last synced: 08 May 2025
https://github.com/hosseinmoein/dataframe
C++ DataFrame for statistical, financial, and ML analysis in modern C++
ai cpp data-analysis data-science dataframe financial-data-analysis financial-engineering heterogeneous-data large-data machine-learning multidimensional-data numerical-analysis pandas polars statistical statistical-analysis tensor tensorboard trading-algorithms trading-strategies
Last synced: 04 Sep 2025
https://github.com/mars-project/mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
dask dataframe joblib lightgbm machine-learning numpy pandas python pytorch ray scikit-learn statsmodels tensor tensorflow xgboost
Last synced: 25 Apr 2025
https://github.com/Eventual-Inc/Daft
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
big-data data-engineering data-science dataframe distributed-computing machine-learning python rust
Last synced: 09 Apr 2025
https://github.com/hosseinmoein/DataFrame
C++ DataFrame for statistical, financial, and ML analysis in modern C++, using native types and contiguous memory storage
ai cpp data-analysis data-science dataframe financial-data-analysis financial-engineering heterogeneous-data large-data machine-learning multidimensional-data numerical-analysis pandas polars statistical statistical-analysis tensor tensorboard trading-algorithms trading-strategies
Last synced: 15 Mar 2025
https://github.com/sngyai/sequoia
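Automated stock-picking program for China A-shares, implementing the Turtle Trading rules, the 缠中说禅 (Chan theory) bull-market buy point, and several other technical patterns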
a-shares akshare dataframe pandas python ta-lib turtle-trade tushare
Last synced: 11 Apr 2025
https://github.com/sngyai/Sequoia
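Automated stock-picking program for China A-shares, implementing the Turtle Trading rules, the 缠中说禅 (Chan theory) bull-market buy point, and several other technical patterns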
a-shares akshare dataframe pandas python ta-lib turtle-trade tushare
Last synced: 01 Apr 2025
https://github.com/approximatelabs/sketch
AI code-writing assistant that understands data content
ai codex copilot data data-science dataframe datasketch datasketches df ds gpt3 lambdaprompt pandas python sketches tabular-data
Last synced: 11 Mar 2026
https://github.com/alexhallam/tv
📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
cli column command-line command-line-tool csv csv-cat csv-column csv-pretty-print csv-viewer csv-visualization data-science dataframe datatable pretty-print pretty-printer rust tabular-data terminal tibble
Last synced: 13 May 2025
https://github.com/DAGWorks-Inc/hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering hacktoberfest lineage llmops machine-learning mlops orchestration pandas python rag software-engineering
Last synced: 26 Mar 2025
https://github.com/man-group/arcticdb
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading
Last synced: 16 Feb 2026
https://github.com/apache/datafusion-ballista
Apache DataFusion Ballista Distributed Query Engine
arrow big-data dataframe distributed olap python query-engine rust sql
Last synced: 12 Dec 2025
https://github.com/man-group/ArcticDB
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading
Last synced: 12 Mar 2025
https://github.com/pyjanitor-devs/pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
cleaning-data data data-engineering dataframe hacktoberfest pandas pydata
Last synced: 18 Feb 2026
https://github.com/skrub-data/skrub
Machine learning with dataframes
data data-analysis data-cleaning data-preparation data-preprocessing data-science data-wrangling dataframe dataframes dirty-data machine-learning
Last synced: 06 Jan 2026
https://github.com/rocketlaunchr/dataframe-go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics
Last synced: 15 May 2025
https://github.com/michaelchu/optopsy
A nimble options backtesting library for Python
algorithmic algorithmic-trading algorithmic-trading-engine algorithmic-trading-library backtest backtesting backtesting-frameworks backtesting-trading-strategies dataframe option-chain option-pricing option-strategies options options-framework options-spreads options-strategies options-trading trade-options trading
Last synced: 30 Jan 2026
https://github.com/comet-ml/kangas
🦘 Explore multimedia datasets at scale
data-analysis data-exploration dataframe datagrid machine-learning
Last synced: 14 May 2025
https://github.com/graphframes/graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
apache-spark big-data connected-components dataframe dataframes graphs network-motif network-motifs networks spark
Last synced: 14 May 2025
https://github.com/redislabs/spark-redis
A connector for Spark that allows reading and writing to/from Redis cluster
Last synced: 14 May 2025
https://github.com/microsoft/mobius
C# and F# language binding and extensions to Apache Spark
apache-spark bigdata csharp dataframe dataset dstream eventhubs fsharp kafka-streaming mapreduce mobius near-real-time rdd spark spark-streaming streaming
Last synced: 14 May 2025
https://github.com/Microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
apache-spark bigdata csharp dataframe dataset dstream eventhubs fsharp kafka-streaming mapreduce mobius near-real-time rdd spark spark-streaming streaming
Last synced: 14 Mar 2025
https://github.com/microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
apache-spark bigdata csharp dataframe dataset dstream eventhubs fsharp kafka-streaming mapreduce mobius near-real-time rdd spark spark-streaming streaming
Last synced: 08 Apr 2025
https://github.com/RedisLabs/spark-redis
A connector for Spark that allows reading and writing to/from Redis cluster
Last synced: 28 Mar 2025
https://github.com/kotlin/dataframe
Structured data processing in Kotlin
data-analysis data-science dataframe kotlin
Last synced: 04 Jul 2025
https://github.com/freqtrade/technical
Various indicators developed or collected for Freqtrade
dataframe freqtrade technical-analysis trading
Last synced: 14 May 2025
https://github.com/Kotlin/dataframe
Structured data processing in Kotlin
data-analysis data-science dataframe kotlin
Last synced: 11 Apr 2025
https://github.com/stitchfix/hamilton
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix
Last synced: 29 Sep 2025
https://github.com/mrpowers-io/spark-daria
Essential Spark extensions and helper methods ✨😲
Last synced: 21 Feb 2026
https://github.com/pdpipe/pdpipe
Easy pipelines for pandas DataFrames.
data data-science dataframe dataframes pandas pandas-dataframe pipeline
Last synced: 06 Mar 2026
https://github.com/techascent/tech.ml.dataset
A Clojure high performance data processing system
clojure csv dataframe datascience dataset etl-pipeline java machine-learning xlsx
Last synced: 15 May 2025
https://github.com/elastic/eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
big-data data-analysis dataframe dataframes eland elasticsearch etl lightgbm machine-learning pandas python scikit-learn time-series-forecasting
Last synced: 14 Apr 2025
https://github.com/dmnfarrell/pandastable
Table analysis in Tkinter using pandas DataFrames.
data-analysis dataframe pandas plotting scientific tkinter
Last synced: 14 May 2025
https://github.com/squarespace/datasheets
Read data from, write data to, and modify the formatting of Google Sheets
data data-analytics data-science dataframe google pandas python
Last synced: 16 May 2025
https://github.com/Squarespace/datasheets
Read data from, write data to, and modify the formatting of Google Sheets
data data-analytics data-science dataframe google pandas python
Last synced: 15 Mar 2025
https://github.com/axect/peroxide
Rust numeric library with high performance and friendly syntax
dataframe determinant interpolation jacobian linear-algebra matlab matrix numerical-analysis numerical-integration optimization ordinary-differential-equations peroxide r regression rust rust-numeric-library scientific-computing simd-openblas spline statistics
Last synced: 14 May 2025
https://github.com/Axect/Peroxide
Rust numeric library with high performance and friendly syntax
dataframe determinant interpolation jacobian linear-algebra matlab matrix numerical-analysis numerical-integration optimization ordinary-differential-equations peroxide r regression rust rust-numeric-library scientific-computing simd-openblas spline statistics
Last synced: 10 Apr 2025
https://github.com/ranaroussi/pystore
Fast data store for Pandas time-series data
dask database dataframe datastore pandas parquet timeseries
Last synced: 01 Apr 2025
https://github.com/Gmousse/dataframe-js
A JavaScript library providing a new data structure for data scientists and developers
data data-frame dataframe datascience datastructures functional groupby javascript manipulation matrix sql sql-syntax
Last synced: 15 Mar 2025
https://github.com/xorq-labs/xorq
multi-engine batch transformation framework
arrow dataframe elt machine-learning multi-engine python sklearn sql
Last synced: 06 Mar 2026
https://github.com/firmai/pandasvault
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
data-science data-structures dataframe functions pandas python snippets table tips
Last synced: 06 May 2025
https://github.com/tobgu/qframe
Immutable data frame for Go
data-frame data-science dataframe go golang immutable
Last synced: 04 Apr 2025
https://github.com/deepspace2/styleframe
A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel
data-frame dataframe excel openpyxl pandas
Last synced: 20 Jan 2026
https://github.com/DeepSpace2/StyleFrame
A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel
data-frame dataframe excel openpyxl pandas
Last synced: 19 Jul 2025
https://github.com/manzt/quak
a scalable data profiler
database dataframe jupyter python visualization
Last synced: 16 May 2025
https://github.com/tirthajyoti/spark-with-python
Fundamentals of Spark with Python (using PySpark), code examples
analytics apache apache-spark big-data database dataframe distributed-computing hadoop hdfs machine-learning map-reduce mlib parallel-computing pyspark python spark sql
Last synced: 05 Apr 2025
https://github.com/bluenote10/NimData
DataFrame API written in Nim, enabling fast out-of-core data processing
Last synced: 13 Apr 2025
https://github.com/bluenote10/nimdata
DataFrame API written in Nim, enabling fast out-of-core data processing
Last synced: 02 Nov 2025
https://github.com/tidyverse/duckplyr
A drop-in replacement for dplyr, powered by DuckDB for speed.
analytics dataframe dplyr duckdb performance r
Last synced: 05 Jul 2025
https://github.com/snowflakedb/snowpark-python
Snowflake Snowpark Python API
data-analytics data-engineering data-science dataframe python snowflake sql
Last synced: 14 May 2025
https://github.com/lifeomic/sparkflow
Easy-to-use library to bring TensorFlow to Apache Spark
apache-spark dataframe deep-learning lifeomic pipeline spark-ml tensorflow
Last synced: 04 Apr 2025
https://github.com/zero-one-group/geni
A Clojure dataframe library that runs on Spark
big-data clojure clojure-library clojure-repl data-engineering data-science dataframe distributed-computing high-performance-computing machine-learning parallel-computing spark
Last synced: 04 Apr 2025
https://github.com/scicloj/tablecloth
Dataset manipulation library built on the top of tech.ml.dataset
clojure dataframe dataset machinelearning
Last synced: 12 Apr 2025
https://github.com/tirthajyoti/design-of-experiment-python
Design-of-experiment (DOE) generator for science, engineering, and statistics
analytics dataframe design-of-experiments experiment factors latin-hypercube matrix random-generation science statistics
Last synced: 06 Apr 2025
https://github.com/alastairrushworth/inspectdf
🛠️ 📊 Tools for Exploring and Comparing Data Frames
comparison dataframe eda exploratory-data-analysis r rstats visualization
Last synced: 13 Apr 2025
https://github.com/kszucs/pandahouse
Pandas interface for Clickhouse database
Last synced: 10 Sep 2025
https://github.com/zavtech/morpheus-core
The foundational library of the Morpheus data science framework
data-analysis data-analytics dataframe dataframe-library datascience finance principal-component-analysis quantitative-finance regression regression-models statistical-analysis statistics
Last synced: 02 Apr 2025
https://github.com/ank0409/Ditching-Excel-for-Python
Functionalities in Excel translated to Python
dataframe eda excel exploratory-data-analysis machine-learning numpy pandas pivot-tables python tutorial vba
Last synced: 09 Apr 2025
https://github.com/alteryx/woodwork
Woodwork is a Python library that provides robust methods for managing and communicating data typing information.
data-science dataframe dataframes evalml featuretools inference machine-learning nlp-primitives python semantic-tags typing woodwork
Last synced: 15 May 2025
https://github.com/abdenlab/oxbow
Oxbow makes genomic data ready for high-performance analytics.
apache-arrow bioinformatics data-science dataframe fair-data genomics multiomics ngs pandas polars python r rust-lang
Last synced: 07 Mar 2026
https://github.com/scinim/datamancer
A dataframe library with a dplyr like API
dataframe dplyr hacktoberfest nim nim-lang
Last synced: 06 Apr 2025
https://github.com/SciNim/Datamancer
A dataframe library with a dplyr like API
dataframe dplyr hacktoberfest nim nim-lang
Last synced: 08 May 2025
https://github.com/noahgift/rust-mlops-template
A work in progress to build out solutions in Rust for MLOps
cli dataframe hugging huggingface mlops polars pytorch rust web
Last synced: 15 Jul 2025
https://github.com/scipp/scipp
Multi-dimensional data arrays with labeled dimensions
dataframe dataset python science
Last synced: 17 Jan 2026
https://github.com/Quantco/dataframely
A declarative, 🐻‍❄️-native data frame validation library.
Last synced: 28 Apr 2025
https://github.com/bertrandmartel/tableau-scraping
Tableau scraper Python library. R and Python scripts to scrape data from Tableau viz
dataframe pandas python r tableau web-scraping
Last synced: 29 Aug 2025
https://github.com/archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
analysis apache-spark big-data big-data-analytics dataframe digital-humanities hadoop network-graphing pyspark python3 scala spark text-extraction webarchives
Last synced: 13 Apr 2025
https://github.com/dmnfarrell/tablexplore
Table analysis and plotting application written in PySide2/PyQt5
data-analysis data-science dataframe pandas plotting pyqt5 pyside2 python qt
Last synced: 01 Aug 2025
https://github.com/clojure-finance/clojask
Clojask is a Clojure data processing framework with parallel computing on larger-than-memory datasets
big-data clojure dataframe parallel-computing
Last synced: 07 May 2025
https://github.com/yash1994/dframcy
Dataframe Integration with spaCy.
dataframe pandas-dataframe python3 spacy spacy-extension spacy-pipeline
Last synced: 12 Apr 2025
https://github.com/tidypyverse/tidypandas
A grammar of data manipulation for pandas inspired by tidyverse
data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse
Last synced: 13 Mar 2026
https://github.com/jgperrin/net.jgp.labs.spark
Apache Spark examples exclusively in Java
data-ingestion dataframe ingestion java spark udf
Last synced: 16 Apr 2025
https://github.com/facultyai/lens
Summarise and explore Pandas DataFrames
dask data-exploration data-science data-visualisation dataframe pandas
Last synced: 14 Apr 2025
https://github.com/finos/ipyregulartable
High performance, editable, stylable datagrids in jupyter and jupyterlab
data-table data-visualization dataframe datagrid ipywidgets javascript jupyter jupyter-notebook jupyterlab jupyterlab-extension notebook python table typescript
Last synced: 28 Oct 2025
https://github.com/talegari/tidypandas
A grammar of data manipulation for pandas inspired by tidyverse
data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse
Last synced: 03 Mar 2025
https://github.com/ashvardanian/StringWars
Comparing performance-oriented string-processing libraries for substring search, multi-pattern matching, hashing, edit-distances, sketching, and sorting across CPUs and GPUs in Rust 🦀 and Python 🐍
benchmark bioinformatics database dataframe levenshtein-distance libc memchr polars rapids string string-search strstr substring-search
Last synced: 08 Oct 2025
https://github.com/ashvardanian/stringwars
Comparing performance-oriented string-processing libraries for substring search, multi-pattern matching, hashing, edit-distances, sketching, and sorting across CPUs and GPUs in Rust 🦀 and Python 🐍
benchmark bioinformatics database dataframe levenshtein-distance libc memchr polars rapids string string-search strstr substring-search
Last synced: 18 Jan 2026
https://github.com/CybercentreCanada/jupyterlab-sql-editor
A JupyterLab extension providing a SQL formatter, auto-completion, and syntax highlighting, with support for Spark SQL and Trino
auto-completion dataframe datagrid extension formatter ipython-magic json jupyterlab lsp nested-structures notebook schema sparksql sql syntax-highlighting trino vscode-extension
Last synced: 06 Mar 2025
https://github.com/cybercentrecanada/jupyterlab-sql-editor
A JupyterLab extension providing a SQL formatter, auto-completion, and syntax highlighting, with support for Spark SQL and Trino
auto-completion dataframe datagrid extension formatter ipython-magic json jupyterlab lsp nested-structures notebook schema sparksql sql syntax-highlighting trino vscode-extension
Last synced: 04 Apr 2025
https://github.com/nmandery/h3ron
Rust crates for the H3 geospatial indexing system
dataframe geospatial h3 ndarray rust
Last synced: 27 Apr 2025
https://github.com/mahmoudparsian/pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
algorithms big-data data data-abstractions data-science dataframe distributed-computing graphframes mapreduce monoid nosql partitioning pyspark pyspark-algorithms python rdd spark transformations
Last synced: 07 Apr 2025