Projects in Awesome Lists tagged with dataframes
A curated list of projects in awesome lists tagged with dataframes.
https://github.com/pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
arrow dataframe dataframe-library dataframes out-of-core polars python rust
Last synced: 15 Dec 2025
https://github.com/unionai-oss/pandera
A light-weight, flexible, and expressive statistical data testing library
assertions data-assertions data-check data-cleaning data-processing data-validation data-verification dataframe-schema dataframes hypothesis-testing pandas pandas-dataframe pandas-validation pandas-validator schema testing testing-tools validation
Last synced: 12 Dec 2025
https://github.com/tiledb-inc/tiledb
The Universal Storage Engine
arrays data-analysis data-science dataframes dense-data hdfs s3 s3-storage scientific-computing sparse-arrays sparse-data storage-engine tiledb
Last synced: 13 May 2025
https://github.com/TileDB-Inc/TileDB
The Universal Storage Engine
arrays data-analysis data-science dataframes dense-data hdfs s3 s3-storage scientific-computing sparse-arrays sparse-data storage-engine tiledb
Last synced: 28 Mar 2025
https://github.com/juliadata/dataframes.jl
In-memory tabular data in Julia
data data-frame dataframes datasets hacktoberfest julia tabular-data
Last synced: 13 May 2025
https://github.com/JuliaData/DataFrames.jl
In-memory tabular data in Julia
data data-frame dataframes datasets hacktoberfest julia tabular-data
Last synced: 11 Apr 2025
https://github.com/skrub-data/skrub
Machine learning with dataframes
data data-analysis data-cleaning data-preparation data-preprocessing data-science data-wrangling dataframe dataframes dirty-data machine-learning
Last synced: 13 May 2025
https://github.com/rocketlaunchr/dataframe-go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics
Last synced: 15 May 2025
https://github.com/elixir-explorer/explorer
Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
data-science dataframes elixir rust
Last synced: 14 May 2025
https://github.com/graphframes/graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
apache-spark big-data connected-components dataframe dataframes graphs network-motif network-motifs networks spark
Last synced: 14 May 2025
https://github.com/pdpipe/pdpipe
Easy pipelines for pandas DataFrames.
data data-science dataframe dataframes pandas pandas-dataframe pipeline
Last synced: 04 Aug 2025
https://github.com/elastic/eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
big-data data-analysis dataframe dataframes eland elasticsearch etl lightgbm machine-learning pandas python scikit-learn time-series-forecasting
Last synced: 14 Apr 2025
https://github.com/capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
compare dask data data-science dataframes fugue numpy pandas polars pyspark python snowflake snowpark spark
Last synced: 14 May 2025
https://github.com/polyaxon/datatile
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 17 Aug 2025
https://github.com/polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 12 Dec 2025
https://github.com/juliadata/dataframesmeta.jl
Metaprogramming tools for DataFrames
data data-frame dataframes dataframesmeta datasets hacktoberfest julia tabular-data
Last synced: 14 May 2025
https://github.com/JuliaData/DataFramesMeta.jl
Metaprogramming tools for DataFrames
data data-frame dataframes dataframesmeta datasets hacktoberfest julia tabular-data
Last synced: 15 Mar 2025
https://github.com/static-frame/static-frame
Immutable and statically-typeable DataFrames with runtime type and data validation
arrays dataframes immutable-collections immutable-data-structures python
Last synced: 14 Apr 2025
https://github.com/aiguofer/gspread-pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets
Last synced: 15 May 2025
https://github.com/rtosholdings/riptable
64-bit multithreaded Python data analytics tools for NumPy arrays and datasets
Last synced: 14 Dec 2025
https://github.com/RumbleDB/rumble
Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more
azure csv data-science dataframes delta-lake hdfs json jsoniq lakehouse machine-learning nested parquet query query-engine s3 scale schemaless spark svm text
Last synced: 20 Nov 2025
https://github.com/rumbledb/rumble
⛈️ RumbleDB 1.23.0 "Mountain Ash" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
avro azure csv data-science dataframes hdfs json jsoniq machine-learning nested parquet query query-engine s3 scale schemaless spark svm text yaml
Last synced: 03 Aug 2025
https://github.com/mahmoudparsian/data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
algorithms bigdata data data-abstractions data-algorithms data-transformation dataframes design design-patterns machine-learning mappers mapreduce monoid partitioning-algorithms pyspark python rdd reducers spark transformations
Last synced: 07 Apr 2025
https://github.com/open2c/bioframe
Genomic interval operations on Pandas DataFrames
bioinformatics dataframes genomic-intervals genomic-ranges genomics ngs-analysis numpy pandas python spatial-join
Last synced: 06 Oct 2025
https://github.com/alteryx/woodwork
Woodwork is a Python library that provides robust methods for managing and communicating data typing information.
data-science dataframe dataframes evalml featuretools inference machine-learning nlp-primitives python semantic-tags typing woodwork
Last synced: 15 May 2025
https://github.com/datahaskell/dh-core
Functional data science
data-analysis data-mining data-science dataframes datahaskell datasets machine-learning numerical-methods
Last synced: 21 Oct 2025
https://github.com/DataHaskell/dh-core
Functional data science
data-analysis data-mining data-science dataframes datahaskell datasets machine-learning numerical-methods
Last synced: 26 Mar 2025
https://github.com/sl-solution/inmemorydatasets.jl
Multithreaded package for working with tabular data in Julia
data-manipulation data-wrangling dataframes dataset efficient high-performance in-memory join julia multithreaded tabular-data
Last synced: 17 Aug 2025
https://github.com/JuliaAcademy/DataFrames
Welcome to DataFrames.jl with Bogumił Kamiński
dataframes julia julia-language julialang learn-julia learn-to-code
Last synced: 16 May 2025
https://github.com/juliaacademy/dataframes
Welcome to DataFrames.jl with Bogumił Kamiński
dataframes julia julia-language julialang learn-julia learn-to-code
Last synced: 21 Aug 2025
https://github.com/zbrookle/dataframe_sql
A Python package that parses SQL and interprets it as methods that act upon existing pandas (or other types of) DataFrames that have been declared and registered
data dataframes pandas python sql
Last synced: 20 Aug 2025
https://github.com/red-data-tools/red_amber
A dataframe library for Rubyists.
apache-arrow dataframe dataframe-library dataframes ruby
Last synced: 11 Oct 2025
https://github.com/biodatageeks/polars-bio
Blazing-Fast Bioinformatic Operations on Python DataFrames
arrow bioinformatics dataframes datafusion genomic-intervals genomic-ranges genomics pandas polars rust-lang
Last synced: 29 Jul 2025
https://github.com/zbrookle/sql_to_ibis
A Python package that parses sql and converts it to ibis expressions
data databases dataframes etl hacktoberfest ibis sql
Last synced: 14 Apr 2025
https://github.com/hablapps/sparkoptics
Optics for Spark DataFrames
dataframe dataframes optics scala spark spark-sql
Last synced: 30 Jun 2025
https://github.com/juliagraphs/graphdataframebridge.jl
Tools for interoperability between DataFrame objects and LightGraphs and MetaGraphs objects
dataframes graphs hacktoberfest julia juliagraphs
Last synced: 20 Sep 2025
https://github.com/isarn/isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
aggregator apache-spark data-sketches data-sketching dataframe dataframes dataset datasets feature-importance pyspark python scala sketching-algorithm spark spark-ml t-digest udaf variable-importance
Last synced: 29 Oct 2025
https://github.com/hackersandslackers/pandas-sqlalchemy-tutorial
🐼 💻 Load or insert data into a SQL database using Pandas DataFrames.
data-analysis data-science dataframes pandas pandas-sqlalchemy-tutorial python sql-database sqlalchemy tutorial
Last synced: 16 Apr 2025
https://github.com/sl-solution/dlmreader.jl
High-performance delimited-file reader and writer for Julia
csv csv-files csv-import csv-parser dataframes dataset delimited-files delimiter fwf high-performance informat julia julia-language threaded
Last synced: 14 Apr 2025
https://github.com/ocramz/heidi
heidi : tidy data in Haskell
algebraic-data-types data-analysis data-mining data-science dataframe dataframe-library dataframes generic-programming generics tidy-data
Last synced: 14 Apr 2025
https://github.com/zgbjgg/jun
JUN - Python pandas, plotly, seaborn support & dataframe manipulation over Erlang
dataframes elixir erlang numpy pandas plotly python scipy seaborn
Last synced: 11 Apr 2025
https://github.com/scottlepp/go-duck
A Golang DuckDB library that doesn't require CGO
analytics dataframe dataframes duckdb golang grafana sql
Last synced: 07 May 2025
https://github.com/mohammadreza-mohammadi94/data-analysis-and-machine-learning-projects
A comprehensive collection of data analysis and machine learning projects, showcasing techniques and models for various data challenges. Dive in to explore code examples, analyses, and machine learning workflows.
data-analysis data-science dataframes deep-learning exploratory-data-analysis hyperparameter-tuning machine-learning machine-learning-algorithms pandas python scikit-learn visualization
Last synced: 06 Oct 2025
https://github.com/surister/datasaurus
Data Engineering framework written in Python, based on Polars.
classes data dataframes datamodeling framework library orm polars python
Last synced: 06 Oct 2025
https://github.com/lungben/tableio.jl
A glue package for reading and writing tabular data. It aims to provide a uniform api for reading and writing tabular data from and to multiple sources.
arrow csv data data-science database dataframe dataframes excel jdf json-format parquet postgresql sqlite zip
Last synced: 12 Oct 2025
https://github.com/typedef-ai/fenic
Build reliable AI and agentic applications with DataFrames
agents ai arrow dataframe-library dataframes duckdb elt etl llm orchestration polars pyspark python rust
Last synced: 23 Jun 2025
https://github.com/viraltux/sqldf.jl
SQL for Julia Tables |> DataFrame
dataframes julia query sql sqlite tables
Last synced: 15 Aug 2025
https://github.com/oneoffcoder/pyspark-formula
R-like formula approach to Spark Dataframes
classification clustering dataframes interaction-design patsy pyspark regression rlike-formulas spark
Last synced: 13 Apr 2025
https://github.com/cmungall/json-flattener
Python library for denormalizing nested dicts or json objects to tables and back
dataframes denormalization json linkml pandas yaml
Last synced: 20 Jul 2025
https://github.com/sirracha/geospatial_mapping_in_python
A walkthrough of tutorials I made for working with geospatial data in Python. Includes my evaluations of Python geospatial libraries, tools and packages.
dataframes folium-maps geopandas geospatial geospatial-analysis geospatial-data geospatial-intelligence geospatial-visualization mapbox matplotlib newyork pizza plotly route-optimization subway tsp-problem tsp-solver
Last synced: 24 Apr 2025
https://github.com/lfenzo/impostor.jl
The highly versatile synthetic data generator
data dataframes datasets generator julia synthetic synthetic-data
Last synced: 17 Jul 2025
https://github.com/vlad-bystrov/spark-user-feedback
conversion dataframes datasets rdd spark
Last synced: 30 Apr 2025
https://github.com/justinhchae/pd-helper
A helpful package to streamline Pandas DataFrame optimization.
bigdata dataframes developer-tools optimization-tools pandas python3
Last synced: 23 Mar 2025
https://github.com/sbl-sdsc/df-parallel
Comparison of Dataframe libraries for parallel processing of large tabular files on CPU and GPU.
cuda-toolkit dask dask-cudf dask-dataframes dataframes gpu-computing parallel-processing pyspark-dataframes rapidsai
Last synced: 12 Apr 2025
https://github.com/gyuho/dataframe
Package dataframe implements data frame.
Last synced: 03 Aug 2025
https://github.com/dlr-eoc/ukis-h3cellstore
High-level Rust and Python libraries to store H3 cells in ClickHouse databases
Last synced: 23 Jul 2025
https://github.com/pdiegel/floridapropertydata
A Python-based tool for retrieving and processing property data for specific counties in Florida using Parcel ID numbers. Simplifies data retrieval and offers customization options for real estate agents, investors, and government officials.
counties data-retrieval dataframes manatee parcel processing python python3 real-estate retrieval sarasota
Last synced: 19 Jul 2025
https://github.com/enso-org/dataframes
A library for working with tabular data in Luna.
dataframes hybrid luna textual visual visualisation
Last synced: 17 Aug 2025
https://github.com/mkearney/attrbl
A tidy approach to attributes
attr attributes dataframes mkearney-r-package r rstats tibble tidy tidy-data tidyverse
Last synced: 02 Dec 2025
https://github.com/mpastell/xlsxreader.jl
Julia package for reading Excel xlsx to a DataFrame
Last synced: 10 Apr 2025
https://github.com/emmaccode/superframes.jl
An object-oriented DataFrames.jl alternative
Last synced: 22 Apr 2025
https://github.com/pyladiesams/pyspark-nov2019
An introduction to PySpark
dataframes groupby pyspark python schema workshop
Last synced: 07 May 2025
https://github.com/williambdean/frame-search
A GitHub search inspired interface to DataFrames
data-science dataframes github
Last synced: 04 Sep 2025
https://github.com/imprv-ai/date-a-scientist
Query dataframes and find issues with your notebook snippets as if a professional data scientist were pair coding with you. Currently just a thin wrapper around an amazing library called pandas-ai by sinaptik-ai!
data-science dataframes notebook-jupyter pandas pandas-ai
Last synced: 06 Oct 2025
https://github.com/raghavtwenty/r-programming
📊 R Programming Fundamentals: An In-Depth Guide with Real World Examples & Comments
analysis beginner-friendly classification coding dataframes distributions fundamentals graphs learning-resources matrices plotting programming-languages r r-programming raghavtwenty rbasics regression rlanguage statistics tutorials
Last synced: 07 Apr 2025
https://github.com/cytomining/cytodataframe
An in-memory data analysis format for single-cell profiles alongside their corresponding images and segmentation masks.
dataframes image-analysis image-based-profiling single-cell
Last synced: 14 Jun 2025
https://github.com/scicloj/tablemath
Math and statistics modelling with table ergonomics
clojure dataframes math statistics
Last synced: 12 Jul 2025
https://github.com/rudrakshi99/movie-recommendation-system
Movie recommendation system that uses machine learning to predict user ratings for movies.
dataframes machine-learning matplotlib numpy pandas seaborn
Last synced: 19 Jun 2025
https://github.com/jbris/excel_dna_test
Testing the installation and use of the Excel-DNA package
c-sharp c-sharp-library dataframe dataframe-library dataframes dotnet excel excel-dna spreadsheet spreadsheets visual-studio visual-studio-code
Last synced: 02 Mar 2025
https://github.com/csfelix/julia-basics
🍡 Basic Concepts in Julia Language 🍡
conditions constants dataframes datasets functions julia lambdas loops objects vars
Last synced: 03 Apr 2025
https://github.com/arbaznazir/datalineagepy
86% faster data lineage tracking for pandas DataFrames with zero infrastructure. Real-time monitoring, ML anomaly detection, and enterprise compliance features.
anomaly-detection data-eng data-governance data-lineage data-quality data-science dataframes enterprise etl lineage-tracing machine-learning pandas python
Last synced: 21 Jun 2025
https://github.com/sosiristseng/jl-dataframes
DataFrame examples by Bogumił Kamiński, rendered in Jupyter-book
dataframes julia jupyter-book jupyter-notebook tutorial
Last synced: 22 Mar 2025
https://github.com/camilajaviera91/apache-beam-pipeline-first-approach
This code demonstrates how to integrate Apache Beam with scikit-learn datasets and perform simple data transformations. It loads the Linnerud dataset from scikit-learn and converts it into a Pandas DataFrame for easier manipulation.
apache-beam dataframes glob kmeans-clustering matplotlib-pyplot mean-absolute-error mean-square-error numpy os pandas pipelines scipy-stats seaborn silhouette-score sklearn sklearn-datasets standardscaler
Last synced: 04 Oct 2025
https://github.com/neha-dev-dot/pyspark-tutorial
This repository is part of my journey to learn PySpark, the Python API for Apache Spark. I explored the fundamentals of distributed data processing using Spark and practiced with real-world data transformation and querying use cases.
actions data-partitioning dataframes pyspark-basics pyspark-sql rdds sparkbasics sparkcontext sparksession transformations udfs window-functions
Last synced: 12 Jul 2025
https://github.com/lilivalgo/coal-production-colombia
Data analysis covering annual coal production, royalties generated, and climate variables, using descriptive and visual analysis techniques.
analysis data-visualization dataframes insights manipulation matplotlib python seaborn transformation
Last synced: 28 Mar 2025
https://github.com/nelsonbittencourt/excel_to_dataframe
A high-performance C++ library to convert Excel files to pandas dataframes.
converter cplusplus cpp dataframes excel pandas parser performance python worksheets
Last synced: 10 Apr 2025
https://github.com/moindalvs/learn_about_python_dataframes
Learn about Pandas Dataframe
clipboard-copy dataframe dataframes dropna duplicates duplicates-removal fillna gif import-csv ipython-display merge-dataframe missing-data pandas-dataframe pandas-dataframes pandas-python summary-statistics tocsv youtube-video
Last synced: 11 Mar 2025
https://github.com/celineboutinon/bottleneck
ENSAE-ENSAI Formation Continue (Cepe)/OpenClassrooms Data Analyst 2022-2023 - Projet 5
data-analysis data-analytics data-visualisation dataframes market-intelligence marketing-analytics matplotlib-pyplot missingno numpy pandas python seaborn
Last synced: 07 Sep 2025
https://github.com/rahulvictor12/the-movie-database-data-scrapper
A Python web scraper that collects movie data from The Movie Database (TMDB). It uses `requests`, `BeautifulSoup`, and `pandas` to extract titles, ratings, genres, and cast details from multiple pages. The data is structured into DataFrames and saved as a CSV, perfect for analysis or integration into projects.
beautifulsoup colab-notebook dataframes numpy pandas python requests testing webscraping
Last synced: 05 Oct 2025
https://github.com/amarlearning/who-is-drunk-and-when-in-ames-iowa
Jupyter Notebook on breath alcohol test data from Ames, Iowa, USA.
data-cleaning data-science dataframes importing-and-cleaning-data jupyter-notebook python
Last synced: 13 Oct 2025
https://github.com/venkat-a/exploratory-data-analysis-eda-using-pyspark
Leverage the power of Apache Spark for large-scale data processing and analysis
dataframes descriptive-statistics hadoop-hdfs matplotlib plotly-express pyspark-python seaborn sql statistical-analysis visualization
Last synced: 25 Feb 2025
https://github.com/zsomborjoel/pyspark-basics
Teaching and learning the functionality of the Spark Python API on dataframes
Last synced: 11 Sep 2025
https://github.com/dmarks84/coursework_project_nlp-with-nltk
Project for University of Michigan Applied Data Science Specialization -- Utilized NLTK library to process natural language, and then built several spelling recommenders for a list of misspelled words.
data-modeling databases dataframes eda nlp numpy pandas python reporting statistics text-mining visualization
Last synced: 28 Oct 2025
https://github.com/hadarsharon/grizzlys
User-friendly Python DataFrames 🔵🟡 powered by Julia 🔴🟢🟣
big-data data data-analysis data-engineering data-frame data-frames data-science dataframe dataframe-library dataframes dataframes-jl julia python
Last synced: 15 Jul 2025
https://github.com/dmarks84/coursework_project_ml-classification
Project for IBM Data Science course on Machine Learning -- Trained ML models for classification, evaluating based on a variety of metrics
classification communication data-modeling dataframes numpy pandas python scikit-learn supervised-ml
Last synced: 02 Sep 2025
https://github.com/hadarsharon/compars
DataFrame comparison done right, powered by Rust with polars (AKA the bear-agnostic 🐻 🐼 🐨 🐻‍❄️ DataFrame comparison library)
data-engineering data-profiling data-quality dataframe dataframes koalas pandas polars pyspark python rust spark
Last synced: 16 Mar 2025
https://github.com/aidan-zamfir/the-iliad
Data analysis & relationship network for the characters of Homer's Iliad
data data-analysis dataframes networks networkx python selenium spacy webscraping
Last synced: 02 Mar 2025
https://github.com/celineboutinon/chicken-run
OpenClassrooms Data Analyst 2022-2023 - Projet 9
data-analysis data-analytics data-visualisation dataframes matplotlib-pyplot missingno numpy pandas plotly python scikit-learn scipy seaborn statsmodels
Last synced: 12 Apr 2025
https://github.com/celineboutinon/bookworms
OpenClassrooms Data Analyst 2022-2023 - Projet 6
apriori-algorithm data-analysis data-analytics data-visualisation dataframes matplotlib-pyplot mlxtend numpy pandas python scikit-learn scikit-posthocs scikitlearn seaborn statsmodels
Last synced: 12 Apr 2025
https://github.com/vyjayanthipolapragada/marketing_statistical_analysis
Statistical analysis of customer data and their impact on the sales of products based on marketing campaigns
customer-data data-analysis dataframes marketing matplotlib numpy pandas python seaborn statistical-analysis
Last synced: 10 Jul 2025
https://github.com/jbalooshie/school_district_analysis
Analysis of standardized testing results using NumPy and Pandas, executed in Jupyter Notebook. Summaries of the testing results are provided based on school, test type, and grade level.
data-analysis data-science dataframes jupyter-notebook numpy pandas python
Last synced: 09 Apr 2025
https://github.com/dmarks84/ind_project_data-science-london-scikit-learn--kaggle
Independent Project - Kaggle Competition -- I worked on the Data Science London data set for the Data Science London + Scikit-learn competition.
classification cross-validation data-modeling data-reporting data-visualization dataframes eda grid-search matplotlib numpy pandas python sklearn statistics supervised-ml
Last synced: 09 Apr 2025
https://github.com/dmarks84/ind_project_readme-generator
Independent (personal) project in which I automatically generate README files for each of my repositories from my coursework
dataframes etl numpy pandas programming python
Last synced: 09 Apr 2025
https://github.com/kgelli/pyspark-fundamentals
A comprehensive collection of PySpark fundamentals with practical examples using retail and Formula 1 datasets.
big-data data-transformation dataframes pyspark python spark-sql
Last synced: 05 Apr 2025
https://github.com/dmarks84/coursework_project_boston-data-project
Project for IBM Data Science course on Statistics -- Read in a large data set and performed several statistical analyses and hypothesis testing
communication data-modeling data-reporting dataframes eda hypothesis-testing matplotlib numpy pandas probability python scipy seaborn statistics visualization
Last synced: 09 Apr 2025
https://github.com/hildahnagawa/data-frames-with-pandas
This is a series of projects to learn DataFrames with Pandas
dataframes jupyter-notebook pandas-python
Last synced: 28 Nov 2025