An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with dataframes

A curated list of projects in awesome lists tagged with dataframes.

https://github.com/pola-rs/polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

arrow dataframe dataframe-library dataframes out-of-core polars python rust

Last synced: 15 Dec 2025

https://github.com/rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics

Last synced: 15 May 2025

https://github.com/elixir-explorer/explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

data-science dataframes elixir rust

Last synced: 14 May 2025

https://github.com/graphframes/graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

apache-spark big-data connected-components dataframe dataframes graphs network-motif network-motifs networks spark

Last synced: 14 May 2025

https://github.com/pdpipe/pdpipe

Easy pipelines for pandas DataFrames.

data data-science dataframe dataframes pandas pandas-dataframe pipeline

Last synced: 04 Aug 2025

https://github.com/elastic/eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

big-data data-analysis dataframe dataframes eland elasticsearch etl lightgbm machine-learning pandas python scikit-learn time-series-forecasting

Last synced: 14 Apr 2025

https://github.com/capitalone/datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

compare dask data data-science dataframes fugue numpy pandas polars pyspark python snowflake snowpark spark

Last synced: 14 May 2025

https://github.com/static-frame/static-frame

Immutable and statically-typeable DataFrames with runtime type and data validation

arrays dataframes immutable-collections immutable-data-structures python

Last synced: 14 Apr 2025

https://github.com/aiguofer/gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets

Last synced: 15 May 2025

https://github.com/rtosholdings/riptable

64-bit multithreaded Python data analytics tools for NumPy arrays and datasets

analytics dataframes numpy

Last synced: 14 Dec 2025

https://github.com/RumbleDB/rumble

Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more

azure csv data-science dataframes delta-lake hdfs json jsoniq lakehouse machine-learning nested parquet query query-engine s3 scale schemaless spark svm text

Last synced: 20 Nov 2025

https://github.com/rumbledb/rumble

⛈️ RumbleDB 1.23.0 "Mountain Ash" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

avro azure csv data-science dataframes hdfs json jsoniq machine-learning nested parquet query query-engine s3 scale schemaless spark svm text yaml

Last synced: 03 Aug 2025

https://github.com/alteryx/woodwork

Woodwork is a Python library that provides robust methods for managing and communicating data typing information.

data-science dataframe dataframes evalml featuretools inference machine-learning nlp-primitives python semantic-tags typing woodwork

Last synced: 15 May 2025

https://github.com/JuliaAcademy/DataFrames

Welcome to DataFrames.jl with Bogumił Kamiński

dataframes julia julia-language julialang learn-julia learn-to-code

Last synced: 16 May 2025

https://github.com/zbrookle/dataframe_sql

A Python package that parses SQL and interprets it as methods that act upon existing pandas (or other types of) DataFrames that have been declared and registered

data dataframes pandas python sql

Last synced: 20 Aug 2025

https://github.com/red-data-tools/red_amber

A dataframe library for Rubyists.

apache-arrow dataframe dataframe-library dataframes ruby

Last synced: 11 Oct 2025

https://github.com/biodatageeks/polars-bio

Blazing-Fast Bioinformatic Operations on Python DataFrames

arrow bioinformatics dataframes datafusion genomic-intervals genomic-ranges genomics pandas polars rust-lang

Last synced: 29 Jul 2025

https://github.com/zbrookle/sql_to_ibis

A Python package that parses SQL and converts it to ibis expressions

data databases dataframes etl hacktoberfest ibis sql

Last synced: 14 Apr 2025

https://github.com/hablapps/sparkoptics

Optics for Spark DataFrames

dataframe dataframes optics scala spark spark-sql

Last synced: 30 Jun 2025

https://github.com/juliagraphs/graphdataframebridge.jl

Tools for interoperability between DataFrame objects and LightGraphs and MetaGraphs objects

dataframes graphs hacktoberfest julia juliagraphs

Last synced: 20 Sep 2025

https://github.com/hackersandslackers/pandas-sqlalchemy-tutorial

🐼 💻 Load or insert data into a SQL database using Pandas DataFrames.

data-analysis data-science dataframes pandas pandas-sqlalchemy-tutorial python sql-database sqlalchemy tutorial

Last synced: 16 Apr 2025

https://github.com/zgbjgg/jun

JUN - Python pandas, plotly, and seaborn support and dataframe manipulation over Erlang

dataframes elixir erlang numpy pandas plotly python scipy seaborn

Last synced: 11 Apr 2025

https://github.com/scottlepp/go-duck

A Golang DuckDB library that doesn't require CGO

analytics dataframe dataframes duckdb golang grafana sql

Last synced: 07 May 2025

https://github.com/mohammadreza-mohammadi94/data-analysis-and-machine-learning-projects

A comprehensive collection of data analysis and machine learning projects, showcasing techniques and models for various data challenges. Dive in to explore code examples, analyses, and machine learning workflows.

data-analysis data-science dataframes deep-learning exploratory-data-analysis hyperparameter-tuning machine-learning machine-learning-algorithms pandas python scikit-learn visualization

Last synced: 06 Oct 2025

https://github.com/surister/datasaurus

Data Engineering framework written in Python, based on Polars.

classes data dataframes datamodeling framework library orm polars python

Last synced: 06 Oct 2025

https://github.com/lungben/tableio.jl

A glue package for reading and writing tabular data. It aims to provide a uniform api for reading and writing tabular data from and to multiple sources.

arrow csv data data-science database dataframe dataframes excel jdf json-format parquet postgresql sqlite zip

Last synced: 12 Oct 2025

https://github.com/typedef-ai/fenic

Build reliable AI and agentic applications with DataFrames

agents ai arrow dataframe-library dataframes duckdb elt etl llm orchestration polars pyspark python rust

Last synced: 23 Jun 2025

https://github.com/viraltux/sqldf.jl

SQL for Julia Tables |> DataFrame

dataframes julia query sql sqlite tables

Last synced: 15 Aug 2025

https://github.com/cmungall/json-flattener

Python library for denormalizing nested dicts or json objects to tables and back

dataframes denormalization json linkml pandas yaml

Last synced: 20 Jul 2025

https://github.com/sirracha/geospatial_mapping_in_python

A walkthrough of tutorials I made for working with geospatial data in Python. Includes my evaluations of Python geospatial libraries, tools and packages.

dataframes folium-maps geopandas geospatial geospatial-analysis geospatial-data geospatial-intelligence geospatial-visualization mapbox matplotlib newyork pizza plotly route-optimization subway tsp-problem tsp-solver

Last synced: 24 Apr 2025

https://github.com/dmyersturnbull/typed-dfs

Make Pandas DataFrames enforce definitions, self-organize, and correctly serialize in 18 formats.

csv dataframes excel feather hdf5 ini json pandas parquet required toml typed

Last synced: 21 Mar 2025

https://github.com/lfenzo/impostor.jl

The highly versatile synthetic data generator

data dataframes datasets generator julia synthetic synthetic-data

Last synced: 17 Jul 2025

https://github.com/justinhchae/pd-helper

A helpful package to streamline Pandas DataFrame optimization.

bigdata dataframes developer-tools optimization-tools pandas python3

Last synced: 23 Mar 2025

https://github.com/sbl-sdsc/df-parallel

Comparison of Dataframe libraries for parallel processing of large tabular files on CPU and GPU.

cuda-toolkit dask dask-cudf dask-dataframes dataframes gpu-computing parallel-processing pyspark-dataframes rapidsai

Last synced: 12 Apr 2025

https://github.com/gyuho/dataframe

Package dataframe implements data frame.

data-frame dataframes go

Last synced: 03 Aug 2025

https://github.com/dlr-eoc/ukis-h3cellstore

High-level Rust and Python libraries to store H3 cells in ClickHouse databases

clickhouse dataframes h3 ukis

Last synced: 23 Jul 2025

https://github.com/pdiegel/floridapropertydata

A Python-based tool for retrieving and processing property data for specific counties in Florida using Parcel ID numbers. Simplifies data retrieval and offers customization options for real estate agents, investors, and government officials.

counties data-retrieval dataframes manatee parcel processing python python3 real-estate retrieval sarasota

Last synced: 19 Jul 2025

https://github.com/enso-org/dataframes

A library for working with tabular data in Luna.

dataframes hybrid luna textual visual visualisation

Last synced: 17 Aug 2025

https://github.com/mpastell/xlsxreader.jl

Julia package for reading Excel xlsx to a DataFrame

dataframes excel julia xlsx

Last synced: 10 Apr 2025

https://github.com/emmaccode/superframes.jl

An object-oriented DataFrames.jl alternative

data dataframes julia

Last synced: 22 Apr 2025

https://github.com/williambdean/frame-search

A GitHub search inspired interface to DataFrames

data-science dataframes github

Last synced: 04 Sep 2025

https://github.com/imprv-ai/date-a-scientist

Query dataframes and find issues in your notebook snippets, as if a professional data scientist were pair coding with you. Currently just a thin wrapper around an amazing library called pandas-ai by sinaptik-ai!

data-science dataframes notebook-jupyter pandas pandas-ai

Last synced: 06 Oct 2025

https://github.com/cytomining/cytodataframe

An in-memory data analysis format for single-cell profiles alongside their corresponding images and segmentation masks.

dataframes image-analysis image-based-profiling single-cell

Last synced: 14 Jun 2025

https://github.com/scicloj/tablemath

Math and statistics modelling with table ergonomics

clojure dataframes math statistics

Last synced: 12 Jul 2025

https://github.com/rudrakshi99/movie-recommendation-system

Movie recommendation system that uses machine learning to predict user ratings for movies.

dataframes machine-learning matplotlib numpy pandas seaborn

Last synced: 19 Jun 2025

https://github.com/csfelix/julia-basics

🍡 Basic Concepts in Julia Language 🍡

conditions constants dataframes datasets functions julia lambdas loops objects vars

Last synced: 03 Apr 2025

https://github.com/arbaznazir/datalineagepy

86% faster data lineage tracking for pandas DataFrames with zero infrastructure. Real-time monitoring, ML anomaly detection, and enterprise compliance features.

anomaly-detection data-eng data-governance data-lineage data-quality data-science dataframes enterprise etl lineage-tracing machine-learning pandas python

Last synced: 21 Jun 2025

https://github.com/sosiristseng/jl-dataframes

DataFrame examples by Bogumił Kamiński, rendered in Jupyter-book

dataframes julia jupyter-book jupyter-notebook tutorial

Last synced: 22 Mar 2025

https://github.com/camilajaviera91/apache-beam-pipeline-first-approach

This code demonstrates how to integrate Apache Beam with scikit-learn datasets and perform simple data transformations. It loads the Linnerud dataset from scikit-learn and converts it into a Pandas DataFrame for easier manipulation.

apache-beam dataframes glob kmeans-clustering matplotlib-pyplot mean-absolute-error mean-square-error numpy os pandas pipelines scipy-stats seaborn silhouette-score sklearn sklearn-datasets standardscaler

Last synced: 04 Oct 2025

https://github.com/neha-dev-dot/pyspark-tutorial

This repository is part of my journey to learn PySpark, the Python API for Apache Spark. I explored the fundamentals of distributed data processing using Spark and practiced with real-world data transformation and querying use cases.

actions data-partitioning dataframes pyspark-basics pyspark-sql rdds sparkbasics sparkcontext sparksession transformations udfs window-functions

Last synced: 12 Jul 2025

https://github.com/lilivalgo/coal-production-colombia

Data analysis that includes information on annual coal production, royalties generated, and climate variables. Descriptive and visual analysis techniques were used.

analysis data-visualization dataframes insights manipulation matplotlib python seaborn transformation

Last synced: 28 Mar 2025

https://github.com/nelsonbittencourt/excel_to_dataframe

A high-performance C++ library to convert Excel files to pandas dataframes.

converter cplusplus cpp dataframes excel pandas parser performance python worksheets

Last synced: 10 Apr 2025

https://github.com/rahulvictor12/the-movie-database-data-scrapper

A Python web scraper that collects movie data from The Movie Database (TMDB). It uses `requests`, `BeautifulSoup`, and `pandas` to extract titles, ratings, genres, and cast details from multiple pages. The data is structured into DataFrames and saved as a CSV, perfect for analysis or integration into projects.

beautifulsoup colab-notebook dataframes numpy pandas python requests testing webscraping

Last synced: 05 Oct 2025

https://github.com/valasatava/geneprot3d

Gene variation in 3D

dataframes

Last synced: 22 Jul 2025

https://github.com/zsomborjoel/pyspark-basics

Teaching and learning the functionality of the Spark Python API on dataframes

basics dataframes spark

Last synced: 11 Sep 2025

https://github.com/dmarks84/coursework_project_nlp-with-nltk

Project for University of Michigan Applied Data Science Specialization -- Utilized NLTK library to process natural language, and then built several spelling recommenders for a list of misspelled words.

data-modeling databases dataframes eda nlp numpy pandas python reporting statistics text-mining visualization

Last synced: 28 Oct 2025

https://github.com/dmarks84/coursework_project_ml-classification

Project for IBM Data Science course on Machine Learning -- Trained ML models for classification, evaluating based on a variety of metrics

classification communication data-modeling dataframes numpy pandas python scikit-learn supervised-ml

Last synced: 02 Sep 2025

https://github.com/hadarsharon/compars

DataFrame comparison done right, powered by Rust with polars (AKA the bear-agnostic 🐻 🐼 🐨 🐻‍❄️ DataFrame comparison library)

data-engineering data-profiling data-quality dataframe dataframes koalas pandas polars pyspark python rust spark

Last synced: 16 Mar 2025

https://github.com/raduldev/r-program

R language assignments and practice

dataframes rprogramming

Last synced: 24 Mar 2025

https://github.com/aidan-zamfir/the-iliad

Data analysis & relationship network for the characters of Homer's Iliad

data data-analysis dataframes networks networkx python selenium spacy webscraping

Last synced: 02 Mar 2025

https://github.com/um-bgen632/week7labs

Repo for week 7 of BGEN632

dataframes jupyter python

Last synced: 08 Apr 2025

https://github.com/vyjayanthipolapragada/marketing_statistical_analysis

Statistical analysis of customer data and their impact on the sales of products based on marketing campaigns

customer-data data-analysis dataframes marketing matplotlib numpy pandas python seaborn statistical-analysis

Last synced: 10 Jul 2025

https://github.com/jbalooshie/school_district_analysis

Analysis of standardized testing results using NumPy and Pandas, executed in Jupyter Notebook. Summaries of the testing results are provided based on school, test type, and grade level.

data-analysis data-science dataframes jupyter-notebook numpy pandas python

Last synced: 09 Apr 2025

https://github.com/dmarks84/ind_project_data-science-london-scikit-learn--kaggle

Independent Project - Kaggle Competition -- I worked on the Data Science London data set for the Data Science London + Scikit-learn competition.

classification cross-validation data-modeling data-reporting data-visualization dataframes eda grid-search matplotlib numpy pandas python sklearn statistics supervised-ml

Last synced: 09 Apr 2025

https://github.com/dmarks84/ind_project_readme-generator

Independent (personal) project in which I automatically generate README files for each of my repositories from my coursework

dataframes etl numpy pandas programming python

Last synced: 09 Apr 2025

https://github.com/kgelli/pyspark-fundamentals

A comprehensive collection of PySpark fundamentals with practical examples using retail and Formula 1 datasets.

big-data data-transformation dataframes pyspark python spark-sql

Last synced: 05 Apr 2025

https://github.com/dmarks84/coursework_project_boston-data-project

Project for IBM Data Science course on Statistics -- Read in a large data set and performed several statistical analyses and hypothesis testing

communication data-modeling data-reporting dataframes eda hypothesis-testing matplotlib numpy pandas probability python scipy seaborn statistics visualization

Last synced: 09 Apr 2025

https://github.com/hildahnagawa/data-frames-with-pandas

This is a series of projects to learn DataFrames with Pandas

dataframes jupyter-notebook pandas-python

Last synced: 28 Nov 2025