An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/empathy87/the-elements-of-statistical-learning-python-notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 13 Apr 2025

https://github.com/Kotlin/dataframe

Structured data processing in Kotlin

data-analysis data-science dataframe kotlin

Last synced: 11 Apr 2025

https://github.com/GoogleCloudPlatform/DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 01 May 2025

https://github.com/googlecloudplatform/dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 03 Oct 2025

https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 01 May 2025

https://github.com/androz2091/discord-data-package-explorer

🌀 What's really in your Discord Data package?

data-analysis discord discord-data-package statistics

Last synced: 16 May 2025

https://github.com/yoshoku/rumale

Rumale is a machine learning library in Ruby

artificial-intelligence data-analysis data-science machine-learning ml ruby rubyml

Last synced: 29 Apr 2025

https://github.com/Androz2091/discord-data-package-explorer

🌀 What's really in your Discord Data package?

data-analysis discord discord-data-package statistics

Last synced: 31 Mar 2025

https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn

Last synced: 04 Oct 2025

https://github.com/mrankitgupta/Data-Analyst-Roadmap

I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge

ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau

Last synced: 07 Sep 2025

https://github.com/xiaopujun/light-chaser

light chaser is a lightweight data visualization designer tool

blueprints data-analysis data-visualization draggable javascript typescript web-editor

Last synced: 16 May 2025

https://github.com/mrankitgupta/data-analyst-roadmap

I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge

ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau

Last synced: 13 Apr 2025

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 24 Mar 2025

https://github.com/ipython-books/cookbook-2nd-code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization

Last synced: 12 Apr 2025

https://github.com/arvkevi/kneed

Knee point detection in Python 📈

data-analysis data-science elbow-method knee-point python scientific-computing systems

Last synced: 21 Oct 2025

https://github.com/ashishpatel26/amazing-feature-engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 16 May 2025

https://github.com/program-spiritual/dataanalysisinaction

(Finished) Geek Time "Data Analysis in Action" 45 lectures: detailed notes with Markdown, images, mind maps, code, and data. The notes can be read directly and the code has been tested.

data-analysis data-analytics in-action notebook-jupyter pipenv pyenv python python-data-analysis python-data-science python3

Last synced: 23 Sep 2025

https://github.com/dmpe/R

Exercises (incl. analyses) with R language (math+statistics)

course data-analysis exercise r statistics

Last synced: 13 Jul 2025

https://github.com/nicolaskruchten/jupyter_pivottablejs

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables

Last synced: 15 May 2025

https://github.com/ashishpatel26/Amazing-Feature-Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 10 Apr 2025

https://github.com/dmpe/r

Exercises (incl. analyses) with R language (math+statistics)

course data-analysis exercise r statistics

Last synced: 12 Apr 2025

https://github.com/mpw0311/antd-umi-sys

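Enterprise BI system and data visualization platform. Main technologies: react, antd, umi, dva, es6, less, and others. Let's encourage each other and learn together; if you like it, please star ⭐.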

antd antd-umi-sys company-site d3js data-analysis data-visualization dva dvajs echarts echarts-for-react es6 gitdatav react react-redux react-router redux sankey umi umijs

Last synced: 04 Apr 2025

https://github.com/abixen/abixen-platform

Abixen Platform is a microservices based software platform for building enterprise applications delivering functionalities through creating particular microservices and integrating by provided CMS.

analytics angularjs architecture aws business-intelligence businessintelligence charts cloud dashboard data-analysis data-analytics data-visualization low-code microservices netflixoss reporting spring-boot spring-cloud sql-editor visualization

Last synced: 23 Aug 2025

https://github.com/scitools/iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data

data-analysis earth-science grib iris meteorology netcdf oceanography python spaceweather visualisation

Last synced: 02 Apr 2026

https://github.com/anthonydb/practical-sql

Code and Data for the First Edition of "Practical SQL" by Anthony DeBarros, published by No Starch Press (2018).

data-analysis postgresql sql

Last synced: 12 Apr 2025

https://github.com/elastic/eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

big-data data-analysis dataframe dataframes eland elasticsearch etl lightgbm machine-learning pandas python scikit-learn time-series-forecasting

Last synced: 14 Apr 2025

https://github.com/aloctavodia/bap

Bayesian Analysis with Python (Second Edition)

arviz bayesian-analysis data-analysis data-visualization errata pymc3 python

Last synced: 12 Apr 2025

https://github.com/dmnfarrell/pandastable

Table analysis in Tkinter using pandas DataFrames.

data-analysis dataframe pandas plotting scientific tkinter

Last synced: 14 May 2025

https://github.com/SciTools/iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data

data-analysis earth-science grib iris meteorology netcdf oceanography python spaceweather visualisation

Last synced: 07 Apr 2025

https://github.com/planetlabs/notebooks

interactive notebooks from Planet Engineering

api data-analysis jupyter-notebooks python remote-sensing satellite-imagery

Last synced: 29 Dec 2025

https://github.com/farukalamai/advanced-machine-learning-engineer-roadmap-2024

A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing, model development, deployment, and maintenance.

aws computer-vision data-analysis data-science data-visualization deep-learning git-github machine-learning machine-learning-roadmap mlops natural-language-processing neural-network nlp opencv pandas python pytorch statistics tensorflow yolo

Last synced: 04 Apr 2025

https://github.com/jacksonwuxs/dapy

Easy-to-use data analysis / manipulation framework for humans

analysis data-analysis data-science efficiency pypi python statistical-reports

Last synced: 05 Apr 2025

https://github.com/JacksonWuxs/DaPy

Easy-to-use data analysis / manipulation framework for humans

analysis data-analysis data-science efficiency pypi python statistical-reports

Last synced: 28 Mar 2025

https://github.com/pgalko/bambooai

A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.

ai ai-agents anthropic data-analysis data-science docker gemini groq llm mistral ollama openai-api pandas pinecone python vector-database vllm

Last synced: 15 May 2025

https://github.com/binpash/pash

PaSh: Light-touch Data-Parallel Shell Processing

bash bash-scripting data-analysis parallelism pash posix-sh shell

Last synced: 22 Jan 2026

https://github.com/specterops/nemesis

An offensive data enrichment pipeline

data-analysis offensive

Last synced: 27 Jan 2026

https://github.com/LearnDataSci/articles

A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci

data-analysis data-science data-visualization machine-learning machine-learning-algorithms machinelearning python

Last synced: 13 Apr 2025

https://github.com/anthonydb/practical-sql-2

Code and Data for the Second Edition of "Practical SQL" by Anthony DeBarros, published by No Starch Press.

data-analysis postgresql sql

Last synced: 15 May 2025

https://github.com/dfm/corner.py

Make some beautiful corner plots

data-analysis data-visualization plotting python

Last synced: 13 Dec 2025

https://github.com/canner/wren-engine

🤖 The Semantic Engine for Model Context Protocol(MCP) Clients and AI Agents 🔥

agent agentic-ai ai business-intelligence data data-analysis data-analytics data-lake data-warehouse hacktoberfest llm mcp mcp-server semantic semantic-layer sql

Last synced: 10 Mar 2026

https://github.com/ni1o1/transbigdata

A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.

bike-sharing-data bus-gps-data data-analysis data-pre-processing data-quality-analysis data-visualization geospatial-data python spatio-temporal-data taxi-gps-data transportation

Last synced: 20 Feb 2026

https://github.com/akanz1/klib

Easy to use Python library of customized functions for cleaning and analyzing data.

data-analysis data-cleaning data-preprocessing data-science data-visualization feature-selection klib python

Last synced: 01 Feb 2026

https://github.com/AgentTeam-TaichuAI/ScienceClaw

ScienceClaw is a personal research assistant built with LangChain DeepAgents and AIO Sandbox infrastructure, adopting a completely new architecture beyond OpenClaw. It offers stronger security, better transparency, and a more user-friendly experience.

ai-agent bioinformatics data-analysis docker drug-discovery fastapi feishu langchain langgraph llm mcp multi-agent pubmed research-assistant sandbox scientific-research self-hosted tool-use vue wechat

Last synced: 05 May 2026

https://github.com/visit-dav/visit

VisIt - Visualization and Data Analysis for Mesh-based Scientific Data

data-analysis data-viz hpc python radiuss scientific-computing scientific-visualization visualization

Last synced: 14 Jan 2026

https://github.com/s-shemmee/SQL-101

Get started with SQL database programming. This beginner's guide provides step-by-step tutorials, practical examples, exercises, and resources to master SQL. Let's unlock the power of data with SQL!

data-analysis data-science sql sql-challenges sql-commands sql-database sql-injection sql-server

Last synced: 27 Aug 2025

https://github.com/s-shemmee/sql-101

Get started with SQL database programming. This beginner's guide provides step-by-step tutorials, practical examples, exercises, and resources to master SQL. Let's unlock the power of data with SQL!

data-analysis data-science sql sql-challenges sql-commands sql-database sql-injection sql-server

Last synced: 05 Apr 2025

https://github.com/shaohua0116/ICLR2020-OpenReviewData

Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

conference crawler data-analysis iclr iclr2020 machine-learning visualization

Last synced: 19 Jul 2025

https://github.com/openbiox/awosome-bioinformatics

A curated list of resources for learning bioinformatics.

bioinformatics data-analysis next-generation-sequencing

Last synced: 08 Jan 2026

https://github.com/pgalko/BambooAI

A lightweight library that leverages Language Models (LLMs) to enable natural language interactions, allowing you to source and converse with data.

ai ai-agents data-analysis data-science gemini groq llm mistral ollama openai-api pandas pinecone python vector-database

Last synced: 23 Mar 2025

https://github.com/supercowpowers/zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

bro data-analysis kafka networking pandas python scikit-learn security spark zeek zeek-analysis

Last synced: 09 Apr 2025

https://github.com/ptyadana/data-science-and-machine-learning-projects-dojo

Collections of data science, machine learning, and data visualization projects using pandas, sklearn, matplotlib, tensorflow2, Keras, and various ML algorithms such as random forest classifiers, boosting, etc.

boosting-algorithms data-analysis data-science data-visualization deep-learning keras machine-learning machine-learning-algorithms natural-language-processing pandas probability-statistics scikit-learn seaborn tensorflow

Last synced: 05 Apr 2025

https://github.com/SuperCowPowers/zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

bro data-analysis kafka networking pandas python scikit-learn security spark zeek zeek-analysis

Last synced: 19 Jul 2025

https://github.com/Niketkumardheeryan/ML-CaPsule

ML-capsule is a Project for beginners and experienced data science Enthusiasts who don't have a mentor or guidance and wish to learn Machine learning. Using our repo they can learn ML, DL, and many related technologies with different real-world projects and become Interview ready.

analytics data-analysis data-science data-visualization datascience deep-learning deep-neural-networks deployment flask heroku-deployment machine-learning python r statistics streamlit-webapp

Last synced: 05 May 2025

https://github.com/kunalj101/Data-Science-Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. Data science hacks are for everyone, from beginner to advanced. They cover Python, Jupyter Notebook, pandas, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 05 May 2025

https://github.com/mouseland/suite2p

cell detection in calcium imaging recordings

data-analysis imaging neuroscience

Last synced: 06 Oct 2025

https://github.com/kunalj101/data-science-hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. Data science hacks are for everyone, from beginner to advanced. They cover Python, Jupyter Notebook, pandas, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 28 Oct 2025

https://github.com/weijie-chen/econometrics-with-python

Tutorials on econometrics featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics; the theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.

data-analysis data-science econometrics economics python statistics time-series

Last synced: 05 Apr 2025

https://github.com/discoveryjs/discovery

A framework for ad hoc JSON data analysis, shareable server-less reports and dashboards

data-analysis data-visualization json

Last synced: 15 May 2025

https://github.com/greppo-io/greppo

Build & deploy geospatial applications quickly and easily.

data-analysis data-visualization developer-tools framework geospatial machine-learning python webapp

Last synced: 06 Apr 2026

https://github.com/jkrumbiegel/Chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 14 May 2025

https://github.com/jkrumbiegel/chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 12 Apr 2025

https://github.com/MouseLand/suite2p

cell detection in calcium imaging recordings

data-analysis imaging neuroscience

Last synced: 07 May 2025

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 13 Apr 2025

https://github.com/nshiab/simple-data-analysis

Easy-to-use and high-performance TypeScript library for data analysis. Works with tabular, geospatial and vector data.

ai analysis bun data data-analysis data-science deno duckdb geospatial javascript llm machine-learning node node-js nodejs spatial spatial-analysis sql typescript

Last synced: 22 May 2026

https://github.com/CICIFLY/Data-Analytics-Projects

This repository contains projects related to data collection, assessment, cleaning, visualization, and analysis.

data-analysis data-visualization jupyter-notebook matplotlib numpy pandas seaborn

Last synced: 02 May 2025

https://github.com/d4software/QueryTree

Data reporting and visualization for your app

analytics data-analysis data-visualization database report-builder

Last synced: 20 Jul 2025