An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with datascience

A curated list of projects in awesome lists tagged with datascience.

https://github.com/faviovazquez/ds-cheatsheets

List of Data Science Cheatsheets to rule the world

cheatsheet datascience jupyter programming python r spark

Last synced: 18 Oct 2025

https://github.com/FavioVazquez/ds-cheatsheets

List of Data Science Cheatsheets to rule the world

cheatsheet datascience jupyter programming python r spark

Last synced: 26 Mar 2025

https://github.com/modin-project/modin

Modin: Scale your Pandas workflows by changing a single line of code

analytics data-science dataframe datascience distributed modin pandas python sql

Last synced: 11 May 2025

https://github.com/firmai/industry-machine-learning

A curated list of applied machine learning and data science notebooks and libraries across different industries (by @firmai)

data-science datascience example firmai jupyter-notebook machine-learning practical-machine-learning python

Last synced: 14 May 2025

https://github.com/holoviz/panel

Panel: The powerful data exploration & web app framework for Python

bokeh control-panels dashboards dataapp datascience dataviz gui holoviews holoviz hvplot jupyter matplotlib panel plotly

Last synced: 13 May 2025

https://github.com/nyandwi/machine_learning_complete

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

computer-vision data-analysis data-science data-visualization datascience deep-learning keras machine-learning matplotlib neural-networks nlp numpy open-source pandas python scikit-learn seaborn tensorflow

Last synced: 12 Apr 2025

https://github.com/Nyandwi/machine_learning_complete

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

computer-vision data-analysis data-science data-visualization datascience deep-learning keras machine-learning matplotlib neural-networks nlp numpy open-source pandas python scikit-learn seaborn tensorflow

Last synced: 05 Apr 2025

https://github.com/lk-geimfari/mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

data dataframe datascience dummy factory factory-boy fake fixtures generator json-generator mimesis mock pandas polars pytest-plugin python schema syntetic synthetic-data testing

Last synced: 28 Dec 2025

https://github.com/whoiskatrin/sql-translator

SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.

data-analysis data-engineering dataquery datascience dataset openai postgresql query sql

Last synced: 14 May 2025

https://github.com/theoehrly/fast-f1

FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry

datascience formula1 motorsport

Last synced: 12 May 2025

https://github.com/theOehrly/Fast-F1

FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data and telemetry

datascience formula1 motorsport

Last synced: 14 Mar 2025

https://github.com/entilzha/pyfunctional

Python library for creating data pipelines with chain functional programming

data datascience functional-programming pipeline python

Last synced: 14 May 2025

https://github.com/EntilZha/PyFunctional

Python library for creating data pipelines with chain functional programming

data datascience functional-programming pipeline python

Last synced: 26 Mar 2025

https://github.com/ujjwalkarn/datasciencer

A curated list of R tutorials for Data Science, NLP and Machine Learning

data-science datascience r text-mining

Last synced: 15 May 2025

https://github.com/ujjwalkarn/DataScienceR

A curated list of R tutorials for Data Science, NLP and Machine Learning

data-science datascience r text-mining

Last synced: 10 May 2025

https://github.com/chris1610/pbpython

Code, Notebooks and Examples from Practical Business Python

data-analysis data-visualization datascience pandas python scikit-learn

Last synced: 15 May 2025

https://github.com/alan-turing-institute/clevercsv

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 13 May 2025

https://github.com/alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 26 Mar 2025

https://github.com/firmai/business-machine-learning

A curated list of practical business machine learning (BML) and business data science (BDS) applications for Accounting, Customer, Employee, Legal, Management and Operations (by @firmai)

business-machine-learning datascience example jupyter jupyter-notebook machine-learning practical-machine-learning

Last synced: 06 May 2025

https://github.com/wx-chevalier/ai-notes

:books: [.md & .ipynb] Series of Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc. 💫 Artificial intelligence and deep learning in practice: Mathematical Statistics | Machine Learning | Deep Learning | Natural Language Processing | Tools in Practice (Scikit & TensorFlow & PyTorch) | Industry Applications & Course Notes

artificial-intelligence datascience deeplearning machinelearning natural-language-processing neural-network wx-doc

Last synced: 17 Jun 2025

https://github.com/wx-chevalier/AI-Notes

:books: [.md & .ipynb] Series of Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc. 💫 Artificial intelligence and deep learning in practice: Mathematical Statistics | Machine Learning | Deep Learning | Natural Language Processing | Tools in Practice (Scikit & TensorFlow & PyTorch) | Industry Applications & Course Notes

artificial-intelligence datascience deeplearning machinelearning natural-language-processing neural-network wx-doc

Last synced: 09 May 2025

https://github.com/vegas-viz/Vegas

The missing MatPlotLib for Scala + Spark

datascience plotting scala

Last synced: 07 May 2025

https://github.com/techascent/tech.ml.dataset

A Clojure high performance data processing system

clojure csv dataframe datascience dataset etl-pipeline java machine-learning xlsx

Last synced: 15 May 2025

https://github.com/kkulma/climate-change-data

:earth_africa: A curated list of APIs, open data and ML/AI projects on climate change

climate climate-analysis climate-change climate-data data data-science datascience hacktoberfest python r resources rstats

Last synced: 04 Apr 2025

https://github.com/turicas/socios-brasil

Captures data on the partners of Brazilian companies from the Receita Federal (Brazil's federal revenue service) and exports it to a human-readable format

brazil data-driven-journalism datascience economic-data empresas hacktoberfest opendata python socios

Last synced: 15 May 2025

https://github.com/ShopRunner/jupyter-notify

A Jupyter Notebook magic for browser notifications of cell completion

datascience

Last synced: 12 Apr 2025

https://github.com/shoprunner/jupyter-notify

A Jupyter Notebook magic for browser notifications of cell completion

datascience

Last synced: 16 May 2025

https://github.com/juliaearth/geostats.jl

An extensible framework for geospatial data science and geostatistical modeling fully written in Julia

datascience geo geospatial geostatistics gis spatial-statistics statistical-learning statistics

Last synced: 31 Jan 2026

https://github.com/holgerbrandl/krangl

krangl is a {K}otlin DSL for data w{rangl}ing

data-mining datascience java kotlin sql

Last synced: 11 Apr 2025

https://github.com/JuliaEarth/GeoStats.jl

An extensible framework for geospatial data science and geostatistical modeling fully written in Julia

datascience geo geospatial geostatistics gis spatial-statistics statistical-learning statistics

Last synced: 14 Mar 2025

https://github.com/Gmousse/dataframe-js

A JavaScript library providing a new data structure for data scientists and developers

data data-frame dataframe datascience datastructures functional groupby javascript manipulation matrix sql sql-syntax

Last synced: 15 Mar 2025

https://github.com/pirate/wikipedia-mirror

๐ŸŒ Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

archiving datascience docker docker-compose html internet-archiving kiwix kiwix-offline-wikipedia mediawiki mwdumper nginx openzim wiki wikipedia wikipedia-dump wikipedia-mirror xowa zim

Last synced: 16 May 2025

https://github.com/Mohitkr95/Best-Data-Science-Resources

This repository contains the best Data Science free hand-picked resources to equip you with all the industry-driven skills and interview preparation kit.

ai artificial-intelligence artificial-intelligence-algorithms aws computer-vision data data-structures datascience deep-learning git github jupyter-notebook machine-learning mongodb natural-language-processing neural-network python sql statistics

Last synced: 07 May 2025

https://github.com/datavane/datavines

Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.

dataobservability dataprofile dataquality datascience doris metadata spark

Last synced: 09 Apr 2025

https://github.com/DataScienceUB/introduction-datascience-python-book

Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications

analytics data data-science datascience machine-learning python sentiment-analysis

Last synced: 19 Jul 2025

https://github.com/gacwr/openuba

A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]

analytics anomaly-detection cybersecurity datascience elasticsearch elk flask information-security machine-learning nodejs react security siem sklearn spark tensorflow threathunting uba ueba user-behaviour

Last synced: 04 Apr 2025

https://github.com/Niketkumardheeryan/ML-CaPsule

ML-capsule is a Project for beginners and experienced data science Enthusiasts who don't have a mentor or guidance and wish to learn Machine learning. Using our repo they can learn ML, DL, and many related technologies with different real-world projects and become Interview ready.

analytics data-analysis data-science data-visualization datascience deep-learning deep-neural-networks deployment flask heroku-deployment machine-learning python r statistics streamlit-webapp

Last synced: 05 May 2025

https://github.com/maif/melusine

📧 Melusine: Use python to automatize your email processing workflow

courriels datascience emails natural-language-processing nlp nlp-machine-learning python python3

Last synced: 16 May 2025

https://github.com/MAIF/melusine

📧 Melusine: Use python to automatize your email processing workflow

courriels datascience emails natural-language-processing nlp nlp-machine-learning python python3

Last synced: 02 Apr 2025

https://github.com/traceloop/openllmetry-js

Sister project to OpenLLMetry, but in Typescript. Open-source observability for your LLM application, based on OpenTelemetry

datascience generative-ai javascript llmops metrics ml model-monitoring monitoring nextjs observability open-source opentelemetry opentelemetry-javascript typescript

Last synced: 14 May 2025

https://github.com/openml/openml-python

OpenML's Python API for a World of Data and More 💫

benchmarking data datascience machine-learning meta-learning openml python tabular-data

Last synced: 10 Apr 2025

https://github.com/amanovishnu/ineuron-full-stack-data-science-assignments

This repository features assignments and projects from the iNeuron full stack data science course, providing valuable resources for learners to enhance their skills and apply their knowledge.

computer-vision data-science datascience deep-learning exploratory-data-analysis linear-regression machine-learning natural-language-processing python recommender-system sql statistics

Last synced: 08 Apr 2025

https://github.com/amanovishnu/ineuron-full-stack-data-science-assignment-collection

This repository features assignments and projects from the iNeuron full stack data science course, providing valuable resources for learners to enhance their skills and apply their knowledge.

computer-vision data-science datascience deep-learning exploratory-data-analysis linear-regression machine-learning natural-language-processing python recommender-system sql statistics

Last synced: 28 Feb 2025

https://github.com/turicas/salarios-magistrados

Downloads the spreadsheets of magistrates' salaries, extracts the pay stubs, cleans them, and exports to CSV

brazil data-driven-journalism datascience justice opendata python

Last synced: 30 Apr 2025

https://github.com/MLWhiz/data_science_blogs

A repository to keep track of all the code that I end up writing for my blog posts.

blogging chatbot data datascience gan graphs machine-learning mcmc python spark streamlit time-series xgboost

Last synced: 05 May 2025

https://github.com/mlwhiz/data_science_blogs

A repository to keep track of all the code that I end up writing for my blog posts.

blogging chatbot data datascience gan graphs machine-learning mcmc python spark streamlit time-series xgboost

Last synced: 06 Apr 2025

https://github.com/anaconda/anaconda-project

Tool for encapsulating, running, and reproducing data science projects

anaconda conda-environment data datascience encapsulation reproducibility running

Last synced: 11 Dec 2025

https://github.com/IngestAI/embedditor

⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.

datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml

Last synced: 28 Mar 2025

https://github.com/brianruizy/covid19-dashboard

🦠 Django + Plotly Coronavirus dashboard. Powerful data driven Python web-app, with an awesome UI. Contributions welcomed! Featured on 🕶Awesome-list

coronavirus coronavirus-real-time coronavirus-tracker covid-19 covid-dashboard covid-data dashboard data-visualization datascience django django-application django-web-app heroku pandemic plot plotly python

Last synced: 01 Oct 2025

https://github.com/SkillCorner/opendata

SkillCorner Open Data with 9 matches of broadcast tracking data.

datascience soccer sportsanalytics

Last synced: 27 Apr 2025

https://github.com/holgerbrandl/kravis

A {K}otlin g{ra}mmar for data {vis}ualization

data-visualization datascience ggplot2 kotlin krangl

Last synced: 04 Apr 2025

https://khuyentran1401.github.io/machine-learning-articles/

List of interesting articles on different topics of machine learning and deep learning

ai articles-summaries artificial-intelligence awesome-list datascience deep-learning machinelearning neural-network

Last synced: 06 May 2025

https://github.com/aaaastark/data-scientist-books

Data-Scientist-Books (Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Long Short Term Memory, Generative Adversarial Network, Time Series Forecasting, Probability and Statistics, and more.)

ai artificial-intelligence datascience deeplearning dl ds gans lstm machinelearning ml probability python r statistics tsf

Last synced: 03 Apr 2025

https://github.com/traceloop/hub

High-scale LLM gateway, written in Rust. OpenTelemetry-based observability included

artificial-intelligence datascience generative-ai llm llmops ml model-monitoring observability open-source opentelemetry rust

Last synced: 08 Feb 2026

https://github.com/jupyterhub/repo2docker-action

A GitHub action to build data science environment images with repo2docker and push them to registries.

actions binder data-science datascience docker jupyter jupyter-notebook repo2docker repo2docker-action

Last synced: 06 Apr 2025

https://github.com/juliaai/datasciencetutorials.jl

A set of tutorials to show how to use Julia for data science (DataFrames, MLJ, ...)

datascience julia-language mlj tutorials

Last synced: 12 Apr 2025