An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/bastianolea/prensa_chile

Web scraping and text analysis on a corpus of news articles from the Chilean press

chile data-science datascience r textanalysis textmining

Last synced: 18 Jun 2025

https://github.com/gabrieldim/house-price-prediction-data-science

Data Analysis & Visualization - Predict the future price of houses

analysis data-science house-price-prediction prediction visualization

Last synced: 10 Jul 2025

https://github.com/time-series-machine-learning/tsml-eval

Evaluation tools for time series machine learning algorithms.

benchmarking data-science evaluation machine-learning python time-series

Last synced: 17 Aug 2025

https://github.com/staircase-dev/piso

Pandas Interval Set Operations: providing methods for set operations, analytics, lookups and joins on pandas' Interval, IntervalArray and IntervalIndex

data-analysis data-science data-structures interval interval-arithmetic interval-set pandas set set-operations set-theory

Last synced: 20 Aug 2025

https://github.com/bukson/nancorrmp

Parallel correlation calculation of big numpy arrays or pandas dataframes with NaNs and infs.

correlation correlation-matrices data-science machine-learning multiprocessing numpy pandas python

Last synced: 16 Aug 2025

https://github.com/codingforentrepreneurs/try-pandas

In this series, we're going to learn the fundamentals of the popular Python data science tool called Pandas.

data-analysis data-science deepnote jupyter nba-api nba-stats notebook pandas python python-pandas

Last synced: 18 Jan 2026

https://github.com/cataseven/statistics-graph-chart-card

A highly customizable, smooth, and advanced graph card. Shows historical sensor data with dynamic trend colors, statistics (min, max, avg), and more. A great alternative to the default history graph and sensor cards.

analysis analytics bar-chart chart data data-analysis data-science data-visualization graph graphics histogram historical-data history home-assistant statistical-analysis statistics

Last synced: 12 Apr 2026

https://github.com/intuit/metriks

Python package of commonly used metrics for evaluating information retrieval models.

data-science information-retrieval metrics python36

Last synced: 21 Sep 2025

https://github.com/soodoku/data-science

Lecture Slides for Introduction to Data Science

data-science statistical-learning

Last synced: 11 Jan 2026

https://github.com/nuhmanpk/Webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets

Last synced: 08 Jul 2025

https://github.com/blmoore/blackspot

Shiny app exploring Edinburgh traffic collision data

data-science r shiny

Last synced: 05 Jul 2025

https://github.com/sanjinkurelic/casebasedreasoning

Find missing values in a data set using Euclidean distance, normalization, and calculation of information value and weight of evidence

case-based-reasoning csv data-science influence information-value machine-learning numpy pandas python3 weight-of-evidence

Last synced: 20 Jun 2025

https://github.com/sam-92/telegram-energy-api

The CleanEnergyBot is a Telegram bot providing real-time electricity usage, CO2 forecasts, and energy-saving tips in Ireland, using data from EirGrid and GPT-3 analysis. It helps users make eco-friendly energy choices by comparing emissions data with EU standards.

data-science data-visualization digitaltwins energy-data iot iot-application llm openai smart-grids smart-home telegram-bot telegram-bot-api

Last synced: 17 Jun 2025

https://github.com/city-of-helsinki/mlops-template

Generic repository template for small scale MLOps

data-science datascience machine-learning machinelearning mlops python

Last synced: 13 May 2025

https://github.com/chalk-ai/examples

Curated examples and patterns for using Chalk. Use these to build your feature pipelines.

chalk data data-science ml ml-ops pipeline python

Last synced: 17 Jan 2026

https://github.com/edrewitz/wxdata

A Python package of end-to-end weather data clients & raw data clients with VPN/PROXY support, data processors that decode variable keys from GRIB format into a plain-language format & various tools for assisting Python automated workflows, querying meteorological datasets and filling gaps in meteorological data.

automation data data-clients data-engineering data-engineering-pipeline data-processing data-processing-pipelines data-science meteorology meteorology-library python weather-data

Last synced: 23 May 2026

https://github.com/fusedio/fused-mcp

Fused MCP Agents: Setting up MCP Servers for Data Scientists

data-science fused mcp python udf

Last synced: 10 Aug 2025

https://github.com/nuhmanpk/webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets

Last synced: 21 Mar 2025

https://github.com/pgebert/bike-sharing-dataset

Analysis and model development for the Kaggle Bike Sharing Dataset.

bike-sharing-dataset bikesharing data-science jupyter kaggle python

Last synced: 09 Oct 2025

https://github.com/RConsortium/r-collaboration

Open Collaboration, Data Registry, and Use Cases Developed by the R Community

data-analysis-in-r data-analytics data-science r

Last synced: 20 Jul 2025

https://github.com/incubated-geek-cc/Text-To-Speech-App

A Fusion of OCR Technology (Tesseract.js) & Web Speech API. Standalone, portable and works offline.

data-science javascript machine-learning ocr ocr-recognition tesseract tesseract-ocr tesseract-ocr-api tesseractjs webapp

Last synced: 16 Apr 2025

https://github.com/bgroenks96/normalizing-flows

Implementations of normalizing flows using Python and TensorFlow

data-science machine-learning machine-learning-algorithms normalizing-flows

Last synced: 09 Mar 2026

https://github.com/medoidai/skrobot

skrobot is a Python module for designing, running and tracking Machine Learning experiments / tasks. It is built on top of the scikit-learn framework.

artificial-intelligence data-science feature-engineering feature-selection hyperparameter-tuning machine-learning model-evaluation model-selection model-training model-tuning open-source predictive-modelling python scikit-learn

Last synced: 02 Aug 2025

https://github.com/thechymera/behaviopy

Behavioral data analysis and plotting in Python.

animal-behavior biomedical data-science foss multimodality plotting

Last synced: 19 Apr 2025

https://github.com/isala404/speculo

Realtime face detection and recognition using deep learning

data-science face-recognition faces footages opencv python3 reactjs speculo surveillance tensorflow typescript

Last synced: 31 Aug 2025

https://github.com/pbrdng/learningalgebraicvarieties.jl

Learning Algebraic Varieties from Samples

algebraic-geometry data-science

Last synced: 11 Nov 2025

https://github.com/tomasonjo/bitcoin-to-neo4jdash

Project that listens to bitcoin websocket API for new transactions and stores them to Neo4j to be analyzed

bitcoin dashboard data data-science graph graphdatabase neo4j python websocket

Last synced: 03 Jul 2025

https://github.com/gagandeepb/frames-beam

Accessing Postgres in a data frame in Haskell

data-science database postgres

Last synced: 20 Aug 2025

https://github.com/koonimaru/omniplot

Hassle-free statistical analysis, clustering, and visualization of scientific data

data-science matplotlib numpy pandas python

Last synced: 15 Apr 2025

https://github.com/github/mlops

Use GitHub to facilitate automation, collaboration and reproducibility in your machine learning workflows.

actions cicd data-science devops-tools machine-learning mlops pages primer primer-design

Last synced: 04 Oct 2025

https://github.com/jaimevalero/push-kaggle-dataset

GitHub Action to upload datasets to Kaggle

automation data-science github-actions kaggle kaggle-datasets

Last synced: 18 Jan 2026

https://github.com/curso-r/zen-do-r

A book about programming for non-programmers.

data-science r workflow

Last synced: 26 Oct 2025

https://github.com/jameslamb/talks

Conference talks, meetup talks, and misc. writing

conference-talk data-science machine-learning open-source presentations python r

Last synced: 06 Sep 2025

https://github.com/dhaitz/data-science-links

A curated list of links to great data science articles, videos, ...

agile ai artificial-intelligence career-advice data-science data-scientists machine-learning

Last synced: 12 Jun 2025

https://github.com/microsoft/autobrewml

The AutoBrewML Framework greatly shortens the time it takes to get production-ready ML models, with ease and efficiency.

anomaly-detection azure-automl cleansing-data data-science datavisualization machine-learning microsoft nlp-machine-learning responsible-ml sampling-strategies text-analysis text-classification text-summarization

Last synced: 12 Mar 2026

https://github.com/incubated-geek-cc/text-to-speech-app

A Fusion of OCR Technology (Tesseract.js) & Web Speech API. Standalone, portable and works offline.

data-science javascript machine-learning ocr ocr-recognition tesseract tesseract-ocr tesseract-ocr-api tesseractjs webapp

Last synced: 03 Mar 2026

https://github.com/florents-tselai/pandas-sets

Set-oriented Operations in Pandas

data-science pandas set-operations sets

Last synced: 11 Apr 2025

https://github.com/brpy/ml-books

A list of freely available Machine Learning related books.

books data-science free freely machine-learning statistics

Last synced: 20 Jan 2026

https://github.com/ahmedosamamath/statistics-basics

A comprehensive guide to applying statistical techniques in machine learning, including data preprocessing, model development, evaluation metrics, and real-world applications. This repository provides beginner-to-advanced insights into the statistical foundations of machine learning.

artificial-intelligence data-analysis data-science machine-learning statistics

Last synced: 12 Apr 2025

https://github.com/gabrieldim/a1on-webscraping-pandas-data-science

Learning web scraping using pandas in Python. - Data Science

data data-science pandas sciecne web-scraping

Last synced: 10 Jul 2025

https://github.com/hoangsonww/north-carolina-household-analysis

🏠 This repository contains data analysis scripts for the 2022 American Community Survey (ACS) focusing on individuals aged 25 and over in North Carolina, based on 75,340 observations. This repository offers valuable insights into demographic and economic patterns across North Carolina's urban areas.

confidence-interval confidence-score data data-analysis data-analytics data-science data-visualization ggplot2 hypothesis-testing hypothesis-tests north-carolina r r-language r-programming stata

Last synced: 11 Apr 2025

https://github.com/code2k13/feed-visualizer

Feed Visualizer creates interactive visualizations by clustering RSS/Atom feed items based on semantic similarity. Feed Visualizer also attempts to automatically predict the labels for each cluster. This application will create a "semantic summary" of a website's contents by scanning its RSS/Atom feed, allowing for easy discovery and navigation to topics of interest. Feed Visualizer creates interactive visualizations in the form of static HTML and JS files, which may be edited and sent to a server.

artificial-intelligence atom data-science data-visualization machine-learning no-code python rss semantic-similarity visualization

Last synced: 06 May 2025

https://github.com/chalmerlowe/machine_learning

A gentle introduction to machine learning: data handling, linear regression, naive bayes, clustering

data data-science linear-regression machine-learning nearest-neighbors python scikit-learn

Last synced: 10 Apr 2025

https://github.com/humburg/reportmd

Create multi-page HTML reports in R

data-science r rmarkdown rstudio

Last synced: 20 Mar 2025

https://github.com/jmshea/foundations-of-data-science-with-python

Interactive flashcards and quizzes, as well as additional tutorials, animations, and code, for "Foundations of Data Science with Python" by John M. Shea

data-science data-visualization probability statistics statistics-course

Last synced: 13 Apr 2025

https://github.com/smathot/eeg_eyetracking_parser

Python routines for parsing of combined EEG and eye-tracking data

data data-science eeg eye eye-tracking mne pupillometry python

Last synced: 10 Apr 2025

https://github.com/goplus/pandas

Flexible and powerful data analysis / manipulation library for Go+, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

data-analysis data-science data-tech go golang gop goplus pandas scientific-computing

Last synced: 30 Apr 2025

https://github.com/frictionlessdata/datapackage-java

A Java library for working with Frictionless Data Data Packages.

data-science datapackage datapackage-java frictionlessdata java java-8 java8 json-schema

Last synced: 16 Mar 2026

https://github.com/facultyai/boltzmannclean

Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines

data-cleaning data-science dataframe pandas restricted-boltzmann-machine

Last synced: 27 Jun 2025

https://github.com/sap-samples/btp-data-to-value-workshop

This repo contains a dataset, exercises, and sample code for an end-to-end SAP BTP data-to-value bootcamp covering SAP HANA Cloud, SAP Data Warehouse Cloud, SAP Data Intelligence Cloud, and SAP Analytics Cloud.

advanced-analytics analytics data-management data-orchestration data-science data-to-value machine-learning predictive-planning sample sample-code sap-analytics-cloud sap-btp sap-data-intelligence-cloud sap-data-warehouse-cloud sap-hana-cloud workshop

Last synced: 13 Apr 2025

https://github.com/codingforentrepreneurs/serverless-python-workflow-with-aws-lambda

A tutorial to set up and deploy a simple Serverless Python workflow with REST API endpoints in AWS Lambda.

aws aws-lambda data-science etl etl-pipeline python serverless webscraping

Last synced: 18 Jan 2026

https://github.com/capnion/ghostpii_client

This repository contains the Python library for interacting with Capnion's private computation API. Together this library and the API make up Ghost PII.

analytics data-science encryption privacy-enhancing-technologies

Last synced: 21 Feb 2026

https://github.com/mainakrepositor/brs

Recommend books using Machine Learning Techniques

data-science python-3

Last synced: 19 Jun 2025

https://github.com/catdevnull/preciazo

Price analysis for retail supermarkets. Constantly evolving. https://preciazo.nulo.lol

data data-science price-tracker scraper supermarket

Last synced: 17 Mar 2025

https://github.com/alenrajsp/tcxreader

tcxreader is a reader / parser for Garmin’s TCX file format. It also works well with missing data!

data-mining data-science python sports-analytics tcx tcx-parser

Last synced: 09 Apr 2025

https://github.com/a2i2/surround

Surround is a framework for building AI-driven microservices in Python, https://surround.readthedocs.io/en/latest/

data-science machine-learning model-serving pipeline-framework python

Last synced: 14 Jan 2026

https://github.com/bessouat40/raglight

RAGLight is a lightweight and modular Python library for implementing Retrieval-Augmented Generation (RAG), Agentic RAG and RAT (Retrieval-Augmented Thinking).

agent agentic-ai agentic-rag agentic-workflow artificial-intelligence automation data-science embeddings framework huggingface inference llm lmstudio mistral-api mistralai ollama rag retrieval-augmented retrieval-augmented-generation vector-database

Last synced: 04 Mar 2026

https://github.com/kf5i/k3ai-core

K3ai-core is the core library for the Go installer. The Go installer will replace the current Bash installer.

argo artificial-intelligence continuous-integration data-science golang k3s kubeflow kubernetes-deployment machine-learning machinelearning ml pipeline

Last synced: 14 Jan 2026

https://github.com/mlops-ai/mlops

Open-source tool for tracking & monitoring machine learning models.

ai data-science machine-learning mlflow mlops neptune python

Last synced: 14 Jan 2026

https://github.com/somdeep/Statball

Statball - Football (soccer) stats analyser for the top 5 European leagues, with data obtained by web scraping from FBref and StatsBomb

csharp data-science data-scraping data-viz dotnet dotnet-core fbref football football-analytics football-data scouting-data scraping soccer soccer-analytics soccer-data statsbomb tableau visualizations

Last synced: 02 Apr 2025

https://github.com/amkrajewski/nimcso

nim Composition Space Optimization is a high-performance tool leveraging metaprogramming to implement several methods for selecting components (data dimensions) in compositional datasets, so as to optimize data availability and density for applications such as machine learning.

data-analysis data-optimization data-science materials-informatics metaprogramming nim nim-lang

Last synced: 09 Apr 2025