An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/thetallprogrammer/stock-contender-app

Welcome to Stock Contender – an AI-powered tool designed to assist your market analysis. This tool is not an investment advisor and does not guarantee profits. Invest at your own risk. Stay updated with my latest developments.

artificial-intelligence chat-gpt data-science financial-data-analysis financial-technology fintech investment-analysis machine-learning openai openai-api python stock-market stock-prediction stock-trading

Last synced: 05 Sep 2025

https://github.com/virajbhutada/spotify-track-analysis-and-recommendation

Experience a comprehensive exploration of Spotify's musical landscape seamlessly transitioned from Tableau visualizations to SQL analysis. Dive into track inventory, streaming metrics, and sonic trends via interactive dashboards, while leveraging SQL queries for deeper insights into KPIs and cross-platform rankings.

audio-analysis data-analysis data-analytics data-science data-visualization eda machine-learning-library ml-models mysql recommendation-system spotify spotify-data spotify-dataset sql-database sql-server streaming-metrics tableau tableau-public trends-analysis

Last synced: 28 Apr 2025

https://github.com/memgonzales/pisa-2018-analysis

Jupyter notebook presenting the process of data preparation, research question formulation, data analysis, and data modeling with the goal of extracting insights from the 2018 PISA Dataset

data-cleaning data-modeling data-science data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy oecd-data pandas pisa scipy statistical-inference

Last synced: 13 Jun 2025

https://github.com/nhs-south-central-and-west/data-science-guides

Guides for common data science tasks, in R & Python

data-science machine-learning python r regression

Last synced: 03 May 2025

https://github.com/datasets/genome-sequencing-costs

Costs associated with DNA sequencing since 2001

data data-science genome

Last synced: 19 Oct 2025

https://github.com/neverinfamous/postgres-mcp

Secure PostgreSQL Administration & Observability with Code Mode: True V8 Isolate Sandbox Replacing 248 Specialized Tools for up to 90% Token Savings. Includes Tool Filtering, Payload Optimization, HTTP/SSE, OAuth 2.1, Audit & Token Logging, Deterministic Error Handling, Support for 12 Extensions (pgvector, PostGIS, pg_partman, pg_cron & more).

ai-agents citext code-mode data-science database database-management developer-tools hypopg kcache ltree mcp npm oauth2 pg-cron pg-partman pgcrypto pgvector postgis postgresql typescript

Last synced: 09 Apr 2026

https://github.com/alexeatscake/gigaanalysis

A toolbox for processing data that can be expressed as a dependent and independent variable.

condensed-matter-physics data-science matplotlib numpy physics scipy

Last synced: 03 Jul 2025

https://github.com/arbox/learning-scala-for-data-science

Data Science: Scala for brave and impatient

big-data bigdata data-science datascience scala spark

Last synced: 10 Mar 2026

https://github.com/dhhruv/kisaani

"Kisaani" is an application that takes required parameters intelligently or from the database of the location (from the cloud) and provides the list of best crops suited for that land. The application should also be able to collect the outcome after cultivation and apply correction as appropriate for further advisories. The details of the crops for the region and conditions are provided. Applications should be interactive, user friendly for farmers (provide local language support) and should provide support in real time.

crop crop-recommendation data-science ieee ieee-hackathon machine-learning

Last synced: 07 Mar 2026

https://github.com/ul-mds/gecko

Python library for the generation and mutation of realistic personal identification data at scale

data-science numpy pandas python record-linkage

Last synced: 24 Apr 2025

https://github.com/zmoooooritz/stapy

An easy to use SensorThings API Client written in Python

api cli data-science database ogc python sensor sensor-data sensorthings sensorthings-api

Last synced: 17 Jan 2026

https://github.com/public-health-scotland/technical-docs

Technical documentation, including guidance and best practice for Public Health Scotland (PHS)

data-science documentation git github python r

Last synced: 14 Apr 2025

https://github.com/bhattbhavesh91/auto-sklearn-tutorial

Small tutorial on auto-sklearn which is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

auto-ml auto-sklearn automl data-science machine-learning python tutorial

Last synced: 27 Oct 2025

https://github.com/tom-uchida/gci2020_winter

Chair for Global Consumer Intelligence, The University of Tokyo.

data-science machine-learning marketing

Last synced: 23 Jun 2025

https://github.com/archie-cm/churn-analysis-ecommerce-customer

The objective of this project is to predict customer churn and lost opportunity, and to provide recommendations to the business team so the company can apply customer personas in its retention strategy and monitor results through an interactive dashboard.

data-science feature-engineering machine-learning python scikit-learn

Last synced: 23 Apr 2025

https://github.com/dataship/python-dataship

Lightweight Python tools for reading, writing and storing data, locally and over the internet

column-store data-science machine-learning numpy pandas

Last synced: 23 Apr 2025

https://github.com/sondosaabed/data-analyst-nanodegree

I acquired a full scholarship from Google Launchpad. Advanced data wrangling skills to work with messy, complex real-world datasets. Highly customized visualizations using the Matplotlib Python library

data-science dataanalysis datawrangling nanodegree python udacity-nanodegree

Last synced: 09 Apr 2025

https://github.com/zachbateman/evogression

Python Machine Learning using an evolutionary regression algorithm. More intuitive with higher transparency than a neural network while providing much greater power and high-dimensionality capabilities than more simplistic regression techniques.

artificial-intelligence data-science machine-learning neural-network python regression

Last synced: 12 Jun 2025

https://github.com/coatless-textbooks/statistical-concepts-with-shiny-apps

Quarto book illustrating various statistical concepts using Shinylive.

data-science quarto quarto-book r-shiny r-shinylive statistics webr

Last synced: 12 Jun 2025

https://github.com/lfrench03/ganaderia-en-cuba

Based on the data provided by the National Office of Statistics and Information ONEI and other alternative trusted sources mentioned in the references, our main objective is to present a detailed vision of how livestock farming has evolved in Cuba during the period until 2022.

cuba data-science dataproduct ganaderia streamlit streamlit-application timeline

Last synced: 26 Jul 2025

https://github.com/thomasthaddeus/dataanalysistoolkit

DataAnalysisToolkit is a Python-based data analysis tool designed to streamline various data analysis tasks. It provides the ability to load data from CSV files, perform statistical calculations, detect outliers, clean data, and visualize data.

data-science matplotlib python python-script python3 scikit-learn

Last synced: 07 Oct 2025

https://github.com/polis-community/red-dwarf

A DIMensional REDuction library for stellarpunk democracy into the long haul. (Inspired by Pol.is)

civic-tech collective-intelligence data-science deliberative-democracy democracy dimensionality-reduction participatory-democracy polis

Last synced: 06 Oct 2025

https://github.com/zMoooooritz/stapy

An easy to use SensorThings API Client written in Python

api cli data-science database ogc python sensor sensor-data sensorthings sensorthings-api

Last synced: 15 May 2025

https://github.com/waylonwalker/kedro-auto-catalog

Kedro catalog create with default configuration

data data-science kedro kedro-catalog kedro-hook kedro-plugin

Last synced: 12 Jun 2025

https://github.com/nicodupont/mooc

All my finished MOOCs, mainly on the subject of data science

data-analysis data-science data-visualization datacamp jupyter-notebook machine-learning mooc pandas python sas sql

Last synced: 28 Apr 2025

https://github.com/iamyajat/whatsapp-chat-analyzer-api

An API to analyse WhatsApp chats and generate insights

data-analysis data-science fastapi python whatsapp

Last synced: 17 Oct 2025

https://github.com/rafaelpermec/live-broker-api

A study on back-end data scraping, simulating a brokerage that carries out asset buy and sell operations and handles client cash flow in real time.

authentication authorization backend-api cheerio data-science express helmet jwt-authentication mysql nodejs typescript web-scraping

Last synced: 19 Apr 2025

https://github.com/matcom/programming-for-data-science

Programming course for the Data Science degree at the Faculty of Mathematics and Computer Science of the University of Havana.

data-science data-science-python introduction-to-data-science introduction-to-programming introduction-to-python matcom matcom-uh programming programming-course python python-data-science university-of-havana

Last synced: 12 Oct 2025

https://github.com/jeafreezy/rsgis

A python package for basic to advanced GIS operations.

analysis data-science gis python

Last synced: 12 Apr 2025

https://github.com/ahammadmejbah/glossary-of-artificial-intelligence

A "Glossary of Artificial Intelligence" is a concise reference resource defining key terms, concepts, and terminology related to AI. It provides explanations and definitions to help individuals understand and navigate the field of artificial intelligence, making it a valuable tool for both beginners and experts in the AI domain.

artificial-intelligence data data-science deep-learning deep-learning-algorithms detection image-processing machine-learning python

Last synced: 25 Jun 2025

https://github.com/WaylonWalker/kedro-auto-catalog

Kedro catalog create with default configuration

data data-science kedro kedro-catalog kedro-hook kedro-plugin

Last synced: 24 Mar 2025

https://github.com/surajv311/udemy_course_resources

List of course resources from my Udemy Course : "Numpy for Data Science" 2020

arrays data-science numpy numpy-tutorial python3 udemy udemy-course

Last synced: 16 May 2025

https://github.com/macropin/random-name-generator

Generate random male and female names with real-world probability.

data-science python random-generation test-data-generator

Last synced: 17 Jul 2025

https://github.com/inseefrlab/grandedim

Code accompanying the working paper "L'économétrie en grande dimension" (High-dimensional econometrics)

data-science econometrics high-dimensional-data publication r statistics

Last synced: 13 Jun 2025

https://github.com/nemeslaszlo/social-media-analysis-based-on-covid-19-with-sentiment-analysis-ner-and-information-extraction

This repository contains the social media data scraper and the notebooks for this analysis. We analyse social media posts (tweets) with sentiment analysis, then analyse these results with Named Entity Recognition (NER) and information extraction methods to get a more accurate and detailed picture of the sentiment results.

bert data-science data-visualization information-extraction keras named-entity-recognition nltk reccurent-neural-network tensorflow textblob

Last synced: 25 Jul 2025

https://github.com/ruban2205/data-science-introduction

Welcome to the Data Science Introduction repository! This repository is designed to provide an introduction to the field of data science, covering various topics and techniques commonly used in the industry.

classification-algorithm data-science data-visualization decision-tree-classifier exploratory-data-analysis knn knn-classification python simple-linear-regression

Last synced: 11 Jul 2025

https://github.com/mims-harvard/patient-safety

Population-scale patient safety data reveal inequalities in adverse events before and during COVID-19 pandemic

adverse-events data-science graph-algorithms graphs networks pandemic

Last synced: 10 Oct 2025

https://github.com/coalio/Assistant

A data science library providing flexible dataframes for Lua 5.1+

data-analysis data-science data-structures dataframe lua

Last synced: 11 Apr 2025

https://github.com/erp12/rica

DataFrame abstraction for Clojure data scientists.

clojure clojurescript data-science dataframe

Last synced: 11 Apr 2025

https://github.com/barrettotte/ibmi-jupyter

Utility notebook for using Jupyter notebooks with IBMi for basic reports and visualizations.

data-science db2 db2i ibmi jupyter-notebook

Last synced: 11 Apr 2025

https://github.com/juliaml/datasciencetraits.jl

Traits for data science

data-science julia

Last synced: 09 Jul 2025

https://github.com/overhash/supermarket-tracker

A supermarket aggregator for price information at New Zealand supermarkets

data-science new-zealand nz prices rust-lang supermarket

Last synced: 11 Apr 2025

https://github.com/iamantimpal/iamantimpal

👋 Hi, I'm Antim Pal, the Founder of Optimism Educator. An online platform dedicated to empowering students with skills in Computer Science, Web Design, Graphic

data-analysis data-science data-visualization database database-design database-management datascience graphical-user-interface graphics grapic-design reading-list readme readme-badges readme-generator readme-md readme-profile readme-stats readme-template

Last synced: 10 Apr 2025

https://github.com/Badr-MOUFAD/cookiecutter-simple-DS-project

A simple cookiecutter template to structure your Data Science projects.

cookiecutter data-science project-structure python simple-ds-project

Last synced: 08 May 2025

https://github.com/aflah02/nlp-albumentations-data-augmentation

This repository contains helper functions which can help you generate additional data points depending on your NLP task.

data-science nlp

Last synced: 09 Jul 2025

https://github.com/ndleah/transactions

🪙 Linear regression model, predict monthly transaction amount

data-science financial-modeling linear-regression mlr transactions

Last synced: 05 May 2025

https://github.com/dayyass/extended-naive-bayes

[WIP] Extension of sklearn Naive Bayes models that allows sampling and more feature distributions.

data-science distributions generative-model machine-learning naive-bayes python sampling scikit-learn

Last synced: 13 Apr 2025

https://github.com/amruthpillai/machine-learning-a-z

Hands-On Python & R in Data Science - Udemy Course: https://www.udemy.com/machinelearning/learn/v4/overview

data-science machine-learning python r udemy

Last synced: 09 May 2025

https://github.com/coelhosilva/flight-ad

flight-ad is a Python package for anomaly detection in the aviation domain built on top of scikit-learn.

anomaly-detection data-science fdm flight-data flight-data-analysis flight-data-monitoring machine-learning python scikit-learn

Last synced: 10 Apr 2025

https://github.com/joshwlambert/daisieprep

Extracts phylogenetic island community data from phylogenetic trees

data-science island-biogeography phylogenetics r

Last synced: 18 Mar 2025

https://github.com/mathworks-teaching-resources/probability-theory

A courseware module that covers the fundamental concepts in probability theory and their implications in data science. Topics include probability, random variables, and Bayes' Theorem.

bayesian-statistics courseware cwm data-science mathematics matlab matlab-live-script probability-theory random-variables

Last synced: 15 Jul 2025

https://github.com/doctor-phil/analyzing-economic-networks

Tutorial introduction to economic network analysis and graph clustering in python with networkx

centrality data-science economics graph-clustering networks social-network-analysis spectral-methods

Last synced: 14 Jul 2025

https://github.com/john-hawkins/projit

Application for managing the structure, properties, data, experiments and build of data science projects.

data-science experiments machine-learning project-management

Last synced: 23 Jun 2025

https://github.com/thecoderpinar/gen-expression

Gene expression analysis is a fundamental component of genomics research, providing valuable insights into how genes are regulated and their impact on various biological processes. This project delves into the realm of gene expression data, aiming to uncover hidden patterns and relationships within complex datasets. 🚀

bioinformatics biotechnology data-analysis data-science data-visualization genomics kaggle machine-learning pca python

Last synced: 30 Apr 2025

https://github.com/njlyon0/supportr

Support Functions for Wrangling and Visualization

data-science r-package

Last synced: 20 Mar 2025

https://github.com/carlomazzaferro/numerai_easy_ml

General purpose workflow for machine learning projects applied to the https://numer.ai data challenges.

data-science mahchine-leaning numerai

Last synced: 26 Mar 2025

https://github.com/arose13/pliablelasso

Python implementation of the pliable lasso

data-science machine-learning

Last synced: 09 May 2025

https://github.com/negativenagesh/arogyamitra

An accessible, reliable, and efficient platform for medical information and support using LLMs

data-science embeddings flask genai knowledgebase langchain llama2 llm meta-llama-2-chat pineconedb python semantic-indexing vector-database

Last synced: 19 Jun 2025

https://github.com/tushar2704/machinealgobox

Explore common ML algorithms, from scratch implementations to real-world use cases. Each algorithm is accompanied by clear explanations, code implementations, and real-world use cases, enabling you to grasp their underlying principles and apply them to different problem domains.

algorithms alogorithms-implemented artificial-intelligence data data-analytics data-engineering data-science deployment machine-learning-algorithms mlops python r streamlit streamlit-tushar2704 tushar2704

Last synced: 07 May 2025

https://github.com/tushar2704/superstore-sales-dashboard-with-streamlit

Superstore Sales with Streamlit is a data visualization and analysis project that uses the Streamlit framework to create an interactive web application for exploring and analyzing sales data from a superstore. This project aims to provide an easy-to-use interface for users to gain insights into sales trends, sales performance, product performance,

analytics dashboard data-analytics data-science data-science-projects python streamlit streamlit-tushar2704 trend-analysis tushar2704

Last synced: 07 May 2025

https://github.com/juliusmarkwei/crypto-jacking-classificatioin

Classifying network activity from various websites as either cryptojacking or not, based on features from both network-based and host-based data.

cryptojacking data-science machine-learning python

Last synced: 13 Apr 2025

https://github.com/flexmonster/pivot-jupyter-notebook

Jupyter Notebook pivot table example with Flexmonster

data-analysis data-science interactive jupyter-notebook pivot-tables python

Last synced: 16 Jun 2025

https://github.com/omarsar/data_mining_hw_1

Contains information for the first assignment of Data Mining 2017 Fall, NTHU.

data data-mining data-science datavisualization pandas

Last synced: 10 Apr 2025

https://github.com/ucdavisdatalab/workshop_web_maps

Learn to build an interactive web map to display spatial data

data-science geospatial-visualization teaching-materials ucdavis ucdavis-datalab workshop

Last synced: 05 Mar 2026

https://github.com/tchlux/util

My machine learning, optimization, and data science utilities package.

data-science machine-learning numerical-optimization python-utilities splines statistics visualization

Last synced: 02 May 2026

https://github.com/mine-cetinkaya-rundel/feedback-at-scale

Slides and sample learnr tutorial for rstudio::global(2021) talk

data-science gradethis learnr rstats tutorial

Last synced: 11 Feb 2026

https://github.com/betaandbit/wykresy

Wykresy od kuchni (Charts: a behind-the-scenes look)

data-science wizualizacja wykresy

Last synced: 01 Feb 2026

https://github.com/worldbank/trend-narrative

Python package for piecewise-linear trend detection and plain-English narrative generation for time series data.

data-science natural-language-generation open-source piecewise-regression time-series trend-detection

Last synced: 02 Jun 2026

https://github.com/canbula/datascience

Repository for Data Science course given by Assoc. Prof. Dr. Bora Canbula at Computer Engineering Department of Manisa Celal Bayar University.

data-science machine-learning matplotlib numpy pandas python python3 scikit-learn seaborn

Last synced: 04 Apr 2026

https://github.com/divyanshugit/66daysofdata

This repo contains the source code for a static webpage where you can find out answers to Machine Learning Interview questions.

data-science interview-questions machine-learning

Last synced: 31 Jan 2026

https://github.com/ramonhpr/knot-lib-python

API to get data from the cloud and run some data analytics

data-science iot iot-framework web

Last synced: 26 Jun 2026