An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/itzmeanjan/corporatez

Data analysis done on Ministry of Corporate Affairs, Govt. of India's open data to get deeper insight, with :heart:

company-data corporate data-science data-visualization govt-company india matplotlib opendata python3 visualization

Last synced: 14 Oct 2025

https://github.com/yisaienkov/tinysets

The project aims to collect various datasets for tasks such as classification, clustering, object detection... The purpose of these datasets is to quickly check the performance of models and algorithms.

algorithms classification data data-science dataset datasets kaggle kaggle-dataset lego lego-minifigures lego-sets object-detection pypi python regression text-classification tinysets

Last synced: 14 Apr 2025

https://github.com/stappit/blog

I often post solutions to textbook exercises, including: Bayesian Data Analysis (BDA) by Gelman et al; Causal Inference in Statistics Primer (CISP) by Pearl et al; Purely Functional Data Structures (PFDS) by Okasaki.

bayesian-data-analysis blog data-analysis data-science gelman hakyll haskell pearl purely-functional-data-structures solutions stan static-site statistical-inference statistics

Last synced: 14 Mar 2025

https://github.com/giswqs/timelapse

An interactive Streamlit web app for creating satellite timelapses

data-science dataviz earthengine geopython python satellite streamlit

Last synced: 12 May 2025

https://github.com/zeitsperre/canada-climate-python

A set of methods for collecting, parsing, converting, and presenting Environment Canada Weather Station data

canada climate data-science python weather

Last synced: 22 Jul 2025

https://github.com/virajbhutada/spotify-track-analysis-and-recommendation

Experience a comprehensive exploration of Spotify's musical landscape seamlessly transitioned from Tableau visualizations to SQL analysis. Dive into track inventory, streaming metrics, and sonic trends via interactive dashboards, while leveraging SQL queries for deeper insights into KPIs and cross-platform rankings.

audio-analysis data-analysis data-analytics data-science data-visualization eda machine-learning-library ml-models mysql recommendation-system spotify spotify-data spotify-dataset sql-database sql-server streaming-metrics tableau tableau-public trends-analysis

Last synced: 28 Apr 2025

https://github.com/alipsa/matrix

Groovy library for working with tabular data.

analytics data-science groovy tables

Last synced: 02 Apr 2026

https://github.com/mdh266/crimetime

Python web application for exploring and forecasting crime rates in NYC

data-science docker flask-application forecasting-crime-rates geospatial-analysis pandas python statsmodels time-series-analysis

Last synced: 30 Jul 2025

https://github.com/invia-flights/blitzly

Lightning-fast way to get plots with Plotly ⚡️

data-analysis data-science plotly plotting-in-python python visualization

Last synced: 14 Jan 2026

https://github.com/teddyoweh/disease-data-webscrape-analysis

Scraped a table of disease and symptom data from a website, turned it into a dataframe, then exported it to a CSV file

data-analytics data-science data-visualization webscraping

Last synced: 09 Apr 2025

https://github.com/alexeatscake/gigaanalysis

A toolbox for processing data that can be expressed as a dependent and independent variable.

condensed-matter-physics data-science matplotlib numpy physics scipy

Last synced: 03 Jul 2025

https://github.com/sanvishal/Exoplanet-Explore

An Interactive data visualization of Exoplanets

animation d3js data-analysis data-science exoplanet python space visualization

Last synced: 14 Apr 2025

https://github.com/ruban2205/data-science-introduction

Welcome to the Data Science Introduction repository! This repository is designed to provide an introduction to the field of data science, covering various topics and techniques commonly used in the industry.

classification-algorithm data-science data-visualization decision-tree-classifier exploratory-data-analysis knn knn-classification python simple-linear-regression

Last synced: 11 Jul 2025

https://github.com/thetallprogrammer/stock-contender-app

Welcome to Stock Contender – an AI-powered tool designed to assist your market analysis. This tool is not an investment advisor and does not guarantee profits. Invest at your own risk. Stay updated with my latest developments.

artificial-intelligence chat-gpt data-science financial-data-analysis financial-technology fintech investment-analysis machine-learning openai openai-api python stock-market stock-prediction stock-trading

Last synced: 05 Sep 2025

https://github.com/navdeep-g/sdss-2019

Interpretable Machine Learning with rsparkling

data-science h2o-3 machine-learning r rsparkling spark sparklyr xai

Last synced: 07 Apr 2025

https://github.com/dataship/python-dataship

Lightweight Python tools for reading, writing and storing data, locally and over the internet

column-store data-science machine-learning numpy pandas

Last synced: 23 Apr 2025

https://github.com/sondosaabed/data-analyst-nanodegree

I acquired a full scholarship from Google Launchpad. Advanced data wrangling skills to work with messy, complex real-world datasets. Highly customized visualizations using the Matplotlib Python library.

data-science dataanalysis datawrangling nanodegree python udacity-nanodegree

Last synced: 09 Apr 2025

https://github.com/lfrench03/ganaderia-en-cuba

Based on the data provided by the National Office of Statistics and Information ONEI and other alternative trusted sources mentioned in the references, our main objective is to present a detailed vision of how livestock farming has evolved in Cuba during the period until 2022.

cuba data-science dataproduct ganaderia streamlit streamlit-application timeline

Last synced: 26 Jul 2025

https://github.com/archie-cm/churn-analysis-ecommerce-customer

The objective of this project is to predict customer churn and lost opportunity, and to provide recommendations to the business team so the company can apply customer personas in its retention strategy and monitor results through an interactive dashboard.

data-science feature-engineering machine-learning python scikit-learn

Last synced: 23 Apr 2025

https://github.com/aryankeluskar/imdb-genres-analysis

Graph the mean and variance of movie ratings over the years. Then use Prophet and ARIMA to predict the average ratings by genre into the future.

data-science julia matplotlib monte-carlo python

Last synced: 08 Feb 2026

https://github.com/zachbateman/evogression

Python Machine Learning using an evolutionary regression algorithm. More intuitive with higher transparency than a neural network while providing much greater power and high-dimensionality capabilities than more simplistic regression techniques.

artificial-intelligence data-science machine-learning neural-network python regression

Last synced: 12 Jun 2025

https://github.com/ryanrudes/wikimedia

A dataset comprised of over 40 million images sourced from Wikimedia Commons

computer-vision data-science data-scraping dataset datasets deep-learning gans image images machine-learning wikimedia wikimedia-commons

Last synced: 13 Sep 2025

https://github.com/waylonwalker/kedro-auto-catalog

Kedro catalog create with default configuration

data data-science kedro kedro-catalog kedro-hook kedro-plugin

Last synced: 12 Jun 2025

https://github.com/timkong21/polyp-segmentation

Polyp segmentation tool utilizing U-Net for accurate medical image analysis, designed to enhance early detection and diagnosis of colorectal cancer. Features a user-friendly Streamlit web app for easy image processing and analysis, leveraging the Kvasir-SEG dataset for improved healthcare outcomes.

aws-s3 cancer-detection colonoscopy computer-vision data-augmentation data-science deep-learning diagnostics healthcare machine-learning medical-application medical-image-analysis medical-image-processing medical-image-segmentation opencv polyp-segmentation python streamlit tensorflow u-net

Last synced: 14 Apr 2025

https://github.com/mohidex/data-pipeline-on-gcp

The Real-time Ecommerce Data Collection and Processing project empowers businesses with real-time insights by efficiently extracting, processing, and storing ecommerce data from multiple sources. Combining Golang and Python, this cutting-edge solution streamlines data handling from diverse ecommerce websites.

beautifulsoup data-engineer data-pipeline data-science database datastore dependency-injection firebase firestore gcp go golang google google-cloud pubsub python solid-principles storage web-scraping

Last synced: 14 Apr 2025

https://github.com/pchtsp/pytups

Powerful dictionaries and tuple lists for data wrangling

data data-science dictionaries optimization tuples

Last synced: 14 Apr 2025

https://github.com/dhhruv/kisaani

"Kisaani" is an application that takes required parameters intelligently or from the database of the location (from the cloud) and provides the list of best crops suited for that land. The application should also be able to collect the outcome after cultivation and apply correction as appropriate for further advisories. The details of the crops for the region and conditions are provided. Applications should be interactive, user friendly for farmers (provide local language support) and should provide support in real time.

crop crop-recommendation data-science ieee ieee-hackathon machine-learning

Last synced: 07 Mar 2026

https://github.com/polis-community/red-dwarf

A DIMensional REDuction library for stellarpunk democracy into the long haul. (Inspired by Pol.is)

civic-tech collective-intelligence data-science deliberative-democracy democracy dimensionality-reduction participatory-democracy polis

Last synced: 06 Oct 2025

https://github.com/ul-mds/gecko

Python library for the generation and mutation of realistic personal identification data at scale

data-science numpy pandas python record-linkage

Last synced: 24 Apr 2025

https://github.com/thomasthaddeus/dataanalysistoolkit

DataAnalysisToolkit is a Python-based data analysis tool designed to streamline various data analysis tasks. It provides the ability to load data from CSV files, perform statistical calculations, detect outliers, clean data, and visualize data.

data-science matplotlib python python-script python3 scikit-learn

Last synced: 07 Oct 2025

https://github.com/vbyan/deeva

🚀Deeva - your smart analytics companion for Object Detection datasets

data data-science data-visualization datasets deeva machine-learning object-detection plotly python statistics streamlit visualization

Last synced: 26 Jun 2025

https://github.com/surajv311/udemy_course_resources

List of course resources from my Udemy course: "NumPy for Data Science" 2020

arrays data-science numpy numpy-tutorial python3 udemy udemy-course

Last synced: 16 May 2025

https://github.com/divyanshugit/66daysofdata

This repo contains the source code for a static webpage where you can find answers to machine learning interview questions.

data-science interview-questions machine-learning

Last synced: 31 Jan 2026

https://github.com/macropin/random-name-generator

Generate random male and female names with real-world probability.

data-science python random-generation test-data-generator

Last synced: 17 Jul 2025

https://github.com/yoshoku/numo-openblas

Numo::OpenBLAS builds and uses OpenBLAS as a background library for Numo::Linalg

data-science machine-learning numo openblas ruby

Last synced: 25 Apr 2025

https://github.com/nhs-south-central-and-west/data-science-guides

Guides for common data science tasks, in R & Python

data-science machine-learning python r regression

Last synced: 03 May 2025

https://github.com/techn0man1ac/toxiccommentclassification

This project aims to develop a model capable of identifying and classifying different levels of toxicity in comments, using the power of BERT (Bidirectional Encoder Representations from Transformers) for text analysis.

analysis bert-model classifying data-science docker machine-learning python streamlit text-classification transformers-models

Last synced: 18 Aug 2025

https://github.com/jamesquinlan/intro-python

Introduction to Programming and Data Science with Python

data-science nlp python python-3

Last synced: 18 Aug 2025

https://github.com/rafaelpermec/live-broker-api

A study of back-end data scraping, simulating a brokerage that carries out buy and sell operations on assets and handles client cash flow in real time.

authentication authorization backend-api cheerio data-science express helmet jwt-authentication mysql nodejs typescript web-scraping

Last synced: 19 Apr 2025

https://github.com/nicodupont/mooc

All my finished MOOCs, mainly on the subject of data science

data-analysis data-science data-visualization datacamp jupyter-notebook machine-learning mooc pandas python sas sql

Last synced: 28 Apr 2025

https://github.com/ahammadmejbah/glossary-of-artificial-intelligence

A "Glossary of Artificial Intelligence" is a concise reference resource defining key terms, concepts, and terminology related to AI. It provides explanations and definitions to help individuals understand and navigate the field of artificial intelligence, making it a valuable tool for both beginners and experts in the AI domain.

artificial-intelligence data data-science deep-learning deep-learning-algorithms detection image-processing machine-learning python

Last synced: 25 Jun 2025

https://github.com/tchlux/util

My machine learning, optimization, and data science utilities package.

data-science machine-learning numerical-optimization python-utilities splines statistics visualization

Last synced: 02 May 2026

https://github.com/ramonhpr/knot-lib-python

API to get data from the cloud and perform some data analytics

data-science iot iot-framework web

Last synced: 26 Jun 2026

https://github.com/ndleah/transactions

🪙 Linear regression model, predict monthly transaction amount

data-science financial-modeling linear-regression mlr transactions

Last synced: 05 May 2025

https://github.com/iamantimpal/iamantimpal

👋 Hi, I'm Antim Pal, the Founder of Optimism Educator. An online platform dedicated to empowering students with skills in Computer Science, Web Design, Graphic

data-analysis data-science data-visualization database database-design database-management datascience graphical-user-interface graphics grapic-design reading-list readme readme-badges readme-generator readme-md readme-profile readme-stats readme-template

Last synced: 10 Apr 2025

https://github.com/erp12/rica

DataFrame abstraction for Clojure data scientists.

clojure clojurescript data-science dataframe

Last synced: 11 Apr 2025

https://github.com/john-hawkins/projit

Application for managing the structure, properties, data, experiments and build of data science projects.

data-science experiments machine-learning project-management

Last synced: 23 Jun 2025

https://github.com/tushar2704/machinealgobox

Explore common ML algorithms, from from-scratch implementations to real-world use cases. Each algorithm is accompanied by clear explanations, code implementations, and real-world use cases, enabling you to grasp their underlying principles and apply them to different problem domains.

algorithms alogorithms-implemented artificial-intelligence data data-analytics data-engineering data-science deployment machine-learning-algorithms mlops python r streamlit streamlit-tushar2704 tushar2704

Last synced: 07 May 2025

https://github.com/doctor-phil/analyzing-economic-networks

Tutorial introduction to economic network analysis and graph clustering in python with networkx

centrality data-science economics graph-clustering networks social-network-analysis spectral-methods

Last synced: 14 Jul 2025

https://github.com/overhash/supermarket-tracker

A supermarket aggregator for price information at New Zealand supermarkets

data-science new-zealand nz prices rust-lang supermarket

Last synced: 11 Apr 2025

https://github.com/Badr-MOUFAD/cookiecutter-simple-DS-project

A simple cookiecutter template to structure your Data Science projects.

cookiecutter data-science project-structure python simple-ds-project

Last synced: 08 May 2025

https://github.com/aflah02/nlp-albumentations-data-augmentation

This repository contains helper functions which can help you generate additional data points depending on your NLP task.

data-science nlp

Last synced: 09 Jul 2025

https://github.com/joshwlambert/daisieprep

Extracts phylogenetic island community data from phylogenetic trees

data-science island-biogeography phylogenetics r

Last synced: 18 Mar 2025

https://github.com/coalio/Assistant

A data science library providing flexible dataframes for Lua 5.1+

data-analysis data-science data-structures dataframe lua

Last synced: 11 Apr 2025

https://github.com/juliusmarkwei/crypto-jacking-classificatioin

Classifying network activity from various websites as either cryptojacking or not, based on features related to both network-based and host-based data.

cryptojacking data-science machine-learning python

Last synced: 13 Apr 2025

https://github.com/thecoderpinar/gen-expression

Gene expression analysis is a fundamental component of genomics research, providing valuable insights into how genes are regulated and their impact on various biological processes. This project delves into the realm of gene expression data, aiming to uncover hidden patterns and relationships within complex datasets. 🚀

bioinformatics biotechnology data-analysis data-science data-visualization genomics kaggle machine-learning pca python

Last synced: 30 Apr 2025

https://github.com/coelhosilva/flight-ad

flight-ad is a Python package for anomaly detection in the aviation domain built on top of scikit-learn.

anomaly-detection data-science fdm flight-data flight-data-analysis flight-data-monitoring machine-learning python scikit-learn

Last synced: 10 Apr 2025

https://github.com/njlyon0/supportr

Support Functions for Wrangling and Visualization

data-science r-package

Last synced: 20 Mar 2025

https://github.com/negativenagesh/arogyamitra

An accessible, reliable, and efficient platform for medical information and support using LLMs

data-science embeddings flask genai knowledgebase langchain llama2 llm meta-llama-2-chat pineconedb python semantic-indexing vector-database

Last synced: 19 Jun 2025

https://github.com/arose13/pliablelasso

Python implementation of the pliable lasso

data-science machine-learning

Last synced: 09 May 2025

https://github.com/nikoshet/new-york-city-taxi-fare-prediction-machine-learning

Project for the 'Machine Learning' course of the M.Sc. 'Data Science and Machine Learning' at NTUA

data-science keras-tensorflow machine-learning numpy pandas python

Last synced: 12 Sep 2025

https://github.com/barrettotte/ibmi-jupyter

Utility notebook for using Jupyter notebooks with IBMi for basic reports and visualizations.

data-science db2 db2i ibmi jupyter-notebook

Last synced: 11 Apr 2025

https://github.com/flexmonster/pivot-jupyter-notebook

Jupyter Notebook pivot table example with Flexmonster

data-analysis data-science interactive jupyter-notebook pivot-tables python

Last synced: 16 Jun 2025

https://github.com/tushar2704/superstore-sales-dashboard-with-streamlit

Superstore Sales with Streamlit is a data visualization and analysis project that uses the Streamlit framework to create an interactive web application for exploring and analyzing sales data from a superstore. This project aims to provide an easy-to-use interface for users to gain insights into sales trends, sales performance, product performance,

analytics dashboard data-analytics data-science data-science-projects python streamlit streamlit-tushar2704 trend-analysis tushar2704

Last synced: 07 May 2025

https://github.com/amruthpillai/machine-learning-a-z

Hands-On Python & R in Data Science - Udemy Course: https://www.udemy.com/machinelearning/learn/v4/overview

data-science machine-learning python r udemy

Last synced: 09 May 2025

https://github.com/mathworks-teaching-resources/probability-theory

A courseware module that covers the fundamental concepts in probability theory and their implications in data science. Topics include probability, random variables, and Bayes' Theorem.

bayesian-statistics courseware cwm data-science mathematics matlab matlab-live-script probability-theory random-variables

Last synced: 15 Jul 2025

https://github.com/juliaml/datasciencetraits.jl

Traits for data science

data-science julia

Last synced: 09 Jul 2025

https://github.com/carlomazzaferro/numerai_easy_ml

General purpose workflow for machine learning projects applied to the https://numer.ai data challenges.

data-science mahchine-leaning numerai

Last synced: 26 Mar 2025

https://github.com/dayyass/extended-naive-bayes

[WIP] Extension of sklearn Naive Bayes models that allows sampling and more feature distributions.

data-science distributions generative-model machine-learning naive-bayes python sampling scikit-learn

Last synced: 13 Apr 2025

https://github.com/omarsar/data_mining_hw_1

Contains information for the first assignment of Data Mining 2017 Fall, NTHU.

data data-mining data-science datavisualization pandas

Last synced: 10 Apr 2025

https://github.com/tsg405/sql-for-data-science----coursera

This repo contains starter files, coursework, and programming assignments for the course SQL for Data Science from the University of California, Davis (Coursera).

california chinook-database coursera data-science query-language quiz sql sqlite ucdavis-datalab yelp-dataset

Last synced: 14 Apr 2025