An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/alexcj10/diwali-sales-analysis

This repository contains an analysis of Diwali sales data to uncover trends and patterns in customer behavior. The project aims to provide insights into customer demographics, purchasing habits, and product preferences during the Diwali season.

analysis data-science diwali jupyter-notebook matplotlib numpy pandas python sales seaborn

Last synced: 15 Apr 2025

https://github.com/mohidex/data-pipeline-on-gcp

The Real-time Ecommerce Data Collection and Processing project gives businesses real-time insights by extracting, processing, and storing ecommerce data from multiple sources. It combines Go and Python to streamline data handling across diverse ecommerce websites.

beautifulsoup data-engineer data-pipeline data-science database datastore dependency-injection firebase firestore gcp go golang google google-cloud pubsub python solid-principles storage web-scraping

Last synced: 14 Apr 2025

https://github.com/itzmeanjan/corporatez

Data analysis of open data from the Ministry of Corporate Affairs, Government of India, for deeper insight, made with ❤️

company-data corporate data-science data-visualization govt-company india matplotlib opendata python3 visualization

Last synced: 14 Oct 2025

https://github.com/polis-community/red-dwarf

A DIMensional REDuction library for stellarpunk democracy into the long haul. (Inspired by Pol.is)

civic-tech collective-intelligence data-science deliberative-democracy democracy dimensionality-reduction participatory-democracy polis

Last synced: 06 Oct 2025

https://github.com/ruivieira/nim-mentat

A Nim library for data science and machine learning

data-science library machine-learning nim scientific-computing

Last synced: 10 Aug 2025

https://github.com/bhattbhavesh91/auto-sklearn-tutorial

Small tutorial on auto-sklearn which is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

auto-ml auto-sklearn automl data-science machine-learning python tutorial

Last synced: 27 Oct 2025

https://github.com/datasets/genome-sequencing-costs

Costs associated with DNA sequencing since 2001

data data-science genome

Last synced: 19 Oct 2025

https://github.com/iamyajat/whatsapp-chat-analyzer-api

An API to analyse WhatsApp chats and generate insights

data-analysis data-science fastapi python whatsapp

Last synced: 17 Oct 2025

https://github.com/alipsa/matrix

Groovy library for working with tabular data.

analytics data-science groovy tables

Last synced: 02 Apr 2026

https://github.com/stink-po/boxoffice_api

Unofficial Python API for Box Office Mojo

data-science dataset movies-and-cinemas scraper

Last synced: 07 Sep 2025

https://github.com/surajv311/udemy_course_resources

List of course resources from my Udemy course "Numpy for Data Science" (2020)

arrays data-science numpy numpy-tutorial python3 udemy udemy-course

Last synced: 16 May 2025

https://github.com/navdeep-g/sdss-2019

Interpretable Machine Learning with rsparkling

data-science h2o-3 machine-learning r rsparkling spark sparklyr xai

Last synced: 07 Apr 2025

https://github.com/negativenagesh/arogyamitra

An accessible, reliable, and efficient platform for medical information and support using LLMs

data-science embeddings flask genai knowledgebase langchain llama2 llm meta-llama-2-chat pineconedb python semantic-indexing vector-database

Last synced: 19 Jun 2025

https://github.com/thecoderpinar/gen-expression

Gene expression analysis is a fundamental component of genomics research, providing valuable insights into how genes are regulated and how they affect various biological processes. This project explores gene expression data to uncover hidden patterns and relationships within complex datasets. 🚀

bioinformatics biotechnology data-analysis data-science data-visualization genomics kaggle machine-learning pca python

Last synced: 30 Apr 2025

https://github.com/dayyass/extended-naive-bayes

[WIP] Extension of sklearn Naive Bayes models that allows sampling and more feature distributions.

data-science distributions generative-model machine-learning naive-bayes python sampling scikit-learn

Last synced: 13 Apr 2025

https://github.com/barrettotte/ibmi-jupyter

Utility notebook for using Jupyter notebooks with IBMi for basic reports and visualizations.

data-science db2 db2i ibmi jupyter-notebook

Last synced: 11 Apr 2025

https://github.com/juliaml/datasciencetraits.jl

Traits for data science

data-science julia

Last synced: 09 Jul 2025

https://github.com/coelhosilva/flight-ad

flight-ad is a Python package for anomaly detection in the aviation domain built on top of scikit-learn.

anomaly-detection data-science fdm flight-data flight-data-analysis flight-data-monitoring machine-learning python scikit-learn

Last synced: 10 Apr 2025

https://github.com/omarsar/data_mining_hw_1

Contains information for the first assignment of Data Mining 2017 Fall, NTHU.

data data-mining data-science datavisualization pandas

Last synced: 10 Apr 2025

https://github.com/flexmonster/pivot-jupyter-notebook

Jupyter Notebook pivot table example with Flexmonster

data-analysis data-science interactive jupyter-notebook pivot-tables python

Last synced: 16 Jun 2025

https://github.com/mathworks-teaching-resources/probability-theory

A courseware module that covers the fundamental concepts in probability theory and their implications in data science. Topics include probability, random variables, and Bayes' Theorem.

bayesian-statistics courseware cwm data-science mathematics matlab matlab-live-script probability-theory random-variables

Last synced: 15 Jul 2025

https://github.com/overhash/supermarket-tracker

A supermarket aggregator for price information at New Zealand supermarkets

data-science new-zealand nz prices rust-lang supermarket

Last synced: 11 Apr 2025

https://github.com/vbyan/deeva

🚀 Deeva - your smart analytics companion for Object Detection datasets

data data-science data-visualization datasets deeva machine-learning object-detection plotly python statistics streamlit visualization

Last synced: 26 Jun 2025

https://github.com/tushar2704/machinealgobox

Explore common ML algorithms, from scratch implementations to real-world use cases. Each algorithm is accompanied by clear explanations and code implementations, enabling you to grasp its underlying principles and apply it to different problem domains.

algorithms alogorithms-implemented artificial-intelligence data data-analytics data-engineering data-science deployment machine-learning-algorithms mlops python r streamlit streamlit-tushar2704 tushar2704

Last synced: 07 May 2025

https://github.com/iamantimpal/iamantimpal

👋 Hi, I'm Antim Pal, the Founder of Optimism Educator, an online platform dedicated to empowering students with skills in Computer Science, Web Design, Graphic

data-analysis data-science data-visualization database database-design database-management datascience graphical-user-interface graphics grapic-design reading-list readme readme-badges readme-generator readme-md readme-profile readme-stats readme-template

Last synced: 10 Apr 2025

https://github.com/john-hawkins/projit

Application for managing the structure, properties, data, experiments and build of data science projects.

data-science experiments machine-learning project-management

Last synced: 23 Jun 2025

https://github.com/ndleah/transactions

🪙 Linear regression model to predict monthly transaction amounts

data-science financial-modeling linear-regression mlr transactions

Last synced: 05 May 2025

https://github.com/juliusmarkwei/crypto-jacking-classificatioin

Classifies network activity from various websites as cryptojacking or not, based on features derived from both network-based and host-based data.

cryptojacking data-science machine-learning python

Last synced: 13 Apr 2025

https://github.com/giswqs/timelapse

An interactive Streamlit web app for creating satellite timelapses

data-science dataviz earthengine geopython python satellite streamlit

Last synced: 12 May 2025

https://github.com/ryanrudes/wikimedia

A dataset comprising over 40 million images sourced from Wikimedia Commons

computer-vision data-science data-scraping dataset datasets deep-learning gans image images machine-learning wikimedia wikimedia-commons

Last synced: 13 Sep 2025

https://github.com/coalio/Assistant

A data science library providing flexible dataframes for Lua 5.1+

data-analysis data-science data-structures dataframe lua

Last synced: 11 Apr 2025

https://github.com/aflah02/nlp-albumentations-data-augmentation

This repository contains helper functions that help you generate additional data points suited to your NLP task.

data-science nlp

Last synced: 09 Jul 2025

https://github.com/carlomazzaferro/numerai_easy_ml

General purpose workflow for machine learning projects applied to the https://numer.ai data challenges.

data-science mahchine-leaning numerai

Last synced: 26 Mar 2025

https://github.com/joshwlambert/daisieprep

Extracts phylogenetic island community data from phylogenetic trees

data-science island-biogeography phylogenetics r

Last synced: 18 Mar 2025

https://github.com/njlyon0/supportr

Support Functions for Wrangling and Visualization

data-science r-package

Last synced: 20 Mar 2025

https://github.com/arose13/pliablelasso

Python implementation of the pliable lasso

data-science machine-learning

Last synced: 09 May 2025

https://github.com/erp12/rica

DataFrame abstraction for Clojure data scientists.

clojure clojurescript data-science dataframe

Last synced: 11 Apr 2025

https://github.com/gjtorikian/destroy-all-monuments

This is data taken from the SPLC report titled "Whose Heritage? Public Symbols of the Confederacy" from April 21, 2016

data-science government-data social-justice

Last synced: 10 Apr 2025

https://github.com/hunterdii/iriswise

IrisWise is a machine learning application for predicting Iris flower species. Built with Streamlit, this app provides a user-friendly interface to input flower measurements and receive predictions from K-Nearest Neighbors, with Random Forest, SVM, and Logistic Regression models still in progress.

classifier-model data-science flowers-recognition iris-dataset iris-recognition knn-classification machine-learning pickle python python3 streamlit streamlit-webapp

Last synced: 21 Feb 2026

https://github.com/blmoore/summerdatachallenge

My entry for http://summerdatachallenge.com (I came 3rd)

analytics data-science london r real-estate rstats

Last synced: 30 Apr 2025

https://github.com/hevalhazalkurt/exploring_the_data_of_lego_history

A data exploration project on LEGO history in Python with pandas, matplotlib etc. (WIP)

data data-analysis data-science data-visualization datascience datasets lego lego-history matplotlib pandas python python3

Last synced: 13 Apr 2025

https://github.com/sayakpaul/applied-data-science-w-python-specialization

Contains my assignments, guiding notebooks (provided as the course materials) and the datasets.

data-science matplotlib numpy pandas python3 scipy scipy-stack

Last synced: 12 May 2025

https://github.com/leomaurodesenv/data-science-api-framework

A simple framework to test and deploy your Data Science API

api api-rest data-science dataops docker flask-api python

Last synced: 09 Sep 2025

https://github.com/neo4j-graph-examples/contact-tracing

Contact-tracing graph for pandemic spread (e.g., COVID-19), based on http://blog.bruggen.com/search/label/contact%20tracing

contact-tracing covid-data covid19 data-science dataset example-data graphdb healthcare neo4j neo4j-approved

Last synced: 18 Jul 2025

https://github.com/yusufcinarci/web-scraping-projects

These project files host the web scraping examples I create day by day.

data-analysis data-science jupyter-notebook python web-scraping

Last synced: 01 May 2025

https://github.com/zaman-hamza/citadel-datathon

My submission to the 2022 East Coast Datathon. The event ran from March 21 to March 28, about a week. I was on a team of two that analyzed non-conventional indicators and instigators of traffic.

citadel data-science data-visualization datathon

Last synced: 10 Apr 2025

https://github.com/davidssmith/rawarray.jl

Raw array (RA) file format for simple, robust, and user-friendly N-dimensional array storage

bytes complex-numbers data-science file-format julia large-dataset large-files ra-format rawarray scientific-computing storage

Last synced: 10 Sep 2025

https://github.com/iguptashubham/online-retail-sales

This Power BI dashboard, designed for marketing strategists, analyzes sales trends and customer behavior. It provides key insights that help them identify sales opportunities and optimize marketing campaigns, ultimately boosting business sales.

dashboard data data-analysis data-analysis-project data-analysis-project-powerbi data-analysis-python data-project data-science powerbi project

Last synced: 19 Mar 2026

https://github.com/getindata/quickstart-ml-starter

Kedro starters to quickly set up new projects following the QuickStart ML Blueprints practice.

data-science machine-learning

Last synced: 30 Oct 2025

https://github.com/whizsid/kddbscan-rs

A Rust library inspired by the kDDBSCAN clustering algorithm

clustering data-science density-based-clustering deviation machine-learning-algorithms pinned

Last synced: 10 Apr 2025

https://github.com/adamvvu/snapshot_ensemble

Train TensorFlow Keras models with cosine annealing and save an ensemble of models with no additional computational expense.

data-science deep-learning keras machine-learning python tensorflow

Last synced: 28 Oct 2025
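
The "no additional computational expense" claim rests on a cyclic learning-rate schedule: one training run is split into several cycles, each cycle anneals the rate along a cosine curve from its maximum toward zero, and the model is saved at the end of every cycle. The sketch below assumes the schedule from the original Snapshot Ensembles paper (Huang et al., 2017); `snapshot_lr` and `is_snapshot_epoch` are hypothetical helper names, not this repository's API, and its exact schedule may differ.

```python
import math


def snapshot_lr(epoch: int, total_epochs: int, num_snapshots: int, lr_max: float) -> float:
    """Cyclic cosine-annealed learning rate for a 0-indexed epoch.

    Assumption: training is split into num_snapshots cycles of equal length,
    and within each cycle the rate decays from lr_max toward 0 along a cosine.
    """
    cycle_len = math.ceil(total_epochs / num_snapshots)
    t = epoch % cycle_len  # position inside the current cycle
    return lr_max / 2 * (math.cos(math.pi * t / cycle_len) + 1)


def is_snapshot_epoch(epoch: int, total_epochs: int, num_snapshots: int) -> bool:
    """True on the last epoch of a cycle, where a model snapshot would be saved."""
    cycle_len = math.ceil(total_epochs / num_snapshots)
    return (epoch + 1) % cycle_len == 0


if __name__ == "__main__":
    for e in range(6):
        print(e, round(snapshot_lr(e, 6, 3, 0.1), 4), is_snapshot_epoch(e, 6, 3))
```

Because the rate jumps back to `lr_max` at the start of each cycle, each snapshot tends to settle in a different local minimum, which is what makes averaging their predictions useful. In Keras, a function like `snapshot_lr` could be wrapped in `tf.keras.callbacks.LearningRateScheduler`, with a separate callback saving weights on snapshot epochs.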

https://github.com/rahul-jha98/restauranttrends.stats

Visualise the trends in food and restaurant choices of customers in a city by scraping data from Zomato.

data-analysis data-science visualization vuejs zomato zomato-api zomato-scraper

Last synced: 08 Jul 2025

https://github.com/chiraag-kakar/pubg

What's the best strategy to win in PUBG? Should you sit in one spot and hide your way into victory, or do you need to be the top shot? Let's let the data do the talking!

data-science feature-engineering machine-learning-algorithms project pubg-api random-forest

Last synced: 07 May 2025

https://github.com/nikbarb810/pattern-recognition

Basic pattern recognition algorithms implemented in Python

data-science ipynb-jupyter-notebook matplotlib numpy pattern-recognition python

Last synced: 06 Mar 2026

https://github.com/nationalparkservice/qckit

QCkit provides useful functions for data quality control and manipulation including updating data to DarwinCore standards, unit conversions, and data flagging.

darwin-core data-quality data-science npsdataverse quality-control r r-package rstats

Last synced: 22 Jun 2025

https://github.com/vedadiyan/genql

GenQL is a generic query language written entirely in Go

data-analysis data-mapping data-processing data-science data-translation json json-data sql

Last synced: 22 Jun 2025

https://github.com/jbris/stan-cmdstanr-gpu-docker

A Docker image to run Stan, cmdstanr, and brms for Bayesian statistical modelling. GPU support using OpenCL is available.

bayes bayesian-inference brms cmdstan cmdstanr data-science docker posterior probabilistic-programming projpred rstan rstanarm shinystan stan stan-gpu stan-lang stan-math-library tidybayes tidyverse

Last synced: 04 May 2025

https://github.com/lucasrodes/pyphoon

Tools for Digital Typhoon DL/ML Project

data-science dataset environment machine-learning tropical-cyclone

Last synced: 18 Mar 2025

https://github.com/stefanrmmr/kaggle_twitter_airline_sentiment

Kaggle Twitter US Airline Sentiment: implementation of a tweet sentiment analysis model using custom-trained word embeddings and an LSTM deep learning model [TUM Data Analysis & ML, summer 2021] @adrianbruenger @stefanrmmr

data-science deep-learning kaggle-airline-dataset kaggle-sentiment-analysis kaggle-us-airlines lstm-neural-networks python sentiment-analysis skipgram text-sentiment-classification tweepy tweet-classification tweet-sentiment-analysis twitter twitter-sentiment-analysis us-airline-dataset word2vec

Last synced: 19 Mar 2025

https://github.com/orkunaktas/wine-quality-prediction

๐Ÿท๐Ÿ”ฌ Wine Quality and Forecast ๐Ÿพ๐Ÿ‡

alcohol data-science logistic-regression wine-quality

Last synced: 08 Sep 2025

https://github.com/octoenergy/s3migrate

Bulk delete/copy/move files or modify Hive/Drill/Athena partitions using pythonic pattern matching

data data-science

Last synced: 24 Jun 2025

https://github.com/pharo-ai/data-partitioners

Pharo library for partitioning a collection. Given a set of proportions (e.g. 50%, 30%, and 20%), it shuffles the collection and divides it into non-empty subsets in such a way that every element is included in exactly one subset. Can be used in machine learning and statistical analysis for splitting data into training, validation, and test sets.

data-science machine-learning pharo statistical-analysis

Last synced: 11 Apr 2025
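
The partitioning behavior described above (shuffle, then split by proportions into non-empty subsets that together contain every element exactly once) can be sketched in Python. This is not the Pharo library's code: the function name `partition`, the `seed` parameter, and the rounding rule (floor each share, keep every subset at one element or more, then hand leftovers to the largest proportions first) are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], proportions: Sequence[float],
              seed: Optional[int] = None) -> List[List[T]]:
    """Shuffle items and split them into len(proportions) non-empty subsets."""
    n, k = len(items), len(proportions)
    if k == 0 or n < k:
        raise ValueError("need at least one element per subset")
    if any(p <= 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be positive and sum to 1")

    # Floor each share, but never let a subset start empty.
    sizes = [max(1, int(n * p)) for p in proportions]
    # Too few assigned: give leftovers to the largest proportions first.
    order = sorted(range(k), key=lambda i: proportions[i], reverse=True)
    i = 0
    while sum(sizes) < n:
        sizes[order[i % k]] += 1
        i += 1
    # Too many assigned (from the "at least one" bump): shrink the biggest subset,
    # which must hold 2+ elements whenever the total exceeds n >= k.
    while sum(sizes) > n:
        biggest = max(range(k), key=lambda m: sizes[m])
        sizes[biggest] -= 1

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    subsets, start = [], 0
    for size in sizes:
        subsets.append(shuffled[start:start + size])
        start += size
    return subsets


if __name__ == "__main__":
    train, val, test = partition(range(10), [0.5, 0.3, 0.2], seed=42)
    print(len(train), len(val), len(test))
```

Shuffling a copy before slicing keeps the caller's collection untouched, and passing a fixed `seed` makes a train/validation/test split reproducible.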

https://github.com/eikevons/pandas-paddles

Access the parent pandas DataFrame in loc[], iloc[], assign(), and other pandas helpers

data-analysis data-exploration data-science pandas pandas-dataframe pandas-library pandas-loc

Last synced: 16 Jun 2025

https://github.com/praveen1664/chatbot

A chatbot written in Python that takes its inputs directly from an SQL database

chatbot data-science database json nlp python3 sqlite sqlite3

Last synced: 11 Jul 2025