An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/RDeconomist/RDeconomist.github.io

RapidCharts - a site for teaching and demonstrating Data Science and Visualisation techniques

data data-science data-visualization economics politics sports

Last synced: 08 Apr 2025

https://github.com/ahammadmejbah/becoming-a-python-developer

Becoming a Python developer involves mastering the Python programming language, understanding its syntax, and learning popular frameworks. Gain proficiency in web development, data analysis, or automation. Collaborate on projects, build a strong portfolio, and stay updated on industry trends to excel in this dynamic and versatile field.

algorithms algorithms-and-data-structures data-science deep deep-learning deep-neural-networks machine-learning machine-learning-algorithms machinelearning python python3

Last synced: 11 Jun 2025

https://github.com/mine-cetinkaya-rundel/tidymodels-uscots-2021

Materials for the "Tidy up your models" workshop at USCOTS 2021

data-science modeling rstats statistics tidymiodels tidyverse

Last synced: 08 Apr 2025

https://github.com/erictleung/tutorial-tidyverse

:milky_way: Presentation on the tidyverse in R to clean and manipulate data

data-cleaning data-manipulation data-science manipulate-data presentation programming r tidyverse tutorial

Last synced: 25 Mar 2025

https://github.com/flrs/build_and_test_ml_quickly

From idea to production in a day: Leveraging Azure ML and Streamlit to build and user test machine learning ideas quickly (Talk at PyCon DE & PyData Berlin 2024)

agile-development azure-application-insights azure-machine-learning data-science machine-learning streamlit streamlit-feedback

Last synced: 21 Apr 2025

https://github.com/ucbds-infra/ds-course-infra-guide

An educator's guide to creating a data science course

data-science jupyter jupyter-book

Last synced: 07 Oct 2025

https://github.com/thavlik/doom-gameplay-dataset

A dataset of Doom 1 & 2 gameplay videos preprocessed for deep learning

data-science dataset deep-learning doom doom-gameplay machine-learning quake-gameplay-dataset videos

Last synced: 16 Jan 2026

https://github.com/hritik5102/shala2020

MastAI ki paathSHALA: Data Science, Machine Learning, and Deep Learning code with explanations and reference links 👨‍💻

artificial-intelligence computer-vision data-science deep-learning machine-learning statistics

Last synced: 12 Oct 2025

https://github.com/erictleung/data-science

:computer: Repository for teaching materials and notes on machine learning and data science for freeCodeCamp

data-cleaning data-engineering data-science data-visualization freecodecamp learning machine-learning mathematics notes python statistics

Last synced: 25 Mar 2025

https://github.com/caerbannogwhite/aargh

A library that helps you out of data nightmares in Go. 🧙‍♂️

csv data data-science data-wrangling dataframe go golang html json linq statistics stats xlsx xpt

Last synced: 14 Jan 2026

https://github.com/sferez/twitter_toolbox

Complete toolbox for scraping, streaming, interacting with the API, cleaning, preprocessing, and applying NLP to Twitter data

data-collection data-science nlp preprocessing twitter twitter-api twitter-scraping twitter-streaming-api

Last synced: 10 Apr 2025

https://github.com/zsxkib/ttds-g35-cw3

TTDS Group Project: Video Games Search Engine. Sakib Ahamed, Dan Buxton, Kenza Amira, Wini Lau, Mansoor Ahmad

corpora data-science neural-ranking-models pagerank query search-engine technologies text text-analysis text-classification ttds web-search

Last synced: 10 Apr 2025

https://github.com/sondosaabed/oil-vs-bigtech-stock-investigation

💹📈 Investigating oil market prices alongside stock market prices from the start of 2001 to the end of 2023. 💰📉

advanced-data-wrangling api data-analyst-nanodegree data-assessment data-gathering data-science python requests wrangling-data

Last synced: 09 Apr 2025

https://github.com/seporaitis/hsaur-python

Various exercises from the book "Handbook of Statistical Analysis Using R" done in Python

bokeh books data-science hsaur learning-by-doing learning-pandas pandas python r seaborn statistical-analysis statistics tutorial

Last synced: 09 Apr 2025

https://github.com/paulohrpinheiro/ds_from_0

Exercises and experiments to accompany reading the book 'Data Science do Zero'

data-science python3

Last synced: 10 Jul 2025

https://github.com/hmiladhia/piskle

A serialization package optimized for scikit-learn

data-science machine-learning python scikit-learn serialization

Last synced: 28 Oct 2025

https://github.com/gbeckers/birdwatcher

A Python computer vision library for animal behavior

animal behavior computer-vision data-science ffmpeg opencv python science

Last synced: 13 Oct 2025

https://github.com/moindalvs/assignment_east-west_airlines

Problem statement: perform clustering (hierarchical, K-means, and DBSCAN) on the airlines data to obtain the optimum number of clusters

clustering-algorithm data-science dbscan-clustering epsilon-greedy hierarchical-clustering kmeans-clustering

Last synced: 23 Apr 2025

https://github.com/theakashshukla/r-project

🎓 A collection of programming assignments for the R language

algorithms data-analysis data-science data-science-projects ml r

Last synced: 24 Jul 2025

https://github.com/takuti/pyhivemall

Using machine learning models from Apache Hivemall :bee: in Python :snake:

data-science hive machine-learning python

Last synced: 15 Apr 2025

https://github.com/koldlight/r4ds

R for data science course

course data-analysis data-science data-viz r

Last synced: 30 Apr 2025

https://github.com/timkoornstra/fintwitbert

FinTwitBERT: Specialized BERT Model for Financial Twitter Analysis. Trained on vast financial tweets, it's ideal for sentiment analysis, trend prediction, and financial NLP tasks.

ai bert cryptocurrency data-science financial-tweets fintech fintwitbert language-model machine-learning nlp python sentiment-analysis stock-market trend-prediction twitter-data

Last synced: 14 Apr 2025

https://github.com/jasonmdev/guidedprojects

As part of the DataQuest Curriculum, guided projects are less structured and focus more on exploration.

data-science jupyter-notebook

Last synced: 14 May 2025

https://github.com/ahammadmejbah/ultimate-data-science-resources

🚀 Welcome to the Unlimited Data Science Resources community! Dive into a wealth of knowledge with curated tutorials, courses, and insights. Elevate your data science journey with boundless learning opportunities! 📊✨

data-engineering data-mining data-science data-visualization database datascience

Last synced: 26 Feb 2025

https://github.com/tushar2704/store-demand-forecasting

This project predicts the sales demand for various items in different stores based on historical sales data. The objective is to develop a machine learning model that can provide accurate forecasts for future sales of each store-item combination.

artifi data-analysis data-science python sales-analysis sales-forecasting tushar2704

Last synced: 04 Nov 2025

https://github.com/compgeolab/temperature-data

Download and create a subset of global country-average temperature data from Berkeley Earth

climate climate-data data-science open-data temperature

Last synced: 15 Apr 2025

https://github.com/zapier/awsjavasdk

Boilerplate rJava Access to the AWS Java SDK

awsjavasdk cran data-science

Last synced: 14 Apr 2025

https://github.com/mribeirodantas/vidente

R package to parse and preprocess the Surveillance, Epidemiology, and End Results (SEER) Program data from NIH/NCI

cancer-patients cancer-research data-analysis data-science data-structures r-package seer

Last synced: 14 Apr 2025

https://github.com/aianytime/fishvision

FishVision, built using Streamlit, identifies the species of fish in a given image. It is trained on "A Large Scale Fish Data", available on Kaggle, using the pre-trained model "MobileNetV2".

data-science deep-learning deep-neural-networks heroku heroku-deployment machine-learning machine-learning-algorithms mobilenetv2 python python3 streamlit streamlit-webapp

Last synced: 01 Sep 2025

https://github.com/jackgerrits/reductionml

Reduction-based machine learning framework with a focus on contextual bandits

contextual-bandits data-science machine-learning online-learning rust

Last synced: 10 Apr 2025

https://github.com/pottekkat/bulldozer-prize-predictions

Predict the auction sale price for a piece of heavy equipment to create a "blue book" for bulldozers.

bluebook bulldozer data data-science jupyter-notebook kaggle-competition machine-learning

Last synced: 20 Jun 2026

https://github.com/chris-santiago/steps

A SciKit-Learn style feature selector using best subsets and stepwise regression.

best-subset-selection data-science python scikit-learn stepwise-selection

Last synced: 28 Jun 2025

https://github.com/fmv1992/data_utilities

Data utilities library focused on machine learning and data analysis.

data-science utility-library

Last synced: 14 Mar 2025

https://github.com/alvarobartt/twipper

Python Twitter API Wrapper for both Free and Premium plans

api-wrapper data-science python twitter twitter-api

Last synced: 12 Apr 2025

https://github.com/jestonblu/data-science

Gitbook of data science and statistics tasks I want to remember

data-science gitbook r rmarkdown statistics

Last synced: 22 Jul 2025

https://github.com/curiousily/linear-regression-with-tensorflow-js

Build a linear regression model using TensorFlow.js and use it to predict house prices

artificial-intelligence data-science javascript linear-regression machine-learning tensorflow tensorflowjs

Last synced: 01 Sep 2025

https://github.com/frobnitzem/mpi_list

A package for working with lists distributed over MPI

data-science hpc map-reduce mpi4py

Last synced: 18 Mar 2025

https://github.com/wpanas/ml-snippets

Code snippets for faster ML development

data-science hacktoberfest machine-learning pandas python seaborn snippets

Last synced: 07 Apr 2025

https://github.com/firelink-sh/evolve-py

A highly efficient, composable, and lightweight ETL and data integration framework.

analytics arrow big-data data data-engineering data-integration data-science duckdb elt etl ingestion ingress ml olap pipeline polars postgresql python s3

Last synced: 10 Mar 2026

https://github.com/ren294/log-analysis-project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming

Last synced: 08 Jul 2025

https://github.com/spsanderson/steveondata

Repository for mainly R tips and tricks for my blog. I also include some VBA, SQL, C and Linux Usage.

ai blog c data data-science linux machinelearning-r ml ms-sql r sql time-series tipoftheday vba vba-excel

Last synced: 07 Apr 2025

https://github.com/cpeoples/powerpredict

🔮 AI-powered Powerball & Mega Millions lottery number prediction using deep learning (Transformer + LSTM), Markov chains, and statistical analysis. Built with TensorFlow/Keras 3.

artificial-intelligence data-science deep-learning keras lottery-prediction lstm machine-learning markov-chain megamillions neural-network powerball python scikit-learn statistical-analysis tensorflow texas-lottery transformer

Last synced: 21 May 2026

https://github.com/shreyamalogi/mushroom-mystery

Can You Eat That? 🍄 Let Data Decide.

classification data-analysis data-science eda machine-learning python

Last synced: 06 Nov 2025

https://github.com/brendanhcullen/rstudio-instructor-certification

My materials for the RStudio Instructor Certification Teaching Exam.

certification data-science dplyr education instructor-training r rstudio tidyverse

Last synced: 25 Apr 2025

https://github.com/haloapping/cotomks

A collection of references for learning about Python programming, Data Science, Machine Learning, and Deep Learning.

data-science deep-learning machine-learning model-evaluation probabilistic statistics time-series

Last synced: 04 Jul 2026

https://github.com/dhhruv/the-sparks-foundation-internship-tasks

This Repository is dedicated to the completion of my Task with video from The Sparks Foundation (Graduate Rotational Internship Program).

business-analytics data-science data-visualization grip gripmay21 intern internship jupyter machine-learning python task-6 task6 the-sparks-foundation thesparksfoundation tsf

Last synced: 19 Sep 2025

https://github.com/ndleah/tsf-data-science-internship

This repository contains the tasks performed during the Data Science and Business Analytics Internship at The Sparks Foundation

data-science data-visualization exploratory-data-analysis machine-learning powerbi python virtual-internship

Last synced: 20 Sep 2025

https://github.com/prakalp-pande/twitter-sentiment-analysis

Analyze public opinion in tweets by mining and processing live Twitter data. Employ machine learning to classify tweets as positive, negative, or neutral, then visualize sentiment trends and identify key influencers.

data-science twitter-sentiment-analysis

Last synced: 18 Aug 2025

https://github.com/abhinav-ark/mal_lyrics_analysis

Preprocessing and EDA on a Dataset of Malayalam Songs and Lyrics

data-science eda jupyter-notebook python

Last synced: 22 Jul 2025

https://github.com/fusky-labs/pacopanda-drawing-stats

A case study and data analysis project that collects drawings from the furry artist Paco Panda

data-science data-visualization fastapi furries furry furry-fandom pandas python

Last synced: 09 Aug 2025

https://github.com/lockedata/opentrainingcontent

An MIT & CCBY4.0 licensed repository of training materials from Locke Data

data-science open-course r-stats

Last synced: 29 Jul 2025

https://github.com/RemiRigal/DatasetExplorer

A web tool for local dataset browsing and processing, developed using the Flask + Angular stack.

ai angular data-processing data-science data-visualization dataset dataset-analysis docker docker-compose flask web-application

Last synced: 30 Jul 2025

https://github.com/snowflakedb/snowpark-checkpoints

Snowpark Python / Spark Migration Testing Tools

data-analytics data-engineering data-science python snowflake sql

Last synced: 31 Aug 2025

https://github.com/raoumer/dwx

Deep Web Extractor (DWX): a system that uses statistical machine learning models for crawling and data discovery in the Deep Web (i.e., the massive, high-quality portion of the World Wide Web) to build knowledge-based databases.

data-discovery data-science data-visualization machine-learning python

Last synced: 09 Mar 2026

https://github.com/onlyphantom/infratools

Kickstart Session: Infrastructure and Tools for Data Science workshop materials

data-science datascience workshop workshop-materials

Last synced: 14 Apr 2025

https://github.com/datadistillr/datadistillr-python-sdk

A Python SDK for Programmatically Interacting with DataDistillr

apache-drill data data-science datadistillr jupyter sql

Last synced: 01 Jul 2025

https://github.com/systemvll/hcaptcha-dataset-scraper

A simple Node.js script that returns hCaptcha images and prompts for training AI.

ai data-science dataset hcaptcha hcaptcha-solver machine-learning

Last synced: 03 Aug 2025

https://github.com/andersy005/dask-notebooks

Dask tutorials for Big Data Analysis and Machine Learning as Jupyter notebooks

dask data-science distributed-computing jupyter-notebook parallel-computing python

Last synced: 30 Aug 2025

https://github.com/aiscalate/aiscalator

Tools to streamline Jupyter Notebook Prototypes into robust Data Products

airflow airflow-docker data-engineering data-science jupyter jupyter-notebook jupyterlab

Last synced: 26 Sep 2025

https://github.com/joaopfonseca/ml-research

A Python library with utilities for Machine Learning research and algorithm implementations

active-learning data-science machine-learning python scikit-learn

Last synced: 26 Oct 2025

https://github.com/ahmedukamel/instant-ai

This repository contains the diploma information, content, tasks, projects, and solutions.

artificial-intelligence data-science deep-learning machine-learning mathematics matplotlib numpy pandas python webscrapping

Last synced: 19 Oct 2025

https://github.com/feedzai/feedzai-openml-r

Implementations for Feedzai's OpenML APIs to allow for usage of machine learning models in the R programming language.

caret data-science feedzai machine-learning machine-learning-algorithms openml r rserve

Last synced: 17 Oct 2025

https://github.com/trafficgcn/optimal_path_dijkstra_for_data_science

Plotting the Optimal Route in Python for Data Scientists using the Dijkstra Algorithm

data-science dijkstra dijkstra-algorithm dijkstra-shortest-path map mapping open-street-map optimal-route osm osmnx python

Last synced: 27 Oct 2025

https://github.com/ngupta23/more

This is a helper package for pandas, visualizations and scikit-learn

data-science helpers pandas python scikit-learn visualization visualizations

Last synced: 24 Oct 2025

https://github.com/hughbe/facebook.net

A WIP Facebook Graph API client for .NET, used for the Facebook Civic Insights project by The Asia Foundation's Cambodia office

asia-foundation data-science facebook facebook-api facebook-graph-api

Last synced: 17 Mar 2026

https://github.com/emiruz/dataset-tools

Easy to use library for working with core.matrix datasets in Clojure: select, where, aggregate, join, order, cross-tab, from/to-dataset, etc

data-mining data-science dataset dsl matrix sql

Last synced: 22 Oct 2025

https://github.com/aglove2189/appias

Machine learning workflow toolkit ✨🦋✨

appias data-analysis data-science machine-learning pandas python sklearn workflow

Last synced: 19 Oct 2025

https://github.com/hodgesmr/biden_nlp

Jupyter Notebook that introduces BIDEN: Binary Inference Dictionaries for Electoral NLP. It demonstrates a compression-based binary classification technique that is fast at both training and inference on common CPU hardware in Python

compression data-science machine-learning natural-language-processing nlp zstandard zstd

Last synced: 22 Jan 2026

https://github.com/raynardj/forgebox

The deep learning tool box

data-science machine-learning nlp pandas-dataframe

Last synced: 16 Oct 2025