Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2026-07-03 00:07:42 UTC
https://github.com/raynardj/langhuan
Lightweight labeling engine
classification data-science labeling labeling-tool machine-learning named-entity-recognition ner nlp tagging-tool
Last synced: 16 Oct 2025
https://github.com/koalaverse/analyticssummit19
Material for 2019 Analytics Summit Machine Learning with R Training
data-science educational-materials machine-learning r workshop-materials
Last synced: 15 May 2025
https://github.com/dina-hosny/chaincare
ChainCare is a health information system that uses smart contracts to handle medical procedures and stores medical history on a blockchain.
api-rest bigchain blockchain blockchain-technology data-science data-storage data-visualization ethereum golang health-informatics-systems healthcare insomnia metamask postgresql postman reactjs solidity truffle web3
Last synced: 13 Apr 2026
https://github.com/dalageo/ml-titanicshipwreck
Exploring the World's Most Renowned Shipwreck
data-science decision-tree-classifier logistic-regression machine-learning python random-forest-classifier scikit-learn stacking-ensemble titanic-dataset xgboost-classifier
Last synced: 04 Sep 2025
https://github.com/lambdaclass/data_etudes
LambdaClass statistics, machine learning and data science etudes
data-science notebook probability statistics
Last synced: 09 Apr 2025
https://github.com/firaskahlaoui/heart-disease-analysis-r
R for data visualization and analysis of heart disease datasets.
data-science data-visualization ggplot kaggle-dataset r statistics
Last synced: 14 Apr 2025
https://github.com/klarna-incubator/mleko
Simplify and accelerate your machine learning development with mleko. Designed with modularity and customization in mind, it seamlessly integrates into your existing workflows. Its robust caching system optimizes performance, taking you from data ingestion to finalized models with unparalleled efficiency.
artificial-intelligence data-science machine-learning pipeline python vaex
Last synced: 11 Apr 2025
https://github.com/chandraprakash-bathula/apparel-recommendations
This project implements a personalized apparel recommendation engine using content-based search with the Amazon API, NLTK, and Keras libraries.
boxplot cnn-keras data-analysis data-science deep-learning linear-regression machine-learning numpy pandas scatter-plot scikit-learn svm tensorflow xgboost
Last synced: 23 Mar 2025
https://github.com/doubleml/doubleml-serverless
DoubleML-Serverless - Distributed Double Machine Learning with a Serverless Architecture
aws-lambda causal-inference data-science double-machine-learning econometrics machine-learning python scikit-learn serverless statistics
Last synced: 07 May 2025
https://github.com/rasmusrynell/predicting-nhl
The project explores using various machine learning techniques to predict statistics in NHL games.
ai algorithms data-science database machine-learning ml nhl nhl-api python scikit-learn sports sports-analytics sports-stats sportsanalytics
Last synced: 14 Apr 2025
https://github.com/bpw1621/streamlit-topic-modeling
Topic modeling streamlit app.
data-science latent-dirichlet-allocation machine-learning natural-language-processing nlp non-negative-matrix-factorization streamlit streamlit-application streamlit-sharing streamlit-webapp topic-modeling
Last synced: 27 Jul 2025
https://github.com/dogukanayd/catch-tweet-with-keyword
Fetch tweets by keyword and run keyword analysis on them.
data-analysis data-mining data-science datascience keyword-analysis python python27 social-media social-network social-network-analysis tweet tweets twitter twitter-analysis twitter-api twitter-oauth twitter-sentiment-analysis twitterwordcloud wordcloud
Last synced: 30 Aug 2025
https://github.com/mertguvencli/keyword-extractor
This project aims to find "what are the trending techs on Data Science jobs?" using NER.
data-science machine-learning ner nlp python spacy
Last synced: 10 Sep 2025
https://github.com/anaclumos/heart-diagnosis-engine
2019 graduation project at Korean Minjok Leadership Academy
data-science machine-learning pandas python scikit-learn
Last synced: 22 Aug 2025
https://github.com/strazto/mandrake
Bring reading the manual closer to your drake workflow
data-science drake high-performance-computing makefile pipeline r r-package reproducibility reproducible-research rstats workflow
Last synced: 13 Jul 2025
https://github.com/fozouni/data_science
Source code for the first "Data Science Course"
artificial-intelligence data-science datascience deep-learning excel machine-learning python
Last synced: 04 Sep 2025
https://github.com/mratsim/meilleur-data-scientist-france-2018
My solution for the competition "Le meilleur data scientist de France 2018" (Best Data Scientist of France 2018)
data-science data-science-competition machine-learning xgboost
Last synced: 15 Sep 2025
https://github.com/sepandhaghighi/ethereum-fraud-detection-visualization
Ethereum Fraud Detection Visualization
data-analysis data-science data-visualization ethereum exploratory-data-analysis fraud fraud-detection machine-learning matplotlib python visualization
Last synced: 06 Sep 2025
https://github.com/the-akira/datascience
A collection of resources on Data Science with Python.
data data-analysis data-science data-structures data-visualization machine-learning machine-learning-algorithms mathematics pandas pandas-dataframe portuguese-language python3 scikit-learn statistics sympy
Last synced: 07 May 2025
https://github.com/arv-anshul/yt-watch-history
Analyse your YouTube watch history using Data Science, ML and NLP.
data-science docker docker-compose fastapi ml mlflow mlops mongodb nlp pydantic python3 streamlit youtube-api
Last synced: 22 Apr 2025
https://github.com/bradflaugher/ai-101
Notes, links, code samples, and resources for teaching yourself PyTorch and TensorFlow.
bootcamp course data-engineering data-science learn-to-code learning-by-doing learning-python machine-learning
Last synced: 10 May 2025
https://github.com/dhimmel/openskistats
The study of skiing where we shred open data like pow. Quantifying alpine ski areas with geospatial metrics derived from OpenStreetMap.
data-science data-visualization downhill elevation geospatial gis mapping open-data openskimap openstreetmap orientation python quarto ski-areas skiing slope snowpack solar-irradiance sunlight topography
Last synced: 21 Jul 2025
https://github.com/hassaku/audio-plot
Python library that converts a line graph to sound and returns an object that can be played in a Jupyter notebook or Google Colab. Values are represented by pitch, and the timeline is represented by left and right panning. It was created to make data science fun for the visually impaired.
audio-plot colab data-science jupyter-notebook python visually-impaired
Last synced: 01 Nov 2025
https://github.com/lucadibello/it-salary-analysis
Analysis of Salaries in IT Roles: DevOps, Cyber Security, and AI
ai cybersecurity data-science devops jupyter-notebook salary-analysis
Last synced: 03 Jul 2025
https://github.com/devinterview-io/linear-algebra-interview-questions
Linear Algebra interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation linear-algebra linear-algebra-interview-questions linear-algebra-questions linear-algebra-tech-interview machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions software-engineer-interview technical-interview-questions
Last synced: 07 Feb 2026
https://github.com/devinterview-io/svm-interview-questions
SVM interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions software-engineer-interview svm svm-interview-questions svm-questions svm-tech-interview technical-interview-questions
Last synced: 28 Jan 2026
https://github.com/mmore500/teeplot
organize data visualization output, automatically picking meaningful names based on semantic plotting variables
data-science data-visualization python python-package workflow
Last synced: 25 Feb 2026
https://github.com/adivarma27/pyab
Python package for Bayesian & Frequentist A/B Testing
ab-testing bayesian-statistics data-science frequentist-statistics hypothesis-testing marketing statistical-methods statistical-tests
Last synced: 14 Jan 2026
https://github.com/mrsaeeddev/ai-interview-questions
Real-World AI Interview Questions for You!
ai algorithms artificial-intelligence career data-science hacktoberfest hacktoberfest2020 interview interview-questions machine-learning resume
Last synced: 09 Aug 2025
https://github.com/VaibhavAbhimanyooHiwase/Risk_Calculation_using_Backward_Elimination_Algorithm_in_Life_Insurance
Implementation of the backward elimination algorithm, used for dimensionality reduction to improve the performance of risk calculation in the life insurance industry.
alpha-value backward-elimination data-mining-algorithms data-science insurance kaggle-life-insurance life-insurance multiple-linear-regression p-value random-forest risk-analysis risk-assessment risk-calculations risk-modelling risk-models statistical-analysis statistical-data statistical-learning statistical-models statistics
Last synced: 29 Jul 2025
https://github.com/laminetourelab/tutorial
Tutorials on machine learning, artificial intelligence in general and in biomedical research.
artificial-intelligence bioinformatics bioinformatics-tutorials computer-vision data-science data-visualization-dashboard deep-learning graph-machine-learning image-analysis machine-learning natural-language-processing plotly-dash python pytorch scrna-seq shiny-apps tensorflow-tutorials transfer-learning tutorial-code tutorials
Last synced: 24 Oct 2025
https://github.com/rbhatia46/python-for-data-science
This repository contains IPython notebooks that teach you enough Python to get started on your data science journey.
data-science python-basics python3
Last synced: 03 Sep 2025
https://github.com/juniortorresmtj/projeto_deupositivo
Open Data Analysis Project - SUS
alura bootcampds brazil data-science projeto python
Last synced: 29 Jul 2025
https://github.com/urbanclimatefr/coursera-learn-sql-basics-for-data-science
This repository contains the materials to "Learn SQL Basics for Data Science", a specialization provided by University of California, Davis through Coursera.
Last synced: 19 Feb 2026
https://github.com/anshumansinha3301/occupational-hazard-analysis
The Occupational Hazard Analysis Using Industry Data project aims to analyze safety metrics across various industries to identify trends in reported incidents, injuries, and fatalities.
consulting-services data-science industrialisation jupyter-notebook python
Last synced: 09 Oct 2025
https://github.com/nas5w/imdb-data
A JSON file of 50,000 IMDB movie reviews to be used in machine learning applications.
data data-science imdb javascript machine-learning
Last synced: 19 Apr 2025
https://github.com/kennethleungty/wikipedia-scraping-with-llm-agents
Scraping Wikipedia by combining LangChain's agents and tools with OpenAI's LLMs and function calling
artificial-intelligence data-analytics data-mining data-science deep-learning genai generative-ai langchain large-language-models llm machine-learning nlp openai openai-functions web-scraping wikipedia
Last synced: 12 Jul 2025
https://github.com/aicorsair/dataquest-data-science-analysis-projects
A repository dedicated to storing guided projects completed while learning data science concepts with Dataquest.
classification-models cluster-analysis data-analysis data-analytics data-cleaning data-preparation data-preprocessing data-science data-visualization deep-learning excel feature-engineering machine-learning pandas-dataframe power-bi python-3 regression-models scikit-learn sql web-scraping
Last synced: 27 Oct 2025
https://github.com/dionhaefner/fowd
Processing framework for FOWD, a free ocean wave dataset, ready for your ML application :ocean:
data-science machine-learning ocean open-data waves
Last synced: 21 Aug 2025
https://github.com/a-poor/flask-celery-ml
Handling long-running processes (like ML model predictions) inside a Flask app using Celery.
api celery data-science flask machine-learning python
Last synced: 03 Aug 2025
https://github.com/lironmiz/data.intro
Introductory data science course from the Cyber Education Center at Campus IL, covering both the theoretical and the practical aspects of big data analysis in Python.
big-data course data-analysis data-science data-visualization education jupyter-notebook learning-by-doing matplotlib numpy pandas-library python3 statistics
Last synced: 05 Jul 2025
https://github.com/arose13/rosey
Data science utilities for statistics and machine learning
data-science data-visualization keras machine-learning tensorflow
Last synced: 24 Oct 2025
https://github.com/virajbhutada/capstones
This repository contains all the necessary files and documentation for a detailed analysis of bank loan data using a combination of SQL, Power BI, Excel, and Tableau. The project aims to uncover insights related to loan applications, funding, repayments, and borrower demographics, facilitating data-driven decision-making in the banking sector.
bank-loan-analysis dashboard data-science dax-query eda excel excel-dashboard excel-functions mssql-server powerbi powerbi-reports powerbi-visuals sql sql-database tableau tableau-public tableau-server
Last synced: 30 Oct 2025
https://github.com/joshuaulrich/stl-rug
Content presented at the Saint Louis R User Group
Last synced: 26 Aug 2025
https://github.com/liamarguedas/uber-eats-delivery-time
Delivery time prediction system for Uber Eats
data-science machine-learning regression
Last synced: 10 Oct 2025
https://github.com/JRaviLab/compbio-gists
Computational Biology & Bioinformatics Resources
bioinformatics comparative-genomics computational-biology data-science gists molecular-evolution phylogeny r shell transcriptomics
Last synced: 07 Oct 2025
https://github.com/mafda/knee_oa_dl_app
Web app to predict knee osteoarthritis grade using Deep Learning and Streamlit
convolutional-neural-networks data-science deep-neural-networks knee-osteoarthritis knee-osteoarthritis-analysis ml-app ml-application streamlit x-ray-images
Last synced: 25 Oct 2025
https://github.com/jdiaz97/iucnredlist.jl
API Wrapper for the IUCN Red List.
biodiversity data-science ecology
Last synced: 21 Oct 2025
https://github.com/supercowpowers/scp-labs
SCP Labs (Open Source Team for SuperCowPowers)
data-analysis data-science pandas python scikit-learn security
Last synced: 06 May 2025
https://github.com/ihmeuw/easylink
A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.
data-science entity-resolution record-linkage
Last synced: 01 Apr 2026
https://github.com/cdcgov/cdh-lava-react
CDC Data Hub Lifecycle, Analysis & Visualization Accelerator (LAVA) REACT Components based on machine readable requirements.
agile-development azure data-analysis data-catalog data-governance data-quality data-science data-visualization databricks datavisualization devops excel-export metadata operations powerautomate powerbi pyspark security sql test-automation
Last synced: 22 Apr 2025
https://github.com/quantifyearth/yirgacheffe
A declarative geospatial library for Python to make data science with maps easier
data-science geospatial python3
Last synced: 01 Apr 2026
https://github.com/nikhilba/aerial-imagery
Data Science Research Project: Map poverty using satellite images.
carnegie-mellon-university data-science deep-learning ipynb neural-network satellite-images vgg16
Last synced: 28 Oct 2025
https://github.com/eshikashah/skillship-internship-project-1-prediction-of-a-patient-s-no_show-appointments
Skillship Foundation internship project.
classification data-processing data-science machine-learning python
Last synced: 21 Jul 2025
https://github.com/adilshamim8/100-ai-machine-learning-deep-learnin-projects
100 AI Machine Learning Deep Learning Projects is a curated repository showcasing innovative, production-ready solutions across computer vision, NLP, and more.
ai artificial-intelligence computer-vision computer-vision-projects data-science deep-learning deep-learning-projects machine-learning machine-learning-projects nlp nlp-projects python
Last synced: 20 Apr 2026
https://github.com/teddyoweh/sentiment-analysis-api
The Sentiment Analysis API was built with the Python Flask module. It lets users pass a text or sentence through the (?text) argument and then view the sentiment analysis of that sentence. It can be integrated into a web application.
api data-science flask machine-learning nlp-machine-learning php python sentiment-analysis
Last synced: 09 Apr 2025
https://github.com/aiguofer/sql_connectors
A simple wrapper for SQL connections using SQLAlchemy and Pandas read_sql to standardize SQL workflow with multiple data sources.
data-analysis data-analytics data-exploration data-science pandas relational-databases sql sqlalchemy standardized-api
Last synced: 13 Oct 2025
https://github.com/pitmonticone/reddittextclassification
Reddit Gender Text-Classification.
algorithms artificial-intelligence computer-science data-analysis data-mining data-science data-visualization jupyter-notebook keras-tensorflow language-model machine-learning modeling natural-language-processing neural-network nlp python reddit scikit-learn spacy-nlp tensorflow
Last synced: 24 Oct 2025
https://github.com/bluegreen-labs/appeears
Interface to the NASA AppEEARS API
api data-science r-package remote-sensing rstats
Last synced: 23 Aug 2025
https://github.com/fchamroukhi/samurais
StAtistical Models for the UnsupeRvised segmentAtion of tIme-Series
artificial-intelligence change-point-detection data-science dynamic-programming em-algorithm hidden-markov-models hidden-process-regression human-activity-recognition latent-variable-models model-selection multivariate-timeseries newton-raphson piecewise-regression statistical-inference statistical-learning time-series-analysis time-series-clustering
Last synced: 22 Oct 2025
https://github.com/gregyjames/insidebarscanner
Scan every stock listed on the Nasdaq to find those with daily inside bars for trading.
data-science investment pandas-dataframe python3 scanner stock-market stocks yfinance yfinance-api
Last synced: 25 Apr 2025
https://github.com/fwd/reddit
Graph Visualization UI for Reddit.
data data-science datasets worldnews
Last synced: 24 Apr 2025
https://github.com/nicodupont/resources
Resources on SAS, Python, SQL, VBA-Excel, etc.
airflow data-science data-visualization excel python r sas sql vba
Last synced: 24 Jun 2025
https://github.com/maayanlab/playbook-workflow-builder
A repository for the Playbook Workflow Builder project.
bioinformatics biology cwl data-science gene-expression gene-ontology gene-sets proteomics rna-seq-analysis systems-biology workflow
Last synced: 11 Jul 2025
https://github.com/opt-nc/setup-duckdb-action
Blazing-fast, highly customizable GitHub Action to set up a DuckDB runtime
action actions analytics csv data-science database databases dataquality dataqualitycheck duckdb embedded-database github-actions olap sql
Last synced: 16 Mar 2026
https://github.com/cadcad-org/snippets
Repo containing notebooks showcasing features and applications of cadCAD.
cadcad data-science education python simulation snippets
Last synced: 23 Apr 2025
https://github.com/desdaemon/polars_dart
Dart bindings for the polars library
apache-arrow dart data-science ffi flutter flutter-rust-bridge polars rust
Last synced: 19 Apr 2025
https://github.com/nikhilaravi/neuralnetflix
Movie Genre Prediction from movie posters using Deep Learning
Last synced: 18 Oct 2025
https://github.com/teddyoweh/dimensionality-reduction-pca
Dimensionality reduction is the process of reducing the number of random variables (features, attributes, or, in this case, dimensions) in a dataset while retaining as much of its variation as possible, keeping only the relevant features to increase the efficiency of a model.
data-science dataset dimensional-analysis dimensionality-reduction feature-extraction feature-selection machine-learning
Last synced: 09 Apr 2025
https://github.com/mrtkp9993/anomalydetectioncpp
Simple anomaly detection for univariate time series data.
anomaly-detection cpp data-science statistics
Last synced: 24 Oct 2025
https://github.com/gianlucatruda/warfit-learn
A machine learning toolkit for reproducible research in anticoagulant dose estimation.
data-science iwpc pandas preprocessing python reproducible-research sklearn supervised-learning warfarin warfit-learn
Last synced: 24 Oct 2025
https://github.com/apear9/riskmapr
Code for riskmapr apps for invasive weed risk mapping
bayesian bayesian-network data-science ecology ecology-of-invasion invasive-species risk-map shiny shiny-apps weeds
Last synced: 30 Jul 2025
https://github.com/paritoshtripathi935/product-matching
Product matching via machine learning. The project uses techniques such as natural language processing, image recognition, and collaborative filtering to match similar products together.
amazon-scraper collaborative-filtering data-science django flipkart-scraper-python langchain machine-learning nlp opencv product-matching python
Last synced: 08 Jul 2025
https://github.com/devinterview-io/naive-bayes-interview-questions
Naive Bayes interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions naive-bayes naive-bayes-interview-questions naive-bayes-questions naive-bayes-tech-interview software-engineer-interview technical-interview-questions
Last synced: 23 Feb 2026
https://github.com/juliaai/mljflow.jl
Connecting MLJ and MLFlow
data-science julia machine-learning machine-learning-operations machine-learning-ops mlflow mlj mlops statistics
Last synced: 25 Oct 2025
https://github.com/buccaneerai/rxjs-stats
Moved to @bottlenose/rxstats (https://github.com/buccaneerai/bottlenose)
analytics data data-mining data-science observables reactive rxjs statistics
Last synced: 15 Jul 2025
https://github.com/synthesized-io/insight
Metrics & Monitoring of Datasets
data data-analysis data-science framework insights metrics monitoring python
Last synced: 24 Jun 2025
https://github.com/codewithmuh/insatgram-ai-model
Create high-quality images effortlessly for your brand using Fooocus, an advanced image generation software.
ai ai-models artificial-intelligence chatgpt data-science generative-ai-model generative-ai-tools generative-model instagram machine-learning models text-to-image
Last synced: 10 Apr 2025
https://github.com/fabiosmuu/rna
This repository aims to demonstrate a neural network module that I have been developing.
algorithms data-science ia inteligencia-artificial redes-neurais-artificiais rna
Last synced: 10 Apr 2025
https://github.com/aniketpatilanalyst/Disease-Prediction-Model
Prediction Model on Cell Images for Detecting Malaria
artificial-intelligence cnn-classification data-science deep-neural-networks disease-prediction image-processing
Last synced: 10 Mar 2025
https://github.com/simranjeet97/top-machine-learning-algorithms-python
This repository contains machine learning algorithms with the mathematical explanations behind them, along with implementations in Python.
data data-analysis data-science data-structures database machine machine-learning machine-learning-algorithms machine-learning-library machine-learning-playlist machinelearning machinelearning-python python python-programming python-script python3 youtube youtube-tutorial youtube-tutorial-series
Last synced: 11 Apr 2025
https://github.com/ashwinpn/applied-data-science-with-python-specialization-university-of-michigan
Applied Data Science with Python Specialization: University of Michigan
coursera coursera-assignment coursera-data-science coursera-machine-learning coursera-python coursera-specialization data-science machine-learning university-of-michigan
Last synced: 13 Apr 2025
https://github.com/gabrieldim/calculation-cholesterol-data-science
Cholesterol is calculated from the given set of data.
convolutional-layers data-science dense layer
Last synced: 07 Jul 2025
https://github.com/garciparedes/python-examples
Set of awesome Python Examples
data-science examples exercises math numpy pandas python python-3 tensorflow
Last synced: 13 Apr 2025
https://github.com/sithu-khant/math-for-ml-ds
Mathematics learning path for Machine Learning and Data Science.
awesome-list data-science deep-learning machine-learning mathematics
Last synced: 13 Apr 2025
https://github.com/sahahn/bpt
The Brain Predictability toolbox (BPt) is a Python-based machine learning library designed primarily for tabular and neuroimaging-specific data, but it can easily be generalized further.
bp bpt brain-predictability-toolbox data-analysis data-science machine-learning ml neuroimaging-data neuroscience neuroscience-methods pandas python sklearn
Last synced: 13 Apr 2025
https://github.com/vatshayan/image-recognition-project
Image recognition and classification project for final-year college students.
btech-project college-project collegeprojects cse-project data-science final final-project final-year-project finalyearproject image image-classification image-processing image-recognition image-recognition-algorithms keras keras-neural keras-neural-networks mtech-project
Last synced: 28 Oct 2025
https://github.com/tslu1s/atlantic
Atlantic: Automated Data Preprocessing Framework for Supervised Machine Learning
automation automl automl-pipeline data-preprocessing data-science feature-selection label-encoder machine-learning onehot-encoder predictive-maintenance predictive-modeling preprocessing-pipeline python scikit-learn
Last synced: 10 Apr 2025
https://github.com/vianneymi/baker
Project demonstrating a TDS article about structuring unstructured data using LLMs
data-engineering data-mining data-science langchain llm mistralai pydantic
Last synced: 11 Jul 2025
https://github.com/yevh/anonymizer
Anonymize sensitive data in your datasets.
anonymize anonymized anonymizer crypto cryptography data-anonymization data-anonymized data-science data-security dataset datasets datasets-csv datasets-preparation python python3 security sensitive sensitive-data
Last synced: 07 Jul 2025
https://github.com/zohaib58/gdsc-dsx2022
Google Developers Student Club - Data Science Bootcamp 2022
Last synced: 05 May 2025
https://github.com/torkamanilab/zoish
Zoish is a Python package that streamlines machine learning by leveraging SHAP values for feature selection and interpretability, making model development more efficient and user-friendly
automl data-science feature-engineering feature-selection machine-learning python scikit-learn
Last synced: 10 Apr 2025
https://github.com/millengustavo/demo-datasus-streamlit
Demo Application with DataSUS death records and Streamlit
data-science datasus health healthcare streamlit
Last synced: 10 Apr 2025
https://github.com/rpoteau/pyphyschem
Python in the physical chemistry lab
chemistry data-science jupyter machine-learning physical-chemistry python sympy
Last synced: 05 Apr 2026
https://github.com/amirhosseinhonardoust/workout-efficiency-benchmark
Streamlit + Python pipeline that benchmarks gym workout efficiency (kcal/min) using present sessions only. Generates sortable workout-type benchmarks, distribution plots, fairness-aware gap analysis with uncertainty/low-sample flags, and a data-quality report to prevent misleading comparisons.
analytics benchmarking bias-audit dashboard data-analysis data-quality data-science eda fairness fitness health-data pandas plotly python reporting reproducible-research statistics streamlit visualization workout
Last synced: 10 Jun 2026
https://github.com/zgornel/datalinter
Linting tools for ML workflows, data, code
code-analysis-tool coding-agent data-science linting
Last synced: 21 Apr 2026
https://github.com/xability/py-maidr
Python binder for maidr library
accessibility binder braille data-science data-visualization python
Last synced: 03 Apr 2026
https://github.com/tkonopka/rcssplot
R plots styled with CSS
css data-science r visualization
Last synced: 22 Oct 2025
https://github.com/devinterview-io/optimization-interview-questions
Optimization interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions optimization optimization-interview-questions optimization-questions optimization-tech-interview software-engineer-interview technical-interview-questions
Last synced: 30 Jan 2026