Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2026-07-03 00:07:42 UTC
https://github.com/devinterview-io/llmops-interview-questions
LLMOps interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation llmops llmops-interview-questions llmops-questions llmops-tech-interview machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions software-engineer-interview technical-interview-questions
Last synced: 16 Feb 2026
https://github.com/thomasnield/oreilly_kotlin_for_data_science
Notes, slides, and contents for the O'Reilly videos using Kotlin for Data Science
data-engineering data-science etl kotlin oreilly statistics
Last synced: 27 Mar 2025
https://github.com/l480/rewe-price-data
Daily updated prices of all items from the German supermarket chain REWE as CSV (including EAN, grammage, product image, etc.)
csv data-science ean inflation prices rewe shrinkflation supermarket
Last synced: 11 Jan 2026
https://github.com/yangfa-zhang/lunax
Lunax is a machine learning framework specifically designed for the processing and analysis of tabular data.
data-analysis data-science lunax machine-learning tabular-data
Last synced: 14 Dec 2025
https://github.com/florents-tselai/sqlite-for-data-scientists
Notebooks and supporting files for the SQLite for Data Scientists online live training on the O'Reilly Learning Platform
data-science learning sql sqlite3 training-materials
Last synced: 11 Apr 2025
https://github.com/dhhruv/stock-price-prediction
A deep learning project in which a model built with LSTM layers was trained to predict Tata stock prices, and the predictions were compared with their actual values.
algorithm cli college-project data data-science dataset deep-learning jupyter jupyter-notebook lstm machine-learning prediction science shell stock-price-prediction tata-beverages terminal
Last synced: 03 May 2025
https://github.com/klarna-incubator/mleko
Simplify and accelerate your machine learning development with mleko. Designed with modularity and customization in mind, it seamlessly integrates into your existing workflows. Its robust caching system optimizes performance, taking you from data ingestion to finalized models with unparalleled efficiency.
artificial-intelligence data-science machine-learning pipeline python vaex
Last synced: 11 Apr 2025
https://github.com/ttitcombe/constituencymap
Python code to generate political maps
brexit choropleth choropleth-map data-science election-data map political-science politics united-kingdom visualization
Last synced: 11 Apr 2025
https://github.com/edaaydinea/op1-prediction-of-the-different-progressive-levels-of-alzheimer-s-disease
This is an optional model development project on a real dataset related to predicting the different progressive levels of Alzheimer's disease (AD).
alzheimer-disease-prediction anova-test catboost-classifier chi-square-test data-science deep-neural-networks keras-neural-networks lightgbm-classifier logistic-regression machine-learning multi-layer-perceptron-classifier neural-networks random-forest-classifier tensorflow xgboost-classifier
Last synced: 11 Apr 2025
https://github.com/jeonghunyoon/machine-learning-lecture-notes
Lecture notes and code for machine learning
data-science decision-tree deep-learning lecture-notes linear-algebra linear-regression lsa machine-learning naive-bayes-classifier statistics
Last synced: 10 Apr 2025
https://github.com/hourout/linora
Simple and efficient tools for data science.
data-analysis data-mining data-science hyperparameter-optimization lightgbm machine-learning python xgboost
Last synced: 04 Apr 2025
https://github.com/alexioannides/notes-and-demos
Study notes and demos.
data-engineering data-science ml-engineering mlops python
Last synced: 29 Oct 2025
https://github.com/alro10/twitter-sentiment-live
Sentiment analysis for tweets written in Brazilian Portuguese
dash dash-app dash-plotly dashboards data-science plotly portuguese-brazilian python3 sentiment-analysis tweepy tweets vader-sentiment-analysis
Last synced: 17 Jun 2025
https://github.com/rbhatia46/data-preprocessing-template
This repository includes all the data preprocessing required before using a dataset with a machine learning model. Please refer to the README for usage instructions.
data-preprocessing data-science machine-learning python
Last synced: 11 Apr 2025
https://github.com/hsins/mpl-tc-fonts
🇹🇼 A package to solve the "tofu" problem in matplotlib plots when using Traditional Chinese characters in labels or text.
cjk-characters data-science matplotlib
Last synced: 29 Oct 2025
https://github.com/bcgov/canwqdata
R package to download Canadian open water quality data
data-science env r r-package rlang rstats
Last synced: 20 Jul 2025
https://github.com/kurtispykes/twitter-sentiment-analysis
Creating a Gradio user interface to predict the sentiment of a tweet
data-science deep-learning gradio keras lstm machine-learning natural-language-processing neural-network nlp nlp-machine-learning prediction python sentiment-analysis tweet twitter
Last synced: 03 May 2025
https://github.com/luminousmen/python_for_ds
Python for Data Analysis workshop
data-analysis data-science python tutorial
Last synced: 01 May 2025
https://github.com/thecoderpinar/spotify_trends_2023_analysis
Exploring Spotify's latest trends, top songs, genres, and artists using Python, Pandas, NumPy, Matplotlib, CNNs for image-based analysis, and advanced algorithms for music recommendation. Dive into the world of music data and discover what's trending on Spotify!
cnn cnn-keras data-analysis data-science data-visualization machine-learning matplotlib music-trend numpy pandas python spotify
Last synced: 30 Apr 2025
https://github.com/ptyadana/tableau_2020_a-z_hands-on
Tableau projects for data analysis, data analytics, and data visualization on different datasets
data-analysis data-science data-visualization tableau tableau-dashboards tableau-desktop tableau-public tableau-workbooks
Last synced: 03 Aug 2025
https://github.com/doubleml/doubleml-serverless
DoubleML-Serverless - Distributed Double Machine Learning with a Serverless Architecture
aws-lambda causal-inference data-science double-machine-learning econometrics machine-learning python scikit-learn serverless statistics
Last synced: 07 May 2025
https://github.com/the-akira/datascience
Collection of resources on data science with Python.
data data-analysis data-science data-structures data-visualization machine-learning machine-learning-algorithms mathematics pandas pandas-dataframe portuguese-language python3 scikit-learn statistics sympy
Last synced: 07 May 2025
https://github.com/cimentadaj/dataharvesting
Material for the course 'Data Harvesting' for the masters in computational social science - UC3M
api data-science r web-scraping
Last synced: 30 Apr 2025
https://github.com/rasmusrynell/predicting-nhl
The project explores the idea of using different machine learning techniques to determine different stats in NHL games.
ai algorithms data-science database machine-learning ml nhl nhl-api python scikit-learn sports sports-analytics sports-stats sportsanalytics
Last synced: 14 Apr 2025
https://github.com/dhimmel/openskistats
The study of skiing where we shred open data like pow. Quantifying alpine ski areas with geospatial metrics derived from OpenStreetMap.
data-science data-visualization downhill elevation geospatial gis mapping open-data openskimap openstreetmap orientation python quarto ski-areas skiing slope snowpack solar-irradiance sunlight topography
Last synced: 21 Jul 2025
https://github.com/laminetourelab/tutorial
Tutorials on machine learning, artificial intelligence in general and in biomedical research.
artificial-intelligence bioinformatics bioinformatics-tutorials computer-vision data-science data-visualization-dashboard deep-learning graph-machine-learning image-analysis machine-learning natural-language-processing plotly-dash python pytorch scrna-seq shiny-apps tensorflow-tutorials transfer-learning tutorial-code tutorials
Last synced: 24 Oct 2025
https://github.com/bradflaugher/ai-101
Notes, links, code samples, and resources for teaching yourself PyTorch and TensorFlow.
bootcamp course data-engineering data-science learn-to-code learning-by-doing learning-python machine-learning
Last synced: 10 May 2025
https://github.com/anaclumos/heart-diagnosis-engine
2019 Korean Minjok Leadership Academy graduation project
data-science machine-learning pandas python scikit-learn
Last synced: 22 Aug 2025
https://github.com/networks-learning/discussion-complexity
Code for "On the Complexity of Opinions and Online Discussions", WSDM 2019
complexity data-science discussion online-discussions opinion-mining paper wsdm
Last synced: 10 Aug 2025
https://github.com/mratsim/meilleur-data-scientist-france-2018
My solution for the competition "Le meilleur data scientist de France 2018" (Best Data Scientist of France 2018)
data-science data-science-competition machine-learning xgboost
Last synced: 15 Sep 2025
https://github.com/fabriziomusacchio/python_neuro_practical
This is the course material for the advanced course on Python for data scientists.
data-analysis data-science jupyter jupyter-notebook jupyter-notebooks open-source python teaching teaching-materials
Last synced: 22 Jul 2025
https://github.com/hassaku/audio-plot
Python library that converts a line graph to sound and returns an object that can be played in a Jupyter notebook or Google Colab. Values are represented by pitch, and the timeline is represented by left-to-right panning. It was created to make data science fun for the visually impaired.
audio-plot colab data-science jupyter-notebook python visually-impaired
Last synced: 01 Nov 2025
https://github.com/firaskahlaoui/heart-disease-analysis-r
R for data visualization and analysis of heart disease datasets.
data-science data-visualization ggplot kaggle-dataset r statistics
Last synced: 14 Apr 2025
https://github.com/dogukanayd/catch-tweet-with-keyword
Fetch tweets matching a given keyword and perform keyword analysis
data-analysis data-mining data-science datascience keyword-analysis python python27 social-media social-network social-network-analysis tweet tweets twitter twitter-analysis twitter-api twitter-oauth twitter-sentiment-analysis twitterwordcloud wordcloud
Last synced: 30 Aug 2025
https://github.com/ndxdeveloper/formation-python
Formation Python (Python course) - From beginner to advanced | 13 modules (FastAPI, Type Hints, Data Science, SQLAlchemy, asyncio) | 75+ topics | 100% in French | MIT License
api-rest asyncio data-science developpement fastapi formation francais french learning numpy pandas poetry poo programmation pytest python python3 sqlalchemy type-hints
Last synced: 08 Apr 2026
https://github.com/juniortorresmtj/projeto_deupositivo
Open Data Analysis Project - SUS
alura bootcampds brazil data-science projeto python
Last synced: 29 Jul 2025
https://github.com/MCodrescu/octopus
R Package for Interacting with Databases
data-science database r rshiny
Last synced: 29 Jul 2025
https://github.com/adilshamim8/100-ai-machine-learning-deep-learnin-projects
100 AI Machine Learning Deep Learning Projects is a curated repository showcasing innovative, production-ready solutions across computer vision, NLP, and more.
ai artificial-intelligence computer-vision computer-vision-projects data-science deep-learning deep-learning-projects machine-learning machine-learning-projects nlp nlp-projects python
Last synced: 20 Apr 2026
https://github.com/durgeshsamariya/100daysofdatascience
A 100-day data science challenge to learn and implement data science concepts, from data science beginner to data scientist.
100days 100daysofcode 100daysofdscode 100daysofmlcode data data-science
Last synced: 15 Apr 2025
https://github.com/ammarlodhi255/student_performance_indicator_end-to-end_implementation
An end-to-end machine learning project: a student performance indicator. The goal of this project is to understand the influence of parents' background, test preparation, and various other variables on students' performance.
aws cd-pipeline data-analysis data-science data-science-projects eda end-to-end-machine-learning machine-learning machine-learning-projects regression regression-analysis
Last synced: 27 Sep 2025
https://github.com/eshikashah/skillship-internship-project-1-prediction-of-a-patient-s-no_show-appointments
Skillship Foundation internship project.
classification data-processing data-science machine-learning python
Last synced: 21 Jul 2025
https://github.com/quant-aq/aeromancy
Aeromancy: A framework for performing reproducible AI and ML
aeromancy data-science machine-learning reproducibility reproducible reproducible-experiments reproducible-research reproducible-science
Last synced: 24 Dec 2025
https://github.com/bluegreen-labs/appeears
Interface to the NASA AppEEARS API
api data-science r-package remote-sensing rstats
Last synced: 23 Aug 2025
https://github.com/tezansahu/dvc-pycaret-fastapi-demo
Repository for the Demo of using DVC with PyCaret & MLOps (DVC Office Hours - 20th Jan, 2022)
data-science demo deployment dvc fastapi machine-learning mlops-workflow pycaret
Last synced: 26 Dec 2025
https://github.com/aniketpatilanalyst/Disease-Prediction-Model
Prediction Model on Cell Images for Detecting Malaria
artificial-intelligence cnn-classification data-science deep-neural-networks disease-prediction image-processing
Last synced: 10 Mar 2025
https://github.com/fwd/reddit
Graph Visualization UI for Reddit.
data data-science datasets worldnews
Last synced: 24 Apr 2025
https://github.com/synthesized-io/insight
🧿 Metrics & Monitoring of Datasets
data data-analysis data-science framework insights metrics monitoring python
Last synced: 24 Jun 2025
https://github.com/fabiosmuu/rna
This repository is intended to demonstrate a neural network module that I have been developing.
algorithms data-science ia inteligencia-artificial redes-neurais-artificiais rna
Last synced: 10 Apr 2025
https://github.com/buccaneerai/rxjs-stats
Moved to @bottlenose/rxstats (https://github.com/buccaneerai/bottlenose)
analytics data data-mining data-science observables reactive rxjs statistics
Last synced: 15 Jul 2025
https://github.com/garciparedes/python-examples
Set of awesome Python Examples
data-science examples exercises math numpy pandas python python-3 tensorflow
Last synced: 13 Apr 2025
https://github.com/codewithmuh/insatgram-ai-model
Create high-quality images effortlessly for your brand using Fooocus, an advanced image generation software.
ai ai-models artificial-intelligence chatgpt data-science generative-ai-model generative-ai-tools generative-model instagram machine-learning models text-to-image
Last synced: 10 Apr 2025
https://github.com/gabrieldim/calculation-cholesterol-data-science
Cholesterol is calculated from the given set of data.
convolutional-layers data-science dense layer
Last synced: 07 Jul 2025
https://github.com/tslu1s/atlantic
Atlantic: Automated Data Preprocessing Framework for Supervised Machine Learning
automation automl automl-pipeline data-preprocessing data-science feature-selection label-encoder machine-learning onehot-encoder predictive-maintenance predictive-modeling preprocessing-pipeline python scikit-learn
Last synced: 10 Apr 2025
https://github.com/sahahn/bpt
The Brain Predictability toolbox (BPt) is a Python-based machine learning library designed primarily for tabular data, and neuroimaging data in particular, but it can easily be generalized further.
bp bpt brain-predictability-toolbox data-analysis data-science machine-learning ml neuroimaging-data neuroscience neuroscience-methods pandas python sklearn
Last synced: 13 Apr 2025
https://github.com/Himscipy/bnn_hvd
Distributed Training of Bayesian Neural Networks at Scale
bayesian-networks computer-vision data-science distributed-computing horovod machine-learning mnist tensorflow tensorflow-probability uncertainty-quantification variational-inference
Last synced: 15 Jul 2025
https://github.com/mikeroyal/apache-ignite-guide
Apache Ignite Guide
data-science database hadoop hadoop-cluster ignite nosql nosql-data-storage nosql-databases stream-processing streaming
Last synced: 06 May 2025
https://github.com/vianneymi/baker
Project demonstrating a TDS article about structuring unstructured data using LLMs
data-engineering data-mining data-science langchain llm mistralai pydantic
Last synced: 11 Jul 2025
https://github.com/zohaib58/gdsc-dsx2022
Google Developers Student Club - Data Science Bootcamp 2022
Last synced: 05 May 2025
https://github.com/millengustavo/demo-datasus-streamlit
Demo Application with DataSUS death records and Streamlit
data-science datasus health healthcare streamlit
Last synced: 10 Apr 2025
https://github.com/virajbhutada/capstones
This repository contains all the necessary files and documentation for a detailed analysis of bank loan data using a combination of SQL, Power BI, Excel, and Tableau. The project aims to uncover insights related to loan applications, funding, repayments, and borrower demographics, facilitating data-driven decision-making in the banking sector.
bank-loan-analysis dashboard data-science dax-query eda excel excel-dashboard excel-functions mssql-server powerbi powerbi-reports powerbi-visuals sql sql-database tableau tableau-public tableau-server
Last synced: 30 Oct 2025
https://github.com/yevh/anonymizer
Anonymize sensitive data in your datasets.
anonymize anonymized anonymizer crypto cryptography data-anonymization data-anonymized data-science data-security dataset datasets datasets-csv datasets-preparation python python3 security sensitive sensitive-data
Last synced: 07 Jul 2025
https://github.com/LukasHedegaard/datasetops
Fluent dataset operations, compatible with your favorite libraries
data-cleaning data-munging data-processing data-science data-wrangling dataset dataset-combinations deep-learning multiple-datasets pytorch tensorflow
Last synced: 08 May 2025
https://github.com/juliaai/mljflow.jl
Connecting MLJ and MLFlow
data-science julia machine-learning machine-learning-operations machine-learning-ops mlflow mlj mlops statistics
Last synced: 25 Oct 2025
https://github.com/ashwinpn/applied-data-science-with-python-specialization-university-of-michigan
Applied Data Science with Python Specialization: University of Michigan
coursera coursera-assignment coursera-data-science coursera-machine-learning coursera-python coursera-specialization data-science machine-learning university-of-michigan
Last synced: 13 Apr 2025
https://github.com/vatshayan/image-recognition-project
Beautiful Image recognition and Classification Project for final year college students.
btech-project college-project collegeprojects cse-project data-science final final-project final-year-project finalyearproject image image-classification image-processing image-recognition image-recognition-algorithms keras keras-neural keras-neural-networks mtech-project
Last synced: 28 Oct 2025
https://github.com/sithu-khant/math-for-ml-ds
Mathematics learning path for Machine Learning and Data Science.
awesome-list data-science deep-learning machine-learning mathematics
Last synced: 13 Apr 2025
https://github.com/simranjeet97/top-machine-learning-algorithms-python
This repository contains machine learning algorithms with the mathematical explanations behind them, along with implementations in Python.
data data-analysis data-science data-structures database machine machine-learning machine-learning-algorithms machine-learning-library machine-learning-playlist machinelearning machinelearning-python python python-programming python-script python3 youtube youtube-tutorial youtube-tutorial-series
Last synced: 11 Apr 2025
https://github.com/torkamanilab/zoish
Zoish is a Python package that streamlines machine learning by leveraging SHAP values for feature selection and interpretability, making model development more efficient and user-friendly
automl data-science feature-engineering feature-selection machine-learning python scikit-learn
Last synced: 10 Apr 2025
https://github.com/tkonopka/rcssplot
R plots styled with css
css data-science r visualization
Last synced: 22 Oct 2025
https://github.com/devinterview-io/optimization-interview-questions
Optimization interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions optimization optimization-interview-questions optimization-questions optimization-tech-interview software-engineer-interview technical-interview-questions
Last synced: 30 Jan 2026
https://github.com/fffaraz/datasets
My collection of random datasets
data-mining data-science dataset
Last synced: 04 Sep 2025
https://github.com/aicorsair/dataquest-data-science-analysis-projects
A repository dedicated to storing guided projects completed while learning data science concepts with Dataquest.
classification-models cluster-analysis data-analysis data-analytics data-cleaning data-preparation data-preprocessing data-science data-visualization deep-learning excel feature-engineering machine-learning pandas-dataframe power-bi python-3 regression-models scikit-learn sql web-scraping
Last synced: 27 Oct 2025
https://github.com/desdaemon/polars_dart
Dart bindings for the polars library
apache-arrow dart data-science ffi flutter flutter-rust-bridge polars rust
Last synced: 19 Apr 2025
https://github.com/nicodupont/resources
Resources on SAS, Python, SQL, VBA-Excel, etc.
airflow data-science data-visualization excel python r sas sql vba
Last synced: 24 Jun 2025
https://github.com/kennethleungty/langextract-gemma-structured-extraction
Using LangExtract and Gemma 3 for structured information extraction from unstructured text in insurance policies
artificial-intelligence data-science deep-learning gemini gemma gemma3-4b google langextract large-language-models llm llms machine-learning openai structured-data unstructured-data
Last synced: 03 Sep 2025
https://github.com/mafda/knee_oa_dl_app
Web app to predict knee osteoarthritis grade using Deep Learning and Streamlit
convolutional-neural-networks data-science deep-neural-networks knee-osteoarthritis knee-osteoarthritis-analysis ml-app ml-application streamlit x-ray-images
Last synced: 25 Oct 2025
https://github.com/teddyoweh/dimensionality-reduction-pca
Dimensionality reduction is the process of reducing the number of random features (attributes or variables, called dimensions in this context) in a dataset while retaining as much of the dataset's variation as possible, keeping only the relevant features to improve a model's efficiency.
data-science dataset dimensional-analysis dimensionality-reduction feature-extraction feature-selection machine-learning
Last synced: 09 Apr 2025
https://github.com/nikhilaravi/neuralnetflix
Movie Genre Prediction from movie posters using Deep Learning
Last synced: 18 Oct 2025
https://github.com/mrtkp9993/anomalydetectioncpp
Simple anomaly detection for univariate time series data.
anomaly-detection cpp data-science statistics
Last synced: 24 Oct 2025
https://github.com/xability/py-maidr
Python binder for maidr library
accessibility binder braille data-science data-visualization python
Last synced: 03 Apr 2026
https://github.com/cdcgov/cdh-lava-react
CDC Data Hub Lifecycle, Analysis & Visualization Accelerator (LAVA) REACT Components based on machine readable requirements.
agile-development azure data-analysis data-catalog data-governance data-quality data-science data-visualization databricks datavisualization devops excel-export metadata operations powerautomate powerbi pyspark security sql test-automation
Last synced: 22 Apr 2025
https://github.com/supercowpowers/scp-labs
SCP Labs (Open Source Team for SuperCowPowers)
data-analysis data-science pandas python scikit-learn security
Last synced: 06 May 2025
https://github.com/jdiaz97/iucnredlist.jl
API Wrapper for the IUCN Red List.
biodiversity data-science ecology
Last synced: 21 Oct 2025
https://github.com/quantifyearth/yirgacheffe
A declarative geospatial library for Python to make data science with maps easier
data-science geospatial python3
Last synced: 01 Apr 2026
https://github.com/arose13/rosey
Data science utilities for statistics and machine learning
data-science data-visualization keras machine-learning tensorflow
Last synced: 24 Oct 2025
https://github.com/rishisankineni/capital-one-data-challenge
NYC Taxi Data Challenge - Data Scientist
capital-one data-science eda machine-learning python-3-6 xgboost
Last synced: 09 Apr 2025
https://github.com/gianlucatruda/warfit-learn
A machine learning toolkit for reproducible research in anticoagulant dose estimation.
data-science iwpc pandas preprocessing python reproducible-research sklearn supervised-learning warfarin warfit-learn
Last synced: 24 Oct 2025
https://github.com/apear9/riskmapr
Code for riskmapr apps for invasive weed risk mapping
bayesian bayesian-network data-science ecology ecology-of-invasion invasive-species risk-map shiny shiny-apps weeds
Last synced: 30 Jul 2025
https://github.com/amirhosseinhonardoust/workout-efficiency-benchmark
Streamlit + Python pipeline that benchmarks gym workout efficiency (kcal/min) using present sessions only. Generates sortable workout-type benchmarks, distribution plots, fairness-aware gap analysis with uncertainty/low-sample flags, and a data-quality report to prevent misleading comparisons.
analytics benchmarking bias-audit dashboard data-analysis data-quality data-science eda fairness fitness health-data pandas plotly python reporting reproducible-research statistics streamlit visualization workout
Last synced: 10 Jun 2026
https://github.com/zgornel/datalinter
Linting tools for ML workflows, data, code
code-analysis-tool coding-agent data-science linting
Last synced: 21 Apr 2026
https://github.com/aiguofer/sql_connectors
A simple wrapper for SQL connections using SQLAlchemy and Pandas read_sql to standardize SQL workflow with multiple data sources.
data-analysis data-analytics data-exploration data-science pandas relational-databases sql sqlalchemy standardized-api
Last synced: 13 Oct 2025
https://github.com/liamarguedas/uber-eats-delivery-time
Delivery time prediction system for Uber Eats
data-science machine-learning regression
Last synced: 10 Oct 2025
https://github.com/brunocampos01/porto-seguro-safe-driver-prediction
Predict if a driver will file an insurance claim next year. (Kaggle Competition)
challenge data-cleansing data-engineering data-science dataset insurance-claims kaggle kaggle-competition machine-learning porto-seguro python random-forest xgboost
Last synced: 05 Sep 2025
https://github.com/toxpi/toxpir
toxpiR R package for the Toxicological Priority Index (ToxPi) algorithm.
data-science modeling r r-package toxicology
Last synced: 19 Aug 2025
https://github.com/nikhilba/aerial-imagery
Data Science Research Project: Map poverty using satellite images.
carnegie-mellon-university data-science deep-learning ipynb neural-network satellite-images vgg16
Last synced: 28 Oct 2025
https://github.com/ihmeuw/easylink
A tool that allows users to build and run highly configurable record linkage/entity resolution pipelines.
data-science entity-resolution record-linkage
Last synced: 01 Apr 2026
https://github.com/pitmonticone/reddittextclassification
Reddit Gender Text-Classification.
algorithms artificial-intelligence computer-science data-analysis data-mining data-science data-visualization jupyter-notebook keras-tensorflow language-model machine-learning modeling natural-language-processing neural-network nlp python reddit scikit-learn spacy-nlp tensorflow
Last synced: 24 Oct 2025