Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2026-06-30 00:07:43 UTC
https://github.com/ChawlaAvi/Daily-Dose-of-Data-Science
A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/
data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn
Last synced: 04 Oct 2025
https://github.com/janpfeifer/gonb
GoNB, a Go Notebook Kernel for Jupyter
data-science go golang gonb jupyter jupyter-notebook jupyter-notebook-kernel
Last synced: 14 May 2025
https://github.com/inseefrlab/onyxia
Data science environment for k8s
bluehats data-science datalab helm insee kubernetes onyxia
Last synced: 07 Jun 2026
https://github.com/statespace-tech/statespace
Interactive web apps for AI agents
agent analytics artificial-intelligence context-engineering data-engineering data-science database information-retrieval machine-learning markdown mcp rust sql webapp
Last synced: 04 Mar 2026
https://github.com/mrankitgupta/Data-Analyst-Roadmap
I am sharing my journey into data analytics by participating in Ken Jee's #66daysofdata challenge.
ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau
Last synced: 07 Sep 2025
https://github.com/ploomber/jupysql
Better SQL in Jupyter.
bigquery clickhouse data-engineering data-science duckdb hive jupyter mysql polars postgres presto python redshift snowflake spark-sql sql sqlite trino tsql
Last synced: 04 Oct 2025
https://github.com/elki-project/elki
ELKI Data Mining Toolkit
anomalydetection cluster-analysis clustering data-analysis data-mining data-mining-algorithms data-science distance-functions index indexing java machine-learning outlier-detection outliers time-series visualization
Last synced: 14 May 2025
https://github.com/alteryx/evalml
EvalML is an AutoML library written in Python.
automl data-science feature-engineering feature-selection hyperparameter-tuning machine-learning model-selection optimization
Last synced: 15 May 2025
https://github.com/the-black-knight-01/data-science-competitions
The goal of this repo is to provide solutions to data science competitions (Kaggle, DataHack, MachineHack, DrivenData, etc.).
analytics-vidhya competition-code competitive-data-science-github data-science data-science-competition data-science-competitions datahack-competition kaggle kaggle-competition kaggle-competition-for-beginners kaggle-competition-solutions kaggle-solutions-github kaggle-winning-solutions-github machine-learning machinehack-competition xgboost
Last synced: 25 Nov 2025
https://github.com/pymc-labs/pymc-marketing
Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
btyd buy-till-you-die clv customer-lifetime-value data-science marketing marketing-mix-modeling media-mix-modeling mmm python
Last synced: 14 Dec 2025
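Media mix models of the kind listed above usually transform raw ad spend before regressing sales on it, so that spend in one period keeps having an effect in later periods. A minimal sketch of one common carry-over transform, geometric adstock, assuming a single channel and a fixed decay rate; the function name and parameters are illustrative and are not pymc-marketing's API (which fits such parameters in a Bayesian model rather than fixing them by hand).

```python
def geometric_adstock(spend, decay, max_lag=None):
    """Geometric adstock: adstock[t] = sum over lag of decay**lag * spend[t - lag].

    `decay` is the share of an effect carried into the next period;
    `max_lag` optionally caps how many past periods contribute.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    n = len(spend)
    max_lag = n - 1 if max_lag is None else max_lag
    out = []
    for t in range(n):
        total = 0.0
        # Only lags that stay inside the series and within max_lag contribute.
        for lag in range(min(t, max_lag) + 1):
            total += (decay ** lag) * spend[t - lag]
        out.append(total)
    return out


print(geometric_adstock([100, 0, 0, 50], decay=0.5))  # [100.0, 50.0, 25.0, 62.5]
```

Some implementations also normalize the decay weights so they sum to one; this sketch leaves them unnormalized so the carry-over is easy to read off.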
https://github.com/enkidevs/curriculum
👩‍🏫 👨‍🏫 The open-source curriculum of Enki!
ai algorithms blockchain chatgpt computer-science css curriculum data-science education enki git gpt4 html java javascript learn-to-code linux python security sql
Last synced: 15 May 2025
https://github.com/JosephLai241/URS
Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.
archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud
Last synced: 24 Mar 2025
https://github.com/h1st-ai/h1st
Power Tools for AI Engineers With Deadlines
automl autonomous-vehicles avionics cold-start collaboration cybersecurity data-science datascience-environment energy-optimization ensemble-machine-learning explainability hacktoberfest home-automation human-in-the-loop industrial-iot predictive-maintenance time-series trustworthy-datascience
Last synced: 14 Mar 2025
https://github.com/lgienapp/aquarel
Styling matplotlib made easy
data-science data-visualization matplotlib plotting theme theme-development theming visualization
Last synced: 14 May 2025
https://github.com/jasmcaus/caer
High-performance Vision library in Python. Scale your research, not boilerplate.
ai artificial-intelligence augmentation caer computer-vision cuda data-science deep-learning gpu image-classification image-processing image-segmentation machine-learning neural-network opencv python segmentation type-checking video-processing vision
Last synced: 15 May 2025
https://github.com/kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, and c) Google Popular Times.
admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis
Last synced: 30 Mar 2025
https://github.com/chrisvoncsefalvay/learn-julia-the-hard-way
Learn Julia the hard way!
data-science hpc julia julia-language julialang language learning learning-by-doing learning-julia scientific-computing statistics technical-computing
Last synced: 16 May 2025
https://github.com/erikaduan/r_tips
R programming tips for data cleaning, data visualisation, statistical modelling and machine learning
data-science data-visualization machine-learning r rstats statistics
Last synced: 29 Jul 2025
https://github.com/astroautomata/SymbolicRegression.jl
Distributed High-Performance Symbolic Regression in Julia
automl data-science distributed-systems equation-discovery evolutionary-algorithms explainable-ai genetic-algorithm interpretable-ml julia machine-learning sciml symbolic symbolic-computation symbolic-regression
Last synced: 29 Mar 2026
https://github.com/scrapinghub/python-crfsuite
A Python binding for CRFsuite
Last synced: 14 May 2025
https://github.com/rweekly/rweekly.org
R Weekly
blog community data-science data-visualization r rweekly statistics visualization weekly
Last synced: 20 Jul 2025
https://github.com/jalapic/engsoccerdata
English and European soccer results 1871-2022
data-science data-visualization r rstats soccer sport sports sports-stats
Last synced: 06 Feb 2026
https://github.com/biomedsciai/causallib
A Python package for modular causal inference analysis and model evaluations
causal causal-inference causal-models causality data-science machine-learning ml
Last synced: 14 May 2025
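Causal inference packages like the one above estimate treatment effects while correcting for the fact that treated and untreated units differ. A minimal sketch of one classic estimator, inverse probability weighting (IPW) for the average treatment effect, assuming the propensity scores (probability of treatment given covariates) are already known; in practice they are estimated with a model such as logistic regression. This is not causallib's API, and the function name is hypothetical.

```python
def ipw_ate(treatment, outcome, propensity):
    """Horvitz-Thompson style IPW estimate of the average treatment effect.

    ATE = mean(T * Y / e) - mean((1 - T) * Y / (1 - e)),
    where T is 0/1 treatment, Y the outcome and e the propensity score.
    """
    n = len(treatment)
    if not (n == len(outcome) == len(propensity)) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    treated_term = 0.0
    control_term = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        # Positivity: every unit needs a non-zero chance of either arm.
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        treated_term += t * y / e
        control_term += (1 - t) * y / (1 - e)
    return treated_term / n - control_term / n


print(ipw_ate([1, 0, 1, 0], [3, 1, 5, 1], [0.5, 0.5, 0.5, 0.5]))  # 3.0
```

With equal propensities and balanced groups, as in the usage line, the estimate reduces to the plain difference in group means; the weights matter once some units were much more likely to be treated than others.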
https://github.com/jwkvam/bowtie
Create a dashboard with Python!
ant-design antd dashboard data-science flask interactive jupyter plotly python react socket-io visualization webapp
Last synced: 16 May 2025
https://github.com/moderndive/ModernDive_book
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
bootstrap-method confidence-intervals data-science data-visualization data-wrangling dplyr ggplot2 hypothesis-testing infer moderndive permutation-test r regression regression-models rstats rstudio statistical-inference tidy tidyverse
Last synced: 15 Mar 2025
https://github.com/abhayspawar/featexp
Feature exploration for supervised learning
data-exploration data-science feature-engineering machine-learning visualization
Last synced: 14 Jan 2026
https://github.com/AgnostiqHQ/covalent
Pythonic tool for orchestrating machine learning, high-performance computing, and quantum computing workflows in heterogeneous compute environments.
covalent data-pipeline data-science deep-learning hacktoberfest hpc hpc-applications machine-learning machinelearning machinelearning-python orchestration parallelization pipelines python quantum quantum-computing quantum-machine-learning workflow workflow-automation workflow-management
Last synced: 30 Mar 2025
https://github.com/vivekpa/introneuralnetworks
Introducing neural networks to predict stock prices
algorithmic-trading data-science finance guide keras-tensorflow lstm-neural-networks machine-learning mlp-networks neural-network prediction prediction-mod python quantitative-finance regression-models stock-price-prediction stock-prices trading trading-strategies tutorial yahoo-finance
Last synced: 04 Apr 2025
https://github.com/glue-viz/glue
Linked Data Visualizations Across Multiple Files
data-science linked-data python visualization
Last synced: 14 May 2025
https://github.com/kennethleungty/failed-ml
Compilation of high-profile real-world examples of failed machine learning projects
ai artificial-intelligence classification computer-vision data-engineering data-quality data-science deep-learning failed-data-science failed-machine-learning failed-ml fml forecasting machine-learning ml natural-language-processing production recsys regression
Last synced: 18 Feb 2026
https://github.com/target/matrixprofile-ts
A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile
data-science matrix-profile motif motif-discovery pip pip3 pypi pypi-packages python python3 time-series timeseries-analysis timeseries-segmentation
Last synced: 14 Jan 2026
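The matrix profile stores, for every length-m subsequence of a time series, the distance to its most similar other subsequence: low values mark repeated patterns (motifs) and high values mark anomalies (discords). A brute-force sketch using z-normalized Euclidean distance and an exclusion zone of ceil(m/4) around each position to skip trivial self-matches (a common convention, assumed here). It is O(n²·m) and only meant to show the definition; dedicated libraries such as the one above use much faster algorithms.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def naive_matrix_profile(series, m):
    """Return (profile, index): nearest-neighbour distance and position per subsequence."""
    n_subs = len(series) - m + 1
    if m < 2 or n_subs < 2:
        raise ValueError("series too short for window m")
    subs = [znorm(series[i:i + m]) for i in range(n_subs)]
    excl = math.ceil(m / 4)  # neighbours this close to i are trivial matches
    profile, index = [], []
    for i in range(n_subs):
        best, best_j = math.inf, -1  # stays inf/-1 if every j is excluded
        for j in range(n_subs):
            if abs(i - j) <= excl:
                continue
            d = math.dist(subs[i], subs[j])
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


profile, index = naive_matrix_profile([0, 1, 2, 1, 0, 5, 5, 0, 1, 2, 1, 0], m=5)
print(profile[0], index[0])  # 0.0 7: the peak at the start repeats at the end
```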
https://github.com/ipython-books/cookbook-2nd-code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization
Last synced: 12 Apr 2025
https://github.com/nipy/nipype
Workflows and interfaces for neuroimaging packages
big-data brain-imaging brainweb data-science dataflow dataflow-programming neuroimaging python workflow-engine
Last synced: 21 Oct 2025
https://github.com/williamFalcon/test-tube
Python library to easily log experiments and parallelize hyperparameter search for neural networks
caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow
Last synced: 27 Mar 2025
https://github.com/akfamily/aktools
AKTools is an elegant and simple HTTP API library for AKShare, built for AKSharers!
akshare asyncio data data-science fastapi openapi pydanti
Last synced: 14 May 2025
https://github.com/explosion/spacy-stanza
Use the latest Stanza (StanfordNLP) research models directly in spaCy
corenlp data-science machine-learning natural-language-processing nlp spacy spacy-pipeline stanford-corenlp stanford-machine-learning stanford-nlp stanza
Last synced: 15 May 2025
https://github.com/pdpipe/pdpipe
Easy pipelines for pandas DataFrames.
data data-science dataframe dataframes pandas pandas-dataframe pipeline
Last synced: 06 Mar 2026
https://github.com/arvkevi/kneed
Knee point detection in Python
data-analysis data-science elbow-method knee-point python scientific-computing systems
Last synced: 21 Oct 2025
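kneed is based on the Kneedle algorithm, whose core idea is to rescale both axes to [0, 1] and look for the point where the curve rises furthest above the straight line joining its endpoints. A simplified sketch of that idea for a concave, increasing curve only; it omits Kneedle's smoothing, sensitivity threshold, and handling of convex or decreasing curves (such as k-means elbow plots), and the function name is hypothetical rather than kneed's API.

```python
def knee_point(x, y):
    """Simplified Kneedle-style knee for a concave, increasing curve.

    Normalizes x and y to [0, 1] and returns the x value where
    y_norm - x_norm (the "difference curve") is largest.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three (x, y) points")
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    if x_max == x_min or y_max == y_min:
        raise ValueError("x and y must both vary")
    diffs = [
        (yi - y_min) / (y_max - y_min) - (xi - x_min) / (x_max - x_min)
        for xi, yi in zip(x, y)
    ]
    best = max(range(len(diffs)), key=diffs.__getitem__)
    return x[best]


xs = list(range(1, 11))
print(knee_point(xs, [1 - 1 / v for v in xs]))  # 3
```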
https://github.com/iterative/mlem
A tool to package, serve, and deploy any ML model on any platform. Archived, to be resurrected one day.
cli data-science deployment developer-tools git machine-learning mlem model-registry python
Last synced: 26 Mar 2025
https://github.com/ShawhinT/YouTube-Blog
Code to complement YouTube videos and blog posts on Medium.
data-science example-code machine-learning medium-articles youtube
Last synced: 18 Jul 2025
https://github.com/ashishpatel26/amazing-feature-engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn
Last synced: 16 May 2025
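Among the feature engineering steps tagged in the entry above, feature scaling is one of the simplest to show concretely. A minimal sketch of two common scalers, z-score standardization and min-max scaling, written with the standard library only; the function names are illustrative, not taken from the repository or from scikit-learn.

```python
import statistics


def standardize(values):
    """Z-score: (x - mean) / std, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # a constant feature carries no signal
    return [(v - mean) / std for v in values]


def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly map values so the minimum becomes lo and the maximum becomes hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [lo for _ in values]
    span = v_max - v_min
    return [lo + (v - v_min) * (hi - lo) / span for v in values]


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(min_max_scale([10, 20, 30]))            # [0.0, 0.5, 1.0]
```

In a real pipeline the mean, standard deviation, minimum, and maximum should be computed on the training split only and then reused on validation and test data, to avoid leaking information from held-out rows.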
https://github.com/huntermcgushion/hyperparameter_hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
ai artificial-intelligence catboost data-science deep-learning experimentation feature-engineering hyperparameter-optimization hyperparameter-tuning keras lightgbm machine-learning ml neural-network optimization python rgf scikit-learn sklearn xgboost
Last synced: 16 May 2025
https://github.com/scikit-mobility/scikit-mobility
scikit-mobility: mobility analysis in Python
complex-systems data-analysis data-science human-mobility mobility-analysis mobility-flows network-science risk-assessment scikit-mobility statistics synthetic-flows
Last synced: 21 Oct 2025
https://github.com/nicolaskruchten/jupyter_pivottablejs
Drag'n'drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js
data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables
Last synced: 15 May 2025
https://github.com/google/edward2
A simple probabilistic programming language.
bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow
Last synced: 05 Sep 2025
https://github.com/fastverse/collapse
Advanced and Fast Data Transformation in R
cran data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data r rstats scientific-computing statistics time-series weighted weights
Last synced: 11 Jan 2026
https://github.com/TrainingByPackt/Data-Science-Projects-with-Python
A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn
data-science machine-learning numpy pandas pandas-dataframe python scikit-learn
Last synced: 14 Apr 2025
https://github.com/MilesCranmer/SymbolicRegression.jl
Distributed High-Performance Symbolic Regression in Julia
automl data-science distributed-systems equation-discovery evolutionary-algorithms explainable-ai genetic-algorithm interpretable-ml julia machine-learning sciml symbolic symbolic-computation symbolic-regression
Last synced: 04 May 2025
https://github.com/litaotao/ipython-dashboard
A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data scientists and data analysts, it aims to provide interactive visualizations, collaborative dashboards, and real-time streaming graphs.
dashboard data-science ipython ipython-dashboard notebook visualization
Last synced: 16 May 2025
https://github.com/dataproofer/Dataproofer
A proofreader for your data
cli command-line csv data-analysis data-mining data-science excel nodejs spreadsheet
Last synced: 30 Mar 2025
https://github.com/buckaroo-data/buckaroo
Buckaroo - The data table UI for notebooks. Quickly explore dataframes: scroll, search, sort, and view summary stats and histograms. Works with pandas, Polars, Jupyter, marimo, and VS Code notebooks.
buckaroo data-science jupyter marimo-notebook paddy pandas polars
Last synced: 22 May 2026
https://github.com/odpi/opends4all
OpenDS4All project, hosted by LF AI & Data
data-science jupyter-notebooks materials
Last synced: 10 Jun 2025
https://github.com/sebkrantz/collapse
Advanced and Fast Data Transformation in R
cran data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data r rstats scientific-computing statistics time-series weighted weights
Last synced: 14 May 2025
https://github.com/floydwch/kaggle-cli
An unofficial Kaggle command-line tool (deprecated; use https://github.com/Kaggle/kaggle-api instead).
Last synced: 30 Dec 2025
https://github.com/jphall663/interpretable_machine_learning_with_python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
accountability data-mining data-science decision-tree fairness fatml gradient-boosting-machine h2o iml interpretability interpretable interpretable-ai interpretable-machine-learning interpretable-ml lime machine-learning machine-learning-interpretability python transparency xai
Last synced: 16 May 2025
https://github.com/kotlin/kandy
Kotlin plotting library.
data-science graphics jupyter-notebooks kotlin plot
Last synced: 04 Jul 2025
https://github.com/perpetual-ml/perpetual
Perpetual is a high-performance gradient boosting machine. It aims to deliver optimal accuracy in a single run without complex tuning, using a simple budget parameter. It features out-of-the-box support for causal ML, continual learning, native calibration, and robust drift monitoring, along with a Rust core and zero-copy bindings for Python and R.
data-science gbdt gbm gradient-boosted-trees gradient-boosting gradient-boosting-decision-trees kaggle machine-learning python rust
Last synced: 02 Apr 2026
https://github.com/github/codespaces-jupyter
Explore machine learning and data science with Codespaces
codespaces data-science jupyter-notebook machine-learning
Last synced: 11 Apr 2025
https://github.com/stefmolin/Hands-On-Data-Analysis-with-Pandas-2nd-edition
Materials for following along with Hands-On Data Analysis with Pandas, Second Edition
data-analysis data-analysis-pandas data-analysis-python data-manipulation data-science data-wrangling machine-learning pandas visualizing-data
Last synced: 07 Sep 2025
https://github.com/fabsig/GPBoost
Combining tree-boosting with Gaussian process and mixed effects models
artificial-intelligence boosting cpp data-science gaussian-processes machine-learning mixed-effects python r
Last synced: 04 Feb 2026
https://github.com/yzhao062/combo
(AAAI '20) A Python Toolbox for Machine Learning Model Combination
aggregation data-mining data-science ensemble-learning machine-learning machine-learning-pipelines model-combination pipeline-framework python
Last synced: 08 Apr 2025
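Model combination toolboxes like the one above merge the outputs of several fitted models into a single prediction. A minimal sketch of three simple combination rules (score averaging, score maximization, and majority vote over labels), assuming each model's outputs arrive as one list aligned by sample; the function names are illustrative and are not combo's API.

```python
from collections import Counter
from statistics import fmean


def combine_scores(score_lists, method="average"):
    """Combine per-sample scores from several models (one list per model)."""
    per_sample = list(zip(*score_lists))  # regroup: one tuple of model scores per sample
    if method == "average":
        return [fmean(s) for s in per_sample]
    if method == "maximization":
        return [max(s) for s in per_sample]
    raise ValueError(f"unknown method: {method}")


def majority_vote(label_lists):
    """Most common predicted label per sample; ties go to the label seen first."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*label_lists)]


scores = [[0.2, 0.9], [0.4, 0.7], [0.6, 0.8]]  # three models, two samples
print(combine_scores(scores, "maximization"))  # [0.6, 0.9]
print(majority_vote([[1, 0, 0], [1, 1, 0], [0, 1, 1]]))  # [1, 1, 0]
```

Averaging and maximization assume the models' scores are on comparable scales, so in practice scores are often standardized before being combined.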
https://github.com/fastai/fastai2
Temporary home for fastai v2 while it's being developed
data-science deep-learning fastai jupyter machine-learning nbdev python pytorch
Last synced: 19 Jul 2025
https://github.com/faktionai/awesome-ai-usecases
A list of awesome and proven Artificial Intelligence use cases and applications
Last synced: 14 Mar 2025
https://github.com/dataprofessor/streamlit_freecodecamp
Build 12 Data Apps in Python with Streamlit
data-science exploratory-data-analysis machine-learning python streamlit
Last synced: 16 May 2025
https://github.com/aeturrell/coding-for-economists
This repository hosts the code behind the online book, Coding for Economists.
book data-science econometrics economics economics-models jupyter-notebook learning python research vscode
Last synced: 12 Oct 2025
https://github.com/uxlfoundation/onedal
oneAPI Data Analytics Library (oneDAL)
ai-inference ai-machine-learning ai-training analytics big-data cpp data-analysis data-science hacktoberfest machine-learning machine-learning-algorithms oneapi onedal swrepo
Last synced: 11 Dec 2025
https://github.com/scverse/anndata
Annotated data.
anndata bioinformatics data-science machine-learning scanpy scverse transcriptomics
Last synced: 29 Jan 2026
https://github.com/blue-yonder/turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.
data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup
Last synced: 14 May 2025
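Because turbodbc follows the Python Database API Specification 2.0 (PEP 249), code written against it uses the same connection, cursor, execute, and fetch pattern as other compliant modules. A sketch of that pattern using the standard library's sqlite3 module, which also implements PEP 249, purely because it needs no ODBC driver; the table and data are made up for illustration. Connection setup and the SQL parameter placeholder style differ between modules (sqlite3 uses `?`), so this is not turbodbc-specific code.

```python
import sqlite3

# An in-memory SQLite database stands in for an ODBC data source.
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    # Parameters are passed separately from the SQL text, never formatted into it.
    cur.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 30.0)],
    )
    conn.commit()
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
    rows = cur.fetchall()  # a list of tuples, one per result row
    cur.close()
finally:
    conn.close()

print(rows)  # [('north', 150.0), ('south', 80.0)]
```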
https://github.com/sforaidl/kd_lib
A PyTorch knowledge distillation library for benchmarking and extending work in the domains of knowledge distillation, pruning, and quantization.
algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization
Last synced: 16 May 2025
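Knowledge distillation trains a small student model to match a larger teacher's softened output distribution as well as the true labels. A minimal sketch of the soft-target loss popularized by Hinton et al., for a single example: a temperature T softens both softmaxes, the KL term is scaled by T² to keep its gradient magnitude comparable, and alpha balances it against ordinary cross-entropy. It is written in plain Python rather than PyTorch, and the function names and default values of T and alpha are assumptions, not KD_Lib's API.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    hard_ce = -math.log(softmax(student_logits)[label])  # ordinary loss at T = 1
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.0, -2.0]
print(distillation_loss(student, teacher, label=0))
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy with the true label remains.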
https://github.com/cerndb/dist-keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow
Last synced: 03 Oct 2025
https://github.com/rstojnic/lazydata
Lazydata: Scalable data dependencies for Python projects
data-science datamanagement machine-learning python
Last synced: 26 Mar 2025
https://github.com/farukalamai/advanced-machine-learning-engineer-roadmap-2024
A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing, model development, deployment, and maintenance.
aws computer-vision data-analysis data-science data-visualization deep-learning git-github machine-learning machine-learning-roadmap mlops natural-language-processing neural-network nlp opencv pandas python pytorch statistics tensorflow yolo
Last synced: 04 Apr 2025
https://github.com/squarespace/datasheets
Read data from, write data to, and modify the formatting of Google Sheets
data data-analytics data-science dataframe google pandas python
Last synced: 16 May 2025
https://github.com/erezsh/preql
An interpreted relational query language that compiles to SQL.
data-science database python query sql
Last synced: 16 May 2025
https://github.com/benedekrozemberczki/datasets
A repository of pretty cool datasets that I collected for network science and machine learning research.
benchmark community-detection data-science dataset deepwalk dimensionality-reduction gcn gnn graph-convolution graph-embedding graph-neural-network graph2vec link-prediction machine-learning network-analysis network-embedding network-science node-classification node-embedding node2vec
Last synced: 13 Feb 2026
https://github.com/chris-greening/instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping
Last synced: 07 Apr 2025
https://github.com/juliastats/glm.jl
Generalized linear models in Julia
data-science glm julia regression statistical-models statistics
Last synced: 14 May 2025
https://github.com/PIA-Group/BioSPPy
Biosignal Processing in Python
biosignals data-science physiological-computing python signal-processing
Last synced: 12 Sep 2025
https://github.com/run-ai/genv
GPU environment and cluster management with LLM support
bash container-runtime containers data-science deep-learning docker gpu gpus jupyter-notebook jupyterlab-extension k8s kubernetes llm-inference llms nvidia-gpu ollama ray vscode vscode-extension zsh
Last synced: 16 May 2025
https://github.com/jadianes/data-science-your-way
Ways of doing Data Science Engineering and Machine Learning in R and Python
data-frame data-science data-science-engineering exploratory-data-analysis jupyter machine-learning notebook python r tutorial
Last synced: 04 Apr 2025
https://github.com/tuangauss/DataScienceProjects
The code repository for projects and tutorials in R and Python that cover a variety of topics in data visualization, statistics, sports analytics, and general applications of probability theory.
data-science data-visualization statistics
Last synced: 29 Mar 2025
https://github.com/diskframe/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
data data-science large-dataset manipulation-data medium-data r
Last synced: 16 Jun 2025
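Larger-than-RAM frameworks like disk.frame rely on splitting data into chunks, computing a partial result per chunk, and then combining the partial results. A minimal sketch of that split-apply-combine pattern for a grouped sum over a CSV stream, written in Python for consistency with the other examples here; it processes chunks sequentially and does not reproduce disk.frame's on-disk chunk format or its parallelism. The function name, column names, and data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def chunked_group_sum(lines, key, value, chunk_size=10_000):
    """Sum `value` per `key`, holding at most `chunk_size` rows in memory at once."""
    reader = csv.DictReader(lines)
    totals = defaultdict(float)
    while True:
        chunk = list(islice(reader, chunk_size))  # read the next chunk of rows
        if not chunk:
            break
        partial = defaultdict(float)  # apply: aggregate within the chunk
        for row in chunk:
            partial[row[key]] += float(row[value])
        for k, v in partial.items():  # combine: merge into the running totals
            totals[k] += v
    return dict(totals)


data = io.StringIO("region,amount\nnorth,120\nsouth,80\nnorth,30\n")
print(chunked_group_sum(data, "region", "amount", chunk_size=2))
# {'north': 150.0, 'south': 80.0}
```

This works because a sum can be combined from partial sums; aggregates such as a median cannot be merged this simply and need a different strategy.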