Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2026-07-01 00:07:28 UTC
https://github.com/mdeff/ntds_2017
Material for the EPFL master course "A Network Tour of Data Science", edition 2017.
data-science education epfl graphs network-science
Last synced: 06 Jul 2025
https://github.com/fastai/book.fast.ai
Information for readers of the fastai book
data-science deep-learning machine-learning python pytorch teaching
Last synced: 24 Dec 2025
https://github.com/visual-layer/visuallayer
Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, mislabels and others.
cleaning computer computer-vision data data-science dataset datasets-preparation generative machine-learning python vision
Last synced: 19 Apr 2025
https://github.com/localcascadeensemble/lce
Random Forest or XGBoost? It is Time to Explore LCE
classification data-science machine-learning python regression scikit-learn-api
Last synced: 13 Apr 2025
https://github.com/gurupatil0003/python_tutorial
Python is a high-level, interpreted programming language known for its simplicity and readability. Python emphasizes code readability and allows programmers to express concepts in fewer lines of code compared to languages like C++ or Java.
data-science database library modules oops-in-python operator python
Last synced: 09 Apr 2025
https://github.com/jianzhnie/AutoTabular
Automatic machine learning for tabular data.
automl catboost data-science deep-learning feature-engineering hpo lightgbm machine-learning pytorch-lightning scikit-learn structured-data tabular-data xgboost
Last synced: 13 Jul 2025
https://github.com/dayyass/qaner
Unofficial implementation of QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition.
data-science machine-learning named-entity-recognition natural-language-processing ner nlp python python3 question-answering
Last synced: 13 Apr 2025
https://github.com/charmve/paperweeklyai
@MaiweiAI: Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.
advanced applied-machine-learning computer-vision data-mining data-science deep-learning machine-learning machine-learning-algorithms nlp paper-with-code papers study-papers tutorials
Last synced: 23 Jun 2025
https://github.com/ermshaua/time-series-segmentation-benchmark
This repository contains the time series segmentation benchmark (TSSB).
change-point change-point-detection data-mining data-science machine-learning python research science segmentation time-series time-series-analysis time-series-data-mining time-series-segmentation unsupervised-learning
Last synced: 05 May 2025
https://github.com/tpvasconcelos/ridgeplot
Beautiful ridgeline plots in Python
data-analysis data-science data-visualization distplot ggridges graphing joyplot plot plotly plotting python ridgeline visualization
Last synced: 04 Apr 2025
https://github.com/DARIAH-DE/Topics
A Python library for topic modeling and visualization
data-science digital-humanities lda machine-learning natural-language-processing python3 text-mining topic-modeling
Last synced: 03 May 2025
https://github.com/randyzwitch/streamlit-embedcode
Streamlit component for embedding code snippets such as GitHub gists, CodePen snippets, Gitlab snippets, etc.
data-analysis data-science data-visualization python streamlit streamlit-component
Last synced: 12 May 2025
https://github.com/dwhitena/oreilly-ai-k8s-tutorial
Materials for the "AI on Kubernetes" tutorial at O'Reilly AI SF 2018
ai data-science deep-learning docker kubernetes machine-learning
Last synced: 13 Sep 2025
https://github.com/devinterview-io/nlp-interview-questions
NLP interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
ai-interview-questions coding-interview-questions coding-interviews data-science data-science-interview data-science-interview-questions data-scientist-interview interview-practice interview-preparation machine-learning machine-learning-and-data-science machine-learning-interview machine-learning-interview-questions nlp nlp-interview-questions nlp-questions nlp-tech-interview software-engineer-interview technical-interview-questions
Last synced: 06 Feb 2026
https://github.com/hemansnation/machine-learning-mlops-generativeai-nlp-cv-mlsystem-design
MLOps - Deploy models at scale, Generative AI - Build applications with LLMs, NLP - Understand Transformers & Text Generation Models, Computer Vision - Build GANs projects like Deepfakes, ML System Design, hands-on project building and code algorithms from scratch.
computer-vision data-science deep-learning generative-ai machine-learning natural-language-processing python
Last synced: 15 Apr 2025
https://github.com/hannansatopay/roughviz
A Python visualization library for creating sketchy/hand-drawn styled charts.
charts data-science hacktoberfest jupyter-notebook python-visualization roughviz vizualisation
Last synced: 14 Apr 2025
https://github.com/loukesio/ggvolc
ggvolc effortlessly translates differential expression datasets and RNAseq data into informative volcano plots. Highlight genes of interest with unprecedented ease. With just a single line of code, visualize complex datasets, gaining deeper insights and simplifying data representation.
bioinformatics data-science data-visualization gro-seq rna-seq
Last synced: 02 Sep 2025
https://github.com/noahgift/data-engineering-and-dataops
Duke MIDS: Data Engineering and DataOps Course
book cloud course data data-science dataengineering dataops duke mlops software-engineering
Last synced: 03 Sep 2025
https://github.com/xplainable/xplainable
Real-time explainable machine learning for business optimisation
auto-ml data-analytics data-science explainable-ai explainable-ml machine-learning machine-learning-algorithms prediction predictions python shap statistics xai xplainable
Last synced: 17 May 2026
https://github.com/dmey/synthia
Multidimensional synthetic data generation with Copula and fPCA models in Python
augmentation climate copula data-augmentation data-generation data-generator data-modelling data-science dependency-analysis dependency-modeling finance fpca functional-data machine-learning oversampling principal-component-analysis statistics synthetic-data weather xarray
Last synced: 01 Feb 2026
https://github.com/LaihoE/did-it-spill
Check if you have training samples in your test set
computer-vision data-science deep-learning pytorch semantic-similarity time-series
Last synced: 01 May 2025
https://github.com/rickiepark/hg-da
Code repository for the book "Self-Study Data Analysis with Python" (혼자 공부하는 데이터 분석 with 파이썬)
data-analysis data-science data-visualization machine-learning matplotlib numpy pandas scikit-learn scipy
Last synced: 06 Apr 2025
https://github.com/renumics/sliceguard
A library for detecting problematic data segments in structured and unstructured data with a few lines of code.
data-analysis data-cleaning data-curation data-exploration data-science data-visualization deep-learning eda exploratory-data-analysis machine-learning python visualization
Last synced: 16 Mar 2025
https://github.com/microsoft/responsible-ai-toolbox-mitigations
Python library for implementing Responsible AI mitigations.
data-analysis data-science machine-learning python responsible-ai responsible-ml
Last synced: 13 Oct 2025
https://github.com/mooseburger1/springboard-data-science-immersive
convolutional-neural-networks data-science deep-learning deep-neural-networks eda h5 hadoop nlp opencv pyspark python sql statistical-analysis statistical-inference statistical-modeling tensorboard tensorflow time-series-analysis time-series-prediction web-scraping
Last synced: 10 Apr 2025
https://github.com/dayyass/text-classification-baseline
Pipeline for fast building text classification TF-IDF + LogReg baselines.
baseline classification data-science fast hacktoberfest logistic-regression machine-learning natural-language-processing nlp python text text-classification tf-idf
Last synced: 19 Jul 2025
https://github.com/tatevkaren/tatevkaren-data-science-portfolio
Data Science Portfolio of Tatev Karen Aslanyan including Case Studies and Research Projects that I have completed that solve business problems or introduce new products. Case Study papers, code, and additional resources are all included.
blog case-study computer-science data-analysis data-science deep-learning econometrics machine-learning papers portfolio portfolio-website statistics
Last synced: 10 Apr 2025
https://github.com/polyaxon/hypertune
A library for performing hyperparameter optimization
data-science deep-learning hyperparameter-optimization hyperparameter-tuning machine-learning mlops numpy scikit-learn workflow
Last synced: 24 Dec 2025
https://github.com/provectus/sak-kubeflow
Deploy Kubeflow on AWS EKS with Terraform
ai argocd artificial-intelligence automation aws cluster data-science deep-learning devops eks gitops iac infrastructure infrastructure-as-code kubeflow machine-learning ml open-source terraform
Last synced: 18 Apr 2025
https://github.com/opengeos/streamlit-map-template
A streamlit template for mapping applications
data-science geospatial mapping python streamlit
Last synced: 07 Apr 2025
https://github.com/cihat/datastructure
Documentation for Data Structures (BMU221) and all other courses. Notes and examples for the data structures course and all lessons. Data Structures with Java.
bilgisayar-muhendisligi computer-science data-science data-structure data-structure-blogs data-structures data-structures-and-algorithms documentation turkce-dokumantasyon veri-bilimi veri-yapilari
Last synced: 23 Jan 2026
https://github.com/omarsar/mri-analysis-pytorch
MRI analysis using PyTorch and MedicalTorch
data-science deep-learning health healthcare medicine neural-network pytorch
Last synced: 22 Mar 2025
https://github.com/codeperfectplus/machine-learning-web-applications
Data science web project implemented in Django framework.
data-science django portfolio python python3
Last synced: 13 May 2025
https://github.com/stevecondylios/priceR
Economics and Pricing in R
cran data-science econometrics economics finance modeling r-programming statistics
Last synced: 30 Jul 2025
https://github.com/dask-contrib/dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
columnar-format dask data-analysis data-science data-structure jagged-array python ragged-array
Last synced: 12 Apr 2025
https://github.com/analyticalnahid/machine-learning-roadmap
Roadmap to becoming Machine Learning Engineer in 2023
analyticalnahid data-science data-visualization machine-learning machine-learning-algorithms python
Last synced: 10 Apr 2025
https://github.com/almost-matching-exactly/dame-flame-python-package
A Python Package providing two algorithms, DAME and FLAME, for fast and interpretable treatment-control matches of categorical data
causal-inference data-science econometrics machine-learning matching python
Last synced: 09 Apr 2026
https://github.com/ucd-dnp/leila
Library for data quality evaluation and interaction with the datos.gov.co portal
data-quality data-science eda espanol exploratory-data-analysis python report-generator ucd
Last synced: 05 Apr 2026
https://github.com/apple/ml-symphony
Symphony: Interactive Data Widgets (CHI 2022)
computational-notebooks data-science data-visualization machine-learning
Last synced: 19 Oct 2025
https://github.com/bcg-x-official/sklearndf
DataFrame support for scikit-learn.
cross-validation data-science feature-traceability hyper-parameter-tuning machine-learning model-selection pandas-dataframe python
Last synced: 09 Apr 2025
https://github.com/bnosac/crfsuite
Labelling Sequential Data in Natural Language Processing with R - using CRFsuite
chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp r r-package
Last synced: 15 Mar 2026
https://github.com/france-travail/gabarit
Gabarit: kickstart your data science project from scratch
data-science deep-learning machine-learning python
Last synced: 09 Apr 2025
https://github.com/tf-encrypted/moose
Secure distributed dataflow framework for encrypted machine learning and data processing
cryptography data-science distributed-computing machine-learning privacy secure-computation
Last synced: 08 May 2025
https://github.com/terryyz/pyarmadillo
PyArmadillo: an alternative approach to linear algebra in Python
armadillo-library calculations data-science linear-algebra machine-learning
Last synced: 15 Jun 2025
https://github.com/gagolews/genieclust
Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - in Python and R
cluster-analysis clustering clustering-algorithm data-analysis data-mining data-science genie hdbscan hierarchical-clustering hierarchical-clustering-algorithm machine-learning machine-learning-algorithms mlpack nmslib python python3 r sparse
Last synced: 04 Apr 2025
https://github.com/alexioannides/ml-workflow-automation
Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.
classification data-science flask helm jupyter-notebook kaggle kubernetes machine-learning mlops numpy pandas python rest-api sklearn
Last synced: 21 Mar 2025
https://github.com/maxent-ai/zeroshot_topics
Topic Inference with Zeroshot models
bert data-science huggingface hypernymy-extraction keybert keyword-extraction knowledge-graph labelled-data labelling linguistics machine-learning nli nlp taxonomy text text-classification transformers weak-supervision weakly-supervised-learning zeroshot-learning
Last synced: 24 Jun 2025
https://github.com/octoenergy/timeserio
Better `keras` models for time series and beyond
Last synced: 24 Jun 2025
https://github.com/tcsvn/activity-assistant
Activity Assistant provides a platform for logging, evaluating and predicting Activities of Daily Living for Home Assistant.
activities-of-daily-living activity-assistant adls data-mining data-science django django-rest-framework home-assistant home-assistant-addons home-automation homeassistant human-activity-recognition machine-learning smart-home smarthome visualization
Last synced: 06 Apr 2025
https://github.com/andrewtavis/causeinfer
Machine learning based causal inference/uplift in Python
ab-testing causal-inference causality data-analysis data-science data-visualization dataset econometrics machine-learning open-source python python3 statistics treatment-effects uplift uplift-modeling
Last synced: 11 Sep 2025
https://github.com/bytehub-ai/bytehub
ByteHub: making feature stores simple
bytehub-cloud dask data-engineering data-science feature-engineering feature-store featurestore forecasting machine-learning machinelearning machinelearning-python pandas timeseries
Last synced: 27 Aug 2025
https://github.com/rush-db/rushdb
RushDB is an instant database for modern apps and DS/ML ops built on top of Neo4j
app-backend cloud data-analysis data-engineering data-science database docker firebase graph-database graphs instant instant-apps javascript neo4j nestjs rest-api self-hosted typescript web-development
Last synced: 09 May 2026
https://github.com/benedekrozemberczki/spatiotemporal_datasets
Spatiotemporal datasets collected for network science, deep learning and general machine learning research.
analytics benchmark data-science dataset deep-learning deepwalk epidemiology gcn gnn machine-learning node2vec pytorch pytorch-geometric spatial-analysis spatial-data spatial-data-analysis time-series time-series-analysis vector-autoregression
Last synced: 11 Apr 2025
https://github.com/jasonkessler/agefromname
Predict age and gender from a first name
age census-data data-science demographics demography first-names gender names python python-3 python-api social-security-data statistics
Last synced: 29 Apr 2025
https://github.com/stevecondylios/pricer
Economics and Pricing in R
cran data-science econometrics economics finance modeling r-programming statistics
Last synced: 11 Jun 2025
https://github.com/balapriyac/data-science-tutorials
If you're coming from one of my data science tutorials, you'll find the code and the links to the tutorials here. I hope you find them helpful. Happy learning and coding!
data-science python tutorial-sourcecode
Last synced: 02 Jul 2025
https://github.com/datamole-ai/edvart
An open-source Python library for Data Scientists & Data Analysts designed to simplify the exploratory data analysis process. Using Edvart, you can explore data sets and generate reports with minimal coding.
analysis data-analysis data-science data-visualization data-viz eda exploration exploratory-data-analysis exploratory-data-analysis-eda plots python
Last synced: 11 Feb 2026
https://jaeyk.github.io/comp_thinking_social_science/
Computational Thinking for Social Scientists book project
computational-social-science data-science digital-humanities machine-learning python r social-sciences visualization web-scraping
Last synced: 16 Mar 2025
https://github.com/seandavi/sars2pack
An R package with over 50 highly cited, ready-to-use, up-to-date COVID-19 pandemic data resources
biomedical-data coronavirus coronavirus-tracking covid-19 data-science data-visualization datascience datasets epidemics epidemiology geospatial public-health rstats rstats-package
Last synced: 25 Feb 2026
https://github.com/symmetryinvestments/excel-d
Excel API bindings and wrapper API for D
ctfe data-science dlang excel metaprogramming native sdk wrapper-api xls xlsw
Last synced: 24 Jan 2026
https://github.com/likejazz/jupyter-notebooks
This repo contains Jupyter Notebooks, miscellaneous stuff.
data-science decision-tree deep-learning jupyter-notebook keras machine-learning nlp pytorch random-forest statistics tensorflow
Last synced: 13 Jun 2025
https://github.com/astrazeneca/judgyprophet
Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).
ai bayesian data-science forecasting machine-learning python statistics
Last synced: 08 May 2025
https://github.com/AstraZeneca/judgyprophet
Forecasting for knowable future events using Bayesian informative priors (forecasting with judgmental-adjustment).
ai bayesian data-science forecasting machine-learning python statistics
Last synced: 28 Sep 2025
https://github.com/puzzlelib/puzzlelib
Deep Learning framework with NVIDIA & AMD support
data-science deep-learning deep-neural-networks gpu library machine-learning ml neural-network numpy python tensor
Last synced: 15 Jul 2025
https://github.com/tlverse/tlverse-handbook
Targeted Learning in R: A Causal Data Science Handbook
biostatistics causal-data-science causal-inference causal-machine-learning data-science machine-learning statistics targeted-learning tlverse
Last synced: 19 Feb 2026
https://github.com/tirthajyoti/covid-19-analysis
Analysis with Covid-19 data
analytics coronavirus covid-19 covid-data covid19-data data-science epidemiology machine-learning modeling numpy object-oriented-programming pandemic python visualization
Last synced: 14 Jul 2025
https://github.com/mratsim/mckinsey-smartcities-traffic-prediction
Adventure into using multi attention recurrent neural networks for time-series (city traffic) for the 2017-11-18 McKinsey IronMan (24h non-stop) prediction challenge
data-science deep-learning keras machine-learning neural-networks tensorflow time-series
Last synced: 30 Apr 2025
https://github.com/dataprofessor/streamlit-for-datascience
Streamlit for Data Science shows how to build interactive data apps powered by data visualization and machine learning.
data-science machine-learning numpy pandas python
Last synced: 19 Jun 2025
https://github.com/benedekrozemberczki/pdn
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
bert cheminformatics data-science deep-learning deepwalk gcn gnn gpt-3 graph-classification graph-neural-network graph2vec message-passing molecules multiplex network-science neural-message-passing node-classification pathfinder pytorch transformer
Last synced: 11 Apr 2025
https://github.com/mganjoo/apple-health-exporter
Python module to export Apple Health dump file to a data frame for analysis
Last synced: 03 Mar 2025
https://github.com/tirthajyoti/julia-data-science
Data science and numerical computing with Julia
artificial-intelligence data-science dataframe deep-learning julia julia-language linear-algebra machine-learning numerical-analysis scientific-computing statistics
Last synced: 06 Mar 2026
https://github.com/rvanasa/pandas-gpt
Power up your data science workflow with ChatGPT.
chatgpt claude-ai data-cleaning data-engineering data-science data-visualization gemini generative-ai gpt4 jupyter-notebook litellm low-code matplotlib numpy o1 openai pandas productivity scipy seaborn
Last synced: 09 May 2025
https://mlverse.github.io/mall/
Run multiple LLM predictions against a data frame with R and Python
data-science dplyr llm polars python r
Last synced: 06 Apr 2025
https://github.com/inist-cnrs/lodex
Linked Open Data EXperiment
data-science data-structures datavisualization mongo nodejs
Last synced: 29 Jan 2026
https://github.com/tommyod/paretoset
Compute the Pareto (non-dominated) set, i.e., skyline operator/query.
data-mining data-science datascience multi-objective-optimization optimization pandas skyline-query
Last synced: 05 Apr 2025
https://github.com/wlandau/targets-minimal
A minimal example data analysis project with the targets R package
data-science high-performance-computing pipeline r reproducibility reproducible-research rstats statistics targets workflow
Last synced: 20 Mar 2025
https://github.com/rosetta-ai/rosetta_recsys2019
The 4th Place Solution to the 2019 ACM Recsys Challenge by Team RosettaAI
artificial-intelligence boosting-tree data-mining data-science deep-learning hotel-recommender lightgbm machine-learning mean-reciprocal-rank neural-network python ranking recommender-system session-based-recommendation-system trivago xgboost
Last synced: 20 Jul 2025
https://github.com/mlverse/mall
Run multiple LLM predictions against a data frame with R and Python
data-science dplyr llm polars python r
Last synced: 24 Oct 2025
https://github.com/noopeeks/datanvim
A fully-featured batteries-included Neovim distribution for the world of Data Science. Prepared to run code and interact with Jupyter Notebooks without ever leaving your terminal.
data data-science distribution jupyter-notebook machine-learning neovim nvim nvim-config text-editor vim
Last synced: 06 Oct 2025
https://github.com/ashishpatel26/datascienv
datascienv is a package that sets up your environment in a single line of code with all dependencies. It also includes pyforest, which imports all required ML libraries in a single line.
catboost data-science data-science-env datascienv imbalanced-data lightgbm matplotlib numpy pandas pycaret scikit-learn seaborn tensorflow2 xgboost
Last synced: 24 Oct 2025
https://github.com/gabrieltseng/datascience-projects
A collection of personal data science projects
Last synced: 18 Jan 2026
https://github.com/ajl2718/whereabouts
Fast, accurate, open-source geocoding in Python
data-science duckdb geocoding geospatial record-linkage
Last synced: 26 Aug 2025
https://ddotta.github.io/cookbook-rpolars/
Cookbook to provide solutions to common tasks and problems in using Polars with R
benchmark cookbook data-engineering data-science datatable dplyr polars r tidyr
Last synced: 13 May 2025
https://github.com/idouble/pandas-python-data-analysis-playground
Data Analysis with the Pandas Library & Notes
analysis csv csv-files data data-analysis data-science data-visualization dataframe examples library pandas pandas-dataframe pandas-library pandas-python python
Last synced: 10 Sep 2025
https://github.com/mdeff/ntds_2018
Material for the EPFL master course "A Network Tour of Data Science", edition 2018.
data-science education epfl graphs network-science
Last synced: 12 Jul 2025
https://github.com/jmwoloso/pychattr
Python Channel Attribution (pychattr) - A Python implementation of the excellent R ChannelAttribution library
channel-attribution data-analysis data-science machine-learning python python-channel-attribution rpy2 wrapper
Last synced: 06 May 2025
https://github.com/tgsmith61591/skoot
A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.
data-science imbalanced-data machine-learning pandas python scikit-learn skutil
Last synced: 11 Sep 2025
https://github.com/mikeroyal/apache-flink-guide
Apache Flink Guide
data-science database flink flink-kafka flink-stream-processing flink-streaming stream-processing streaming
Last synced: 02 Sep 2025
https://github.com/splunk/splunk-mltk-container-docker
Splunk App for Data Science and Deep Learning - container images repository
agentic ai artificial-intelligence data-science deep-learning docker llm machine-learning rag splunk splunk-ai
Last synced: 11 Oct 2025
https://github.com/aravind-selvam/forest-fire-prediction
Project for Predicting Algerian Forest Fires and Fire Weather Index Using Machine Learning with Python.
classification-model data-science flask-application jupyter-notebook machine-learning ml prediction-model python regression-models sklearn
Last synced: 11 Apr 2025
https://github.com/lkuffo/data-viz
More than 50 examples of data visualization and analysis in Matplotlib, Pandas, Seaborn, Plotly, Bokeh, and NetworkX
data-analysis data-science dataviz geoviz jupyter jupyter-notebook matplotlib networkx pandas plotly python seaborn
Last synced: 30 Jul 2025
https://github.com/universal-automata/liblevenshtein-java
Various utilities regarding Levenshtein transducers. (Java)
approximate-string-matching bioinformatics computational-biology computer-science data-science dictionary distance-metric edit-distance finite-state-automata finite-state-transducer fuzzy-search genomics information-retrieval levenshtein-automata levenshtein-distance machine-learning natural-language-processing search-engine spelling-correction universal-automata
Last synced: 13 Apr 2025
https://github.com/wlandau/drake-examples
Example workflows for the drake R package
data-science drake high-performance-computing makefile pipeline r reproducibility reproducible-research ropensci rstats workflow
Last synced: 20 Mar 2025
https://github.com/tatevkaren/free-resources-books-papers
Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc. for different levels that can be useful for Data Scientists, Developers, and everyone who is interested in STEM.
books data-science databricks delta-lake developers econometrics free-books free-resources machine-learning mathematics statistics
Last synced: 17 Feb 2026
https://github.com/cis-team/datascience-squad
Data Science Squad Roadmap
cis-team computer-science data-science dataanalysis
Last synced: 30 Jan 2026
https://github.com/mine-cetinkaya-rundel/teach-r-online
Materials for the Teaching statistics and data science online workshops in July 2020
data-science education rstats statistics
Last synced: 08 Apr 2025
https://github.com/antononcube/mathematicavsr
Example projects, code, and documents for comparing Mathematica with R.
comparison data-analysis data-science machine-learning mathematica r time-series
Last synced: 17 Oct 2025
https://github.com/scrapinghub/aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
Last synced: 25 Apr 2025
https://github.com/pmuens/lab
Research Environment to play around with Algorithms and Data (Structures)
algorithms artificial-intelligence artificial-neural-networks data-science deep-learning jupyter jupyter-notebook machine-learning machine-learning-algorithms
Last synced: 24 Apr 2025
https://github.com/faizann24/phishytics-machine-learning-for-phishing
Machine Learning for Phishing Website Detection
artificial-intelligence bpe cybersecurity data-science machine-learning phishing phishing-detection random-forest security security-tools tfidf
Last synced: 14 Jul 2025