Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
- GitHub: https://github.com/topics/data-science
- Wikipedia: https://en.wikipedia.org/wiki/Data_science
- Related Topics: data-analysis, data-mining, machine-learning, big-data, data-visualization
- Aliases: datasciences, data-science-project, data-science-algorithm
- Last updated: 2025-05-18 00:07:06 UTC
https://github.com/tuangauss/DataScienceProjects
The code repository for projects and tutorials in R and Python that cover a variety of topics in data visualization, statistics, sports analytics, and general applications of probability theory.
data-science data-visualization statistics
Last synced: 29 Mar 2025
https://github.com/datacleaner/DataCleaner
The premier open source Data Quality solution
data data-analysis data-science database datacleaner dataquality desktop etl mdm profiling
Last synced: 27 Mar 2025
https://github.com/JuliaStats/GLM.jl
Generalized linear models in Julia
data-science glm julia regression statistical-models statistics
Last synced: 01 May 2025
https://github.com/xiaodaigh/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
data data-science large-dataset manipulation-data medium-data r
Last synced: 14 Mar 2025
https://github.com/DiskFrame/disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
data data-science large-dataset manipulation-data medium-data r
Last synced: 14 Mar 2025
https://github.com/jacksonwuxs/dapy
Easy-to-use data analysis / manipulation framework for humans
analysis data-analysis data-science efficiency pypi python statistical-reports
Last synced: 05 Apr 2025
https://github.com/alegonz/baikal
A graph-based functional API for building complex scikit-learn pipelines.
data-science graph-based machine-learning python scikit-learn
Last synced: 08 May 2025
https://github.com/inseefrlab/onyxia
Data science environment for k8s
bluehats data-science datalab helm insee kubernetes onyxia
Last synced: 15 May 2025
https://github.com/JacksonWuxs/DaPy
Easy-to-use data analysis / manipulation framework for humans
analysis data-analysis data-science efficiency pypi python statistical-reports
Last synced: 28 Mar 2025
https://github.com/ploomber/jupysql
Better SQL in Jupyter.
bigquery clickhouse data-engineering data-science duckdb hive jupyter mysql polars postgres presto python redshift snowflake spark-sql sql sqlite trino tsql
Last synced: 23 Jan 2025
https://github.com/siznax/wptools
Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis
api-client commons data-science glam linked-open-data mediawiki mediawiki-api open-data python restbase wikidata wikimedia-commons wikipedia wikipedia-api
Last synced: 15 May 2025
https://github.com/kkulma/climate-change-data
A curated list of APIs, open data and ML/AI projects on climate change
climate climate-analysis climate-change climate-data data data-science datascience hacktoberfest python r resources rstats
Last synced: 04 Apr 2025
https://github.com/doubleml/doubleml-for-py
DoubleML - Double Machine Learning in Python
causal-inference data-science double-machine-learning econometrics machine-learning python scikit-learn statistics
Last synced: 14 May 2025
https://github.com/ahmedfgad/numpycnn
Building Convolutional Neural Networks From Scratch using NumPy
cnn computer-vision conv-layer convnet convolution convolutional-neural-networks data-science filter numpy pygad python relu relu-layer
Last synced: 12 Apr 2025
https://github.com/mszell/geospatialdatascience
Course materials for: Geospatial Data Science
course-materials data-science geospatial geospatial-analysis geospatial-data geospatial-visualization gis openstreetmap osmnx python street-networks teaching-materials
Last synced: 15 May 2025
https://github.com/plotly/dash-cytoscape
Interactive network visualization in Python and Dash, powered by Cytoscape.js
bioinformatics biopython computational-biology cytoscape cytoscapejs dash data-science graph-theory network-graph network-visualization plotly plotly-dash
Last synced: 03 Apr 2025
https://github.com/youssefhosni/efficient-python-for-data-scientists-book
Official Repo for the Efficient Python for Data Scientists Book. You can buy the book from here:
data-science numpy pandas python
Last synced: 15 May 2025
https://github.com/pgalko/bambooai
A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.
ai ai-agents anthropic data-analysis data-science docker gemini groq llm mistral ollama openai-api pandas pinecone python vector-database vllm
Last synced: 15 May 2025
https://github.com/achuthasubhash/Complete-Life-Cycle-of-a-Data-Science-Project
Complete-Life-Cycle-of-a-Data-Science-Project
analysis data-analysis data-science dataset deep-learning eda exploratory-data-analysis feature-engineering federated-learning machine-learning nlp-models python python-library pytorch reinforcement-learning scraper supervised-learning transfer-learning unsupervised-learning web-scraping
Last synced: 06 May 2025
https://github.com/dmbee/seglearn
Python module for machine learning time series:
data-science machine-learning python time-series
Last synced: 14 Mar 2025
https://dmbee.github.io/seglearn/
Python module for machine learning time series:
data-science machine-learning python time-series
Last synced: 01 Apr 2025
https://github.com/Murgio/Food-Recipe-CNN
Food image to recipe with deep convolutional neural networks.
chef classification cnn convolutional-neural-networks cooking-dishes data-science deep-learning dish food food-classification inceptionv3 jupyter-notebook keras machine-learning python3 recipes recognition tsne vgg vgg16
Last synced: 26 Mar 2025
https://github.com/capitalone/datacompy
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
compare dask data data-science dataframes fugue numpy pandas polars pyspark python snowflake snowpark spark
Last synced: 14 May 2025
https://github.com/milaan9/93_python_data_analytics_projects
This repository contains all the data analytics projects that I've worked on in Python.
breast-cancer-prediction cervical-cancer-prediction covid-19-prediction data-analytics-projects data-science english-french-tranlation ipython-notebook machine-learning machine-learning-projects poker-hand-predictor python4datascience python4everybody resume-selection stock-news-prediction tutor-milaan9
Last synced: 15 May 2025
https://github.com/rpy2/rpy2
Interface to use R from Python
cffi data-science interoperability python r statistics
Last synced: 14 May 2025
https://github.com/DaoSword/Time-Series-Forecasting-and-Deep-Learning
Resources about time series forecasting and deep learning.
data-science deep-learning forecasting machine-learning series-data series-forecasting time-series time-series-forecasting
Last synced: 26 Mar 2025
https://github.com/GRAAL-Research/poutyne
A simplified framework and utilities for PyTorch
data-science deep-learning keras machine-learning neural-network python pytorch
Last synced: 27 Mar 2025
https://github.com/LearnDataSci/articles
A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci
data-analysis data-science data-visualization machine-learning machine-learning-algorithms machinelearning python
Last synced: 13 Apr 2025
https://github.com/underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers
Last synced: 16 May 2025
https://github.com/rushter/heamy
A set of useful tools for competitive data science.
data-science machine-learning stacking
Last synced: 16 May 2025
https://github.com/firmai/pandapy
PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)
algorithmic-trading arrays data-science data-structures finance machine-learning numpy pandas structured-data
Last synced: 06 May 2025
https://github.com/youssefHosni/Efficient-Python-for-Data-Scientists-Book
Writing clean and optimized Python code
data-science numpy pandas python
Last synced: 16 Mar 2025
https://github.com/firmai/deltapy
DeltaPy - Tabular Data Augmentation (by @firmai)
augmentation data-augmentation data-science feature-engineering feature-extraction finance machine-learning tabular-data time-series
Last synced: 06 May 2025
https://github.com/Lackoftactics/facebook_data_analyzer
Analyze your copy of your Facebook data with Ruby. Download the zip file from Facebook and get info about friend ranking by message, vocabulary, contacts, friends-added statistics, and more.
conversation data-science data-visualization english-language facebook facebook-data facebook-data-analyzer ruby ruby-gem scraping script statistics
Last synced: 20 Nov 2024
https://github.com/youssefhosni/efficient-python-for-data-scientists
Writing clean and optimized Python code
data-science numpy pandas python
Last synced: 25 Jan 2025
https://github.com/csinva/csinva.github.io
Slides, paper notes, class notes, blog posts, and research on ML, statistics, and AI.
ai artificial-intelligence awesome blog computational-neuroscience data-science deep-learning jekyll-themes machine-learning machine-learning-tutorials ml neuroscience notes python pytorch research slides statistics stats website
Last synced: 21 Jan 2025
https://github.com/insitro/redun
Yet another redundant workflow engine
aws bioinformatics data-engineering data-science docker etl gcp ml python workflow-engine
Last synced: 01 Apr 2025
https://github.com/WecoAI/aideml
AIDE: the state-of-the-art machine learning engineer agent, generating machine learning solution code from natural language descriptions.
ai data-science llm machine-learning
Last synced: 02 May 2025
https://github.com/bradleyboehmke/data-science-learning-resources
A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)
Last synced: 07 Apr 2025
https://hdi-project.github.io/ATM/
Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).
automl data-science distributed-computing hyperparameter-optimization machine-learning
Last synced: 12 May 2025
https://github.com/justmarkham/pycon-2019-tutorial
Data Science Best Practices with pandas
data-science pandas python tutorial vizualisation
Last synced: 05 Apr 2025
https://github.com/HDI-Project/ATM
Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).
automl data-science distributed-computing hyperparameter-optimization machine-learning
Last synced: 25 Nov 2024
https://github.com/giorgi/duckdb.net
Bindings and ADO.NET Provider for DuckDB
ado-net data-science duckdb duckdb-database hacktoberfest
Last synced: 14 May 2025
https://github.com/RunLLM/aqueduct
Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.
ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3
Last synced: 18 Apr 2025
https://github.com/openhackathons-org/gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
ai4hpc cuda data-science deep-learning deepstream gpu hpc machine-learning mpi openacc openmp rapidsai
Last synced: 27 Mar 2025
https://github.com/bcg-x-official/facet
Human-explainable AI.
data-analytics data-science explainable-ai hyperparameter-tuning interpretability machine-learning model-selection python shap-vector-decomposition simulation statistics
Last synced: 16 May 2025
https://github.com/HoloClean/holoclean
A Machine Learning System for Data Enrichment.
data-enrichment data-science inference-engine machine-learning pytorch
Last synced: 02 May 2025
https://github.com/simonblanke/hyperactive
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
automated-machine-learning bayesian-optimization data-science deep-learning feature-engineering hyperactive hyperparameter-optimization keras machine-learning model-selection neural-architecture-search optimization parallel-computing parameter-tuning python pytorch scikit-learn xgboost
Last synced: 14 May 2025
https://github.com/juliaacademy/datascience
Data Science in Julia course for JuliaAcademy.com, taught by Huda Nassar
data-science julia juliaacademy learnjulia
Last synced: 12 Apr 2025
https://github.com/frictionlessdata/datapackage
Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
csv data-science json metadata schema validation
Last synced: 03 Apr 2025
https://github.com/JuliaAcademy/DataScience
Data Science in Julia course for JuliaAcademy.com, taught by Huda Nassar
data-science julia juliaacademy learnjulia
Last synced: 15 Mar 2025
https://github.com/SimonBlanke/Hyperactive
An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.
automated-machine-learning bayesian-optimization data-science deep-learning feature-engineering hyperactive hyperparameter-optimization keras machine-learning model-selection neural-architecture-search optimization parallel-computing parameter-tuning python pytorch scikit-learn xgboost
Last synced: 05 Apr 2025
https://github.com/polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 14 May 2025
https://github.com/ericlagergren/decimal
A high-performance, arbitrary-precision, floating-point decimal library.
arbitrary-precision big-decimal data-science decimal dogs-of-instagram financial general-decimal-arithmetic money multi-precision
Last synced: 20 Nov 2024
https://github.com/boxuancui/DataExplorer
Automate Data Exploration and Treatment
cran data-analysis data-exploration data-science eda r r-package rstats visualization
Last synced: 04 Dec 2024
https://github.com/Giorgi/DuckDB.NET
Bindings and ADO.NET Provider for DuckDB
ado-net data-science duckdb duckdb-database hacktoberfest
Last synced: 24 Mar 2025
https://github.com/kevinschaich/pyspark-cheatsheet
Quick reference guide to common patterns & functions in PySpark.
cheat cheatsheet cheatsheets data data-science docs documentation guide guides pyspark pyspark-tutorial quickstart reference references spark spark-sql
Last synced: 10 Apr 2025
https://github.com/microsoft/Reactors
Join a community of developers at Microsoft Reactor and connect with people, skills, and technology to build your career or personal learning. We offer free livestreams, on-demand content, and hybrid/in-person events daily around the world. Access our projects and code here.
ai azure cloud data data-science devops dotnet events iot live-streaming low-code meetup mixed-reality ml no-code nodejs personal-de python web
Last synced: 05 May 2025
https://github.com/alteryx/compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
ai automl data-labeling data-science labeling labeling-tool machine-learning prediction-engineering prediction-problem training-data
Last synced: 14 May 2025
https://github.com/vi3k6i5/guidedlda
Semi-supervised guided topic model with custom guidedLDA
data-science guided-topic-modeling guidedlda machine-learning seededlda topic-modeling
Last synced: 12 Apr 2025
https://github.com/vi3k6i5/GuidedLDA
Semi-supervised guided topic model with custom guidedLDA
data-science guided-topic-modeling guidedlda machine-learning seededlda topic-modeling
Last synced: 03 May 2025
https://github.com/polyaxon/datatile
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 30 Mar 2025
https://github.com/jmschrei/apricot
apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html
data-science machine-learning python submodular-optimization submodularity
Last synced: 04 Apr 2025
https://github.com/scverse/anndata
Annotated data.
anndata bioinformatics data-science machine-learning scanpy scverse transcriptomics
Last synced: 07 Apr 2025
https://github.com/alteryx/open_source_demos
A collection of demos showcasing automated feature engineering and machine learning in diverse use cases
compose data-science evalml feature-engineering featuretools machine-learning python tutorial
Last synced: 09 Apr 2025
https://github.com/ing-bank/popmon
Monitor the stability of a Pandas or Spark dataframe
covariate-shift data-analysis data-distributions data-profiling data-science dataset-shifts drift-detection hacktoberfest ing-bank ipython jupyter mlops monitoring pandas population-monitoring python spark statistical-process-control statistical-tests statistics
Last synced: 13 Apr 2025
https://github.com/BCG-X-Official/facet
Human-explainable AI.
data-analytics data-science explainable-ai hyperparameter-tuning interpretability machine-learning model-selection python shap-vector-decomposition simulation statistics
Last synced: 08 May 2025
https://github.com/business-science/modeltime
Modeltime unlocks time series forecast models and machine learning in one framework
arima data-science deep-learning ets forecasting machine-learning machine-learning-algorithms modeltime prophet r-package tbats tidymodeling tidymodels time time-series time-series-analysis timeseries timeseries-forecasting
Last synced: 08 Apr 2025
https://github.com/akanz1/klib
Easy to use Python library of customized functions for cleaning and analyzing data.
data-analysis data-cleaning data-preprocessing data-science data-visualization feature-selection klib python
Last synced: 08 May 2025
https://github.com/plotly/dash.jl
Dash for Julia - A Julia interface to the Dash ecosystem for creating analytic web applications in Julia. No JavaScript required.
bioinformatics charting dash dashboard data-science data-visualization finance gui-framework julia modeling no-javascript no-vba plotly plotly-dash productivity react technical-computing web-app
Last synced: 15 May 2025
https://github.com/SwanHubX/SwanLab
SwanLab: your ML experiment notebook, for logging and visualizing the entire AI training workflow.
data-science deep-learning fastapi jax machine-learning mlops model-versioning python pytorch tensorboard tensorflow tracking transformers visualization
Last synced: 05 Mar 2025
https://github.com/explosion/prodigy-recipes
Recipes for Prodigy, our fully scriptable annotation tool
active-learning annotation annotation-tool artificial-intelligence computer-vision data-annotation data-science labeling-tool machine-learning machine-teaching natural-language-processing nlp prodigy spacy
Last synced: 04 Apr 2025
https://github.com/hamelsmu/code_search
Code For Medium Article: "How To Create Natural Language Semantic Search for Arbitrary Objects With Deep Learning"
code-search data-science deep-learning fastai keras machine-learning machine-learning-on-source-code ml-on-code natural-language-processing nlp python pytorch search search-algorithm searching-algorithms semantic-search semantic-search-engine tensorflow tutorial
Last synced: 15 Mar 2025
https://github.com/juliadatascience/juliadatascience
Book on Julia for Data Science
book data data-manipulation data-science data-visualization julia julia-language
Last synced: 14 Apr 2025
https://github.com/ottogroup/palladium
Framework for setting up predictive analytics services
data-science machine-learning scikit-learn
Last synced: 12 Apr 2025
https://github.com/infuseai/piperider
Code review for data in dbt
code-review continuous-integration data-exploration data-observability data-pipeline data-profiler data-profiling data-quality data-reliability data-science data-testing data-visualization dbt dbt-metrics eda exploratory-data-analysis pull-requests python reporting
Last synced: 10 Apr 2025
https://github.com/InfuseAI/piperider
Code review for data in dbt
code-review continuous-integration data-exploration data-observability data-pipeline data-profiler data-profiling data-quality data-reliability data-science data-testing data-visualization dbt dbt-metrics eda exploratory-data-analysis pull-requests python reporting
Last synced: 18 Apr 2025
https://github.com/h2oai/mli-resources
H2O.ai Machine Learning Interpretability Resources
accountability data-mining data-science explainable-ml fairness fatml h2o iml interpretability interpretable-ai interpretable-machine-learning interpretable-ml jupyter-notebooks machine-learning machine-learning-interpretability mli python transparency xai xgboost
Last synced: 05 Apr 2025
https://github.com/s-shemmee/sql-101
Get started with SQL database programming. This beginner's guide provides step-by-step tutorials, practical examples, exercises, and resources to master SQL. Let's unlock the power of data with SQL!
data-analysis data-science sql sql-challenges sql-commands sql-database sql-injection sql-server
Last synced: 05 Apr 2025
https://github.com/serengil/chefboost
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting, Random Forest and Adaboost w/categorical features support for Python
adaboost c45-trees cart categorical-features data-mining data-science decision-trees gbdt gbm gbrt gradient-boosting gradient-boosting-machine gradient-boosting-machines id3 kaggle machine-learning python random-forest regression-tree
Last synced: 14 May 2025
https://github.com/mfarragher/obsidiantools
Obsidian tools - a Python package for analysing an Obsidian.md vault
data-science knowledge-management network-analysis note-taking obsidian-community obsidian-md python
Last synced: 16 May 2025
https://github.com/breck7/scroll
Scroll is a language for scientists of all ages. Scroll includes a command line app that builds static blogs, websites, CSVs, text files, and more.
blog cms csv data-science knowledge-base knowledge-graph markdown markup markup-language note-taking scroll static-site-generator tree-notation
Last synced: 15 Apr 2025
https://github.com/jasmcaus/ai-math-roadmap
Your no-nonsense guide to the Math used in Artificial Intelligence
ai ai-roadmap artificial-intelligence caer data-science deep-learning machine-learning mathematics neural-network roadmap
Last synced: 26 Feb 2025
https://github.com/a16z/nft-analyst-starter-pack
analytics data-science ethereum nfts python
Last synced: 10 May 2025
https://github.com/jbn/zigzag
Python library for identifying the peaks and valleys of a time series.
data-science statistics technical-analysis
Last synced: 16 May 2025
https://github.com/rjurney/agile_data_code_2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
agile-data agile-data-science airflow amazon-ec2 amazon-web-services analytics apache-kafka apache-spark data data-science data-syndrome kafka machine-learning machine-learning-algorithms predictive-analytics python python-3 python3 spark vagrant
Last synced: 12 Apr 2025
https://github.com/rjurney/Agile_Data_Code_2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
agile-data agile-data-science airflow amazon-ec2 amazon-web-services analytics apache-kafka apache-spark data data-science data-syndrome kafka machine-learning machine-learning-algorithms predictive-analytics python python-3 python3 spark vagrant
Last synced: 27 Nov 2024
https://github.com/pykale/pykale
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the PyTorch ecosystem. Star to support our work!
computer-vision data-science deep-learning domain-adaptation graph-analysis knowledge-aware-learning machine-learning medical-image-analysis meta-learning multimodal multimodal-learning python pytorch transfer-learning
Last synced: 15 May 2025
https://github.com/DoubleML/doubleml-for-py
DoubleML - Double Machine Learning in Python
causal-inference data-science double-machine-learning econometrics machine-learning python scikit-learn statistics
Last synced: 17 Dec 2024
https://github.com/ploomber/sklearn-evaluation
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
data-science deep-learning jupyter-notebook machine-learning pytorch scikit-learn sklearn tensorflow
Last synced: 13 Apr 2025
https://github.com/rudeboybert/fivethirtyeight
R package of data and code behind the stories and interactives at FiveThirtyEight
cran data-science datajournalism fivethirtyeight r rpackage statistics
Last synced: 16 May 2025
https://github.com/polyaxon/haupt
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
bokeh data-processing data-profiling data-science data-visualization deep-learning jupyter lineage machine-learning matplotlib mlops models plotly python pytorch serving tensorflow tracking ui visualization
Last synced: 14 May 2025
https://github.com/FilippoBovo/production-data-science
Production Data Science: a workflow for collaborative data science aimed at production
collaborative data-science production workflow
Last synced: 02 May 2025
https://github.com/aeturrell/skimpy
skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
data-science eda exploratory-data-analysis pandas statistics summary-statistics
Last synced: 07 May 2025
https://github.com/filippobovo/production-data-science
Production Data Science: a workflow for collaborative data science aimed at production
collaborative data-science production workflow
Last synced: 05 Apr 2025
https://github.com/vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
analytics data data-engineer data-engineering data-engineering-pipeline data-lineage data-pipelines data-science data-structures data-warehouse database dataops elt etl pipeline python snowflake sql trino warehouse
Last synced: 15 May 2025
https://github.com/JuliaDataScience/JuliaDataScience
Book on Julia for Data Science
book data data-manipulation data-science data-visualization julia julia-language
Last synced: 27 Nov 2024
https://github.com/dcai-course/dcai-lab
Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024
course data-centric-ai data-science deep-learning homework lab machine-learning
Last synced: 26 Mar 2025
https://github.com/lijqhs/deeplearning-notes
Notes for Deep Learning Specialization Courses led by Andrew Ng.
algorithms andrew-ng backpropagation bias-variance cnn coursera data-analysis data-science deep-learning deeplearning hyperparameter-optimization machine-learning neural-network notes overfitting sequence-models statistics summary tensorflow
Last synced: 10 Apr 2025
https://github.com/pgalko/BambooAI
A lightweight library that leverages Language Models (LLMs) to enable natural language interactions, allowing you to source and converse with data.
ai ai-agents data-analysis data-science gemini groq llm mistral ollama openai-api pandas pinecone python vector-database
Last synced: 23 Mar 2025