Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/RunLLM/aqueduct

Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.

ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3

Last synced: 09 Nov 2024

https://github.com/HoloClean/holoclean

A Machine Learning System for Data Enrichment.

data-enrichment data-science inference-engine machine-learning pytorch

Last synced: 12 Nov 2024

https://github.com/ericlagergren/decimal

A high-performance, arbitrary-precision, floating-point decimal library.

arbitrary-precision big-decimal data-science decimal dogs-of-instagram financial general-decimal-arithmetic money multi-precision

Last synced: 20 Nov 2024

https://github.com/ashishpatel26/Amazing-Feature-Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 07 Nov 2024

https://github.com/farukalamai/advanced-machine-learning-engineer-roadmap-2024

A Full Stack ML (Machine Learning) Roadmap involves learning the necessary skills and technologies to become proficient in all aspects of machine learning, including data collection and preprocessing, model development, deployment, and maintenance.

aws computer-vision data-analysis data-science data-visualization deep-learning git-github machine-learning machine-learning-roadmap mlops natural-language-processing neural-network nlp opencv pandas python pytorch statistics tensorflow yolo

Last synced: 22 Dec 2024

https://github.com/microsoft/Reactors

🌱 Join a community of developers at Microsoft Reactor and connect with people, skills, and technology to build your career or personal learning. We offer free livestreams, on-demand content, and hybrid/in-person events daily around the world. Access our projects and code here.

ai azure cloud data data-science devops dotnet events iot live-streaming low-code meetup mixed-reality ml no-code nodejs personal-de python web

Last synced: 13 Nov 2024

https://github.com/juliaacademy/datascience

Data Science in Julia course for JuliaAcademy.com, taught by Huda Nassar

data-science julia juliaacademy learnjulia

Last synced: 21 Dec 2024

https://github.com/openhackathons-org/gpubootcamp

This repository consists of GPU bootcamp material for HPC and AI.

ai4hpc cuda data-science deep-learning deepstream gpu hpc machine-learning mpi openacc openmp rapidsai

Last synced: 30 Oct 2024

https://github.com/jmschrei/apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

data-science machine-learning python submodular-optimization submodularity

Last synced: 21 Dec 2024

https://github.com/vi3k6i5/GuidedLDA

Semi-supervised guided topic model with custom GuidedLDA

data-science guided-topic-modeling guidedlda machine-learning seededlda topic-modeling

Last synced: 13 Nov 2024

https://github.com/fabsig/gpboost

Combining tree-boosting with Gaussian process and mixed effects models

artificial-intelligence boosting cpp data-science gaussian-processes machine-learning mixed-effects python r

Last synced: 17 Dec 2024

https://github.com/akanz1/klib

Easy to use Python library of customized functions for cleaning and analyzing data.

data-analysis data-cleaning data-preprocessing data-science data-visualization feature-selection klib python

Last synced: 15 Nov 2024

https://github.com/swanhubx/swanlab

⚡️SwanLab: your ML experiment notebook. Log and visualize the entire AI training process.

data-science deep-learning fastapi jax machine-learning mlops model-versioning python pytorch tensorboard tensorflow tracking transformers visualization

Last synced: 19 Dec 2024

https://github.com/plotly/dash.jl

Dash for Julia - A Julia interface to the Dash ecosystem for creating analytic web applications in Julia. No JavaScript required.

bioinformatics charting dash dashboard data-science data-visualization finance gui-framework julia modeling no-javascript no-vba plotly plotly-dash productivity react technical-computing web-app

Last synced: 20 Dec 2024

https://github.com/ottogroup/palladium

Framework for setting up predictive analytics services

data-science machine-learning scikit-learn

Last synced: 22 Dec 2024

https://github.com/capitalone/datacompy

Pandas, Polars, and Spark DataFrame comparison for humans and more!

compare dask data data-science dataframes fugue numpy pandas polars pyspark python spark

Last synced: 18 Dec 2024

https://github.com/frictionlessdata/specs

Technical specifications and guidelines for implementing Frictionless Data.

csv data-science json metadata schema validation

Last synced: 06 Nov 2024

https://github.com/pgalko/bambooai

A lightweight library that leverages Language Models (LLMs) to enable natural language interactions, allowing you to source and converse with data.

ai ai-agents data-analysis data-science gemini groq llm mistral ollama openai-api pandas pinecone python vector-database

Last synced: 21 Dec 2024

https://github.com/ploomber/sklearn-evaluation

Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

data-science deep-learning jupyter-notebook machine-learning pytorch scikit-learn sklearn tensorflow

Last synced: 19 Dec 2024

https://github.com/giorgi/duckdb.net

Bindings and ADO.NET Provider for DuckDB

ado-net data-science duckdb duckdb-database hacktoberfest

Last synced: 19 Dec 2024

https://github.com/serengil/chefboost

A lightweight decision tree framework for Python supporting regular algorithms (ID3, C4.5, CART, CHAID, and regression trees) and some advanced techniques (gradient boosting, random forest, and AdaBoost), with support for categorical features.

adaboost c45-trees cart categorical-features data-mining data-science decision-trees gbdt gbm gbrt gradient-boosting gradient-boosting-machine gradient-boosting-machines id3 kaggle machine-learning python random-forest regression-tree

Last synced: 12 Nov 2024

https://github.com/rudeboybert/fivethirtyeight

R package of data and code behind the stories and interactives at FiveThirtyEight

cran data-science datajournalism fivethirtyeight r rpackage statistics

Last synced: 22 Dec 2024

https://github.com/filippobovo/production-data-science

Production Data Science: a workflow for collaborative data science aimed at production

collaborative data-science production workflow

Last synced: 23 Dec 2024

https://github.com/jbn/zigzag

Python library for identifying the peaks and valleys of a time series.

data-science statistics technical-analysis

Last synced: 22 Dec 2024

https://github.com/pykale/pykale

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!

computer-vision data-science deep-learning domain-adaptation graph-analysis knowledge-aware-learning machine-learning medical-image-analysis meta-learning multimodal multimodal-learning python pytorch transfer-learning

Last synced: 20 Dec 2024

https://github.com/tlkh/ai-lab

All-in-one AI container for rapid prototyping

cuda data-science deep-learning docker jupyter nvidia pytorch tensorflow

Last synced: 22 Dec 2024

https://github.com/girder/girder

A data management platform for the web, developed by Kitware

data-analytics data-management data-science javascript kitware python resonant

Last synced: 04 Nov 2024

https://github.com/blackhc/toma

Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory

data-science gpu machine-learning python pytorch

Last synced: 21 Dec 2024

https://github.com/kevintpeng/Learn-Something-Every-Day

📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->

algorithm aws blog computer-science course-materials data-engineering data-science education educational engineering learning math mathematics research software-engineering university unix waterloo

Last synced: 28 Oct 2024

https://github.com/jobream/List-of-Learning-Resources

This collection provides a list of educational resources for Software Engineers. Feel free to add your favorite resources as well and help others in their journey of learning.

competitive-programming computer-science data-science resources software-engineering web-development

Last synced: 12 Nov 2024

https://github.com/DataScienceUB/introduction-datascience-python-book

Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications

analytics data data-science datascience machine-learning python sentiment-analysis

Last synced: 26 Nov 2024

https://github.com/mfarragher/obsidiantools

Obsidian tools - a Python package for analysing an Obsidian.md vault

data-science knowledge-management network-analysis note-taking obsidian-community obsidian-md python

Last synced: 20 Dec 2024

https://github.com/plotly/dash-table

OBSOLETE: now part of https://github.com/plotly/dash

dash data-science data-visualization plotly plotly-dash python react table

Last synced: 05 Nov 2024

https://github.com/firmai/pandasvault

Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

data-science data-structures dataframe functions pandas python snippets table tips

Last synced: 15 Nov 2024

https://github.com/publicdomaincompany/scroll

Scroll is a language for scientists of all ages. Scroll includes a command line app that builds static blogs, websites, CSVs, text files, and more.

blog cms csv data-science knowledge-base knowledge-graph markdown markup markup-language note-taking scroll static-site-generator tree-notation

Last synced: 23 Nov 2024

https://github.com/breck7/scroll

Scroll is a language for scientists of all ages. Scroll includes a command line app that builds static blogs, websites, CSVs, text files, and more.

blog cms csv data-science knowledge-base knowledge-graph markdown markup markup-language note-taking scroll static-site-generator tree-notation

Last synced: 08 Nov 2024

https://github.com/ashishpatel26/resourcebank_cv_nlp_mlops_2022

This repository offers a goldmine of materials for students of computer vision, natural language processing, and machine learning operations.

computer-vision data-science deep-learning mlops natural-language-processing

Last synced: 23 Dec 2024

https://github.com/okfn-brasil/rosie

🤖 Python application responsible for Serenata de Amor's intelligence

artificial-intelligence data-science machine-learning

Last synced: 31 Oct 2024

https://github.com/dcai-course/dcai-lab

Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

course data-centric-ai data-science deep-learning homework lab machine-learning

Last synced: 30 Oct 2024

https://github.com/Niketkumardheeryan/ML-CaPsule

ML-CaPsule is a project for beginners and experienced data science enthusiasts who don't have a mentor or guidance and wish to learn machine learning. Using the repo, they can learn ML, DL, and many related technologies through real-world projects and become interview-ready.

analytics data-analysis data-science data-visualization datascience deep-learning deep-neural-networks deployment flask heroku-deployment machine-learning python r statistics streamlit-webapp

Last synced: 13 Nov 2024

https://github.com/rebecca-vickery/data-science-learning-resources

A comprehensive list of free resources for learning data science

artificial-intelligence data data-science machine-learning python

Last synced: 11 Nov 2024

https://github.com/ClimbsRocks/machineJS

[UNMAINTAINED] Automated machine learning: just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml

auto-ml automated-machine-learning automl data-science data-scientists javascript javascript-library kaggle machine-learning machine-learning-algorithms machine-learning-library ml numerai scikit-learn

Last synced: 27 Nov 2024

https://github.com/Chicago/food-inspections-evaluation

This repository contains the code to generate predictions of critical violations at food establishments in Chicago. It also contains the results of an evaluation of the effectiveness of those predictions.

cdph chicago data-science food-poisoning open-data open-science public-health

Last synced: 30 Oct 2024

https://github.com/ShawhinT/YouTube-Blog

Code to complement YouTube videos and blog posts on Medium.

data-science example-code machine-learning medium-articles youtube

Last synced: 25 Nov 2024

https://github.com/sforaidl/genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL

algorithm-implementations benchmarking data-science deep-learning gym hacktoberfest machine-learning neural-network openai python pytorch reinforcement-learning reinforcement-learning-algorithms

Last synced: 18 Dec 2024

https://github.com/5agado/data-science-learning

Repository of code and resources related to different data science and machine learning topics. For learning, practice and teaching purposes.

data-science deep-learning jupyter-notebook learning-by-doing machine-learning statistics

Last synced: 08 Nov 2024

https://github.com/platonai/PulsarRPA

Automate webpages at scale and scrape web data completely and accurately with high-performance, distributed RPA.

crawler data-mining data-science rpa scraper scraping web-automation web-crawler web-mining web-scraping web-sql

Last synced: 05 Nov 2024

https://github.com/ledell/useR-machine-learning-tutorial

useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016.org/tutorials/10.html

data-science deep-learning ensemble-learning gradient-boosting-machine machine-learning r random-forest tutorial

Last synced: 27 Nov 2024

https://github.com/rio-labs/rio

Web apps in pure Python. No JavaScript, HTML, or CSS needed.

data-analysis data-science data-visualization deep-learning machine-learning python ui webapp

Last synced: 06 Nov 2024

https://github.com/tobgu/qframe

Immutable data frame for Go

data-frame data-science dataframe go golang immutable

Last synced: 21 Dec 2024

https://github.com/kunalj101/Data-Science-Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and more.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 13 Nov 2024

https://github.com/youssefhosni/awesome-data-science-resoruces

A curated list of data science educational resources for essential data science skills

computer-science data-science deep-learning machine-learning statistics

Last synced: 07 Nov 2024

https://github.com/airalcorn2/Michael-s-Data-Science-Curriculum

This is the companion curriculum to my guide to becoming a data scientist.

curriculum data-science machine-learning statistics

Last synced: 22 Nov 2024

https://github.com/aiguofer/gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets

Last synced: 19 Dec 2024

https://github.com/epistasislab/scikit-rebate

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

data-science feature-selection python

Last synced: 22 Dec 2024
