Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/run-house/runhouse

The fastest way to iterate and deploy AI workloads on your own infra. Unobtrusive, debuggable, PyTorch-like APIs.

api artificial-intelligence aws azure collaboration data-science deployment distributed fastapi gcp infrastructure machine-learning middleware observability python pytorch ray sagemaker serverless

Last synced: 11 Oct 2024

https://github.com/pymc-labs/pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.

btyd buy-till-you-die clv customer-lifetime-value data-science marketing media-mix-modeling mmm python

Last synced: 30 Oct 2024

https://github.com/mrankitgupta/data-analyst-roadmap

I am sharing my 66DaysofData journey into data analytics as a participant in Ken Jee's #66daysofdata challenge.

ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau

Last synced: 12 Oct 2024

https://github.com/litaotao/ipython-dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data scientists and data analysts. Aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 30 Oct 2024

https://github.com/litaotao/IPython-Dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data scientists and data analysts. Aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 15 Aug 2024

https://github.com/nicolaskruchten/jupyter_pivottablejs

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables

Last synced: 10 Oct 2024

https://github.com/BiomedSciAI/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 30 Oct 2024

https://github.com/odpi/opends4all

OpenDS4All project, hosted by LF AI & Data

data-science jupyter-notebooks materials

Last synced: 09 Nov 2024

https://github.com/pm4py/pm4py-source

Public repository for the PM4Py (Process Mining for Python) project.

data-mining data-science machine-learning process-mining python

Last synced: 07 Aug 2024

https://github.com/pm4py/pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.

data-mining data-science machine-learning process-mining python

Last synced: 10 Nov 2024

https://github.com/faktionai/awesome-ai-usecases

A list of awesome and proven Artificial Intelligence use cases and applications

data-science machine-learning

Last synced: 13 Oct 2024

https://github.com/fastai/fastai2

Temporary home for fastai v2 while it's being developed

data-science deep-learning fastai jupyter machine-learning nbdev python pytorch

Last synced: 07 Aug 2024

https://github.com/janpfeifer/gonb

GoNB, a Go Notebook Kernel for Jupyter

data-science go golang gonb jupyter jupyter-notebook jupyter-notebook-kernel

Last synced: 22 Oct 2024

https://github.com/TrainingByPackt/Data-Science-Projects-with-Python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 08 Nov 2024

https://github.com/krish-adi/barfi

Python flow-based programming environment with a graphical programming interface.

ai-ml data-science dataflow-programming flow-based-programming framework graphical-programming jupyter jupyter-notebook ml python streamlit

Last synced: 10 Oct 2024

https://github.com/aeturrell/coding-for-economists

This repository hosts the code behind the online book, Coding for Economists.

book data-science econometrics economics economics-models jupyter-notebook learning python research vscode

Last synced: 07 Nov 2024

https://github.com/rstojnic/lazydata

Lazydata: Scalable data dependencies for Python projects

data-science datamanagement machine-learning python

Last synced: 29 Oct 2024

https://github.com/cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow

Last synced: 28 Sep 2024

https://github.com/Squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 26 Oct 2024

https://github.com/blue-yonder/turbodbc

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup

Last synced: 15 Oct 2024

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 06 Nov 2024

https://github.com/erezsh/preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 01 Nov 2024

https://github.com/rnorm/book_sample

another book on data science

book data-science python r

Last synced: 07 Aug 2024

https://github.com/tuangauss/DataScienceProjects

The code repository for projects and tutorials in R and Python that cover a variety of topics in data visualization, statistics, sports analytics, and general applications of probability theory.

data-science data-visualization statistics

Last synced: 01 Nov 2024

https://github.com/sforaidl/kd_lib

A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization

Last synced: 29 Oct 2024

https://github.com/erezsh/Preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 29 Oct 2024

https://github.com/Kotlin/kandy

Kotlin plotting library.

data-science graphics jupyter-notebooks kotlin plot

Last synced: 07 Nov 2024

https://github.com/jadianes/data-science-your-way

Ways of doing Data Science Engineering and Machine Learning in R and Python

data-frame data-science data-science-engineering exploratory-data-analysis jupyter machine-learning notebook python r tutorial

Last synced: 09 Nov 2024

https://github.com/DiskFrame/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 25 Oct 2024

https://github.com/juliastats/glm.jl

Generalized linear models in Julia

data-science glm julia regression statistical-models statistics

Last synced: 12 Oct 2024

https://github.com/alegonz/baikal

A graph-based functional API for building complex scikit-learn pipelines.

data-science graph-based machine-learning python scikit-learn

Last synced: 03 Aug 2024

https://github.com/gesistsa/rio

🐟 A Swiss-Army Knife for Data I/O

cran csv csvy data data-science excel io r rio sas spss stata

Last synced: 14 Oct 2024

https://github.com/JacksonWuxs/DaPy

Easy-to-use data analysis / manipulation framework for humans

analysis data-analysis data-science efficiency pypi python statistical-reports

Last synced: 31 Oct 2024

https://github.com/github/codespaces-jupyter

Explore machine learning and data science with Codespaces

codespaces data-science jupyter-notebook machine-learning

Last synced: 07 Oct 2024

https://github.com/fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

automation data-pipelines data-science machine-learning mlflow mlops pandera pydantic python

Last synced: 06 Nov 2024

https://github.com/kkulma/climate-change-data

🌍 A curated list of APIs, open data and ML/AI projects on climate change

climate climate-analysis climate-change climate-data data data-science datascience hacktoberfest python r resources rstats

Last synced: 01 Nov 2024

https://github.com/JuliaStats/GLM.jl

Generalized linear models in Julia

data-science glm julia regression statistical-models statistics

Last synced: 02 Aug 2024

https://github.com/dmbee/seglearn

Python module for machine learning on time series

data-science machine-learning python time-series

Last synced: 26 Oct 2024

https://dmbee.github.io/seglearn/

Python module for machine learning on time series

data-science machine-learning python time-series

Last synced: 02 Nov 2024

https://github.com/rpy2/rpy2

Interface to use R from Python

cffi data-science interoperability python r statistics

Last synced: 10 Nov 2024

https://github.com/siznax/wptools

Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis

api-client commons data-science glam linked-open-data mediawiki mediawiki-api open-data python restbase wikidata wikimedia-commons wikipedia wikipedia-api

Last synced: 14 Oct 2024

https://github.com/GRAAL-Research/poutyne

A simplified framework and utilities for PyTorch

data-science deep-learning keras machine-learning neural-network python pytorch

Last synced: 30 Oct 2024

https://github.com/LearnDataSci/articles

A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci

data-analysis data-science data-visualization machine-learning machine-learning-algorithms machinelearning python

Last synced: 07 Nov 2024

https://github.com/starpig1129/ai-data-analysis-mulitagent

AI-Driven Research Assistant: An advanced multi-agent system for automating complex research processes. Leveraging LangChain, OpenAI GPT, and LangGraph, this tool streamlines hypothesis generation, data analysis, visualization, and report writing. Perfect for researchers and data scientists seeking to enhance their workflow and productivity.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 17 Sep 2024

https://github.com/rushter/heamy

A set of useful tools for competitive data science.

data-science machine-learning stacking

Last synced: 03 Aug 2024

https://github.com/firmai/pandapy

PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)

algorithmic-trading arrays data-science data-structures finance machine-learning numpy pandas structured-data

Last synced: 04 Nov 2024

https://github.com/Lackoftactics/facebook_data_analyzer

Analyze your copy of your Facebook data with Ruby. Download the zip file from Facebook and get info about friend rankings by message, vocabulary, contacts, friends-added statistics, and more.

conversation data-science data-visualization english-language facebook facebook-data facebook-data-analyzer ruby ruby-gem scraping script statistics

Last synced: 04 Aug 2024

https://github.com/youssefhosni/efficient-python-for-data-scientists

Writing clean and optimized Python code

data-science numpy pandas python

Last synced: 07 Nov 2024

https://github.com/justmarkham/pycon-2019-tutorial

Data Science Best Practices with pandas

data-science pandas python tutorial vizualisation

Last synced: 30 Oct 2024

https://github.com/bradleyboehmke/data-science-learning-resources

A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)

data-science machine-learning

Last synced: 14 Oct 2024

https://github.com/HDI-Project/ATM

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).

automl data-science distributed-computing hyperparameter-optimization machine-learning

Last synced: 06 Aug 2024

https://hdi-project.github.io/ATM/

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).

automl data-science distributed-computing hyperparameter-optimization machine-learning

Last synced: 03 Aug 2024

https://github.com/youssefHosni/Efficient-Python-for-Data-Scientists

Writing clean and optimized Python code

data-science numpy pandas python

Last synced: 27 Oct 2024

https://github.com/aqueducthq/aqueduct

Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.

ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3

Last synced: 17 Aug 2024

https://github.com/RunLLM/aqueduct

Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.

ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3

Last synced: 09 Nov 2024

https://github.com/HoloClean/holoclean

A Machine Learning System for Data Enrichment.

data-enrichment data-science inference-engine machine-learning pytorch

Last synced: 02 Aug 2024

https://github.com/ashishpatel26/Amazing-Feature-Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 07 Nov 2024

https://github.com/ericlagergren/decimal

A high-performance, arbitrary-precision, floating-point decimal library.

arbitrary-precision big-decimal data-science decimal dogs-of-instagram financial general-decimal-arithmetic money multi-precision

Last synced: 04 Aug 2024

https://github.com/microsoft/Reactors

🌱 Join a community of developers at Microsoft Reactor and connect with people, skills, and technology to build your career or personal learning. We offer free livestreams, on-demand content, and hybrid/in-person events daily around the world. Access our projects and code here.

ai azure cloud data data-science devops dotnet events iot live-streaming low-code meetup mixed-reality ml no-code nodejs personal-de python web

Last synced: 02 Aug 2024

https://github.com/openhackathons-org/gpubootcamp

This repository contains GPU bootcamp material for HPC and AI

ai4hpc cuda data-science deep-learning deepstream gpu hpc machine-learning mpi openacc openmp rapidsai

Last synced: 30 Oct 2024

https://github.com/jmschrei/apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

data-science machine-learning python submodular-optimization submodularity

Last synced: 30 Oct 2024

https://github.com/vi3k6i5/GuidedLDA

Semi-supervised guided topic model with custom GuidedLDA

data-science guided-topic-modeling guidedlda machine-learning seededlda topic-modeling

Last synced: 02 Aug 2024