Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/maxpumperla/deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"

alphago alphago-zero data-science deep-learning game-of-go games machine-learning neural-networks python

Last synced: 26 Dec 2024

https://github.com/dssg/hitchhikers-guide

The Hitchhiker's Guide to Data Science for Social Good

data-science dssg machine-learning training tutorial-exercises

Last synced: 12 Nov 2024

https://github.com/dataquestio/project-walkthroughs

Data science, machine learning, and web development project code for https://www.youtube.com/c/Dataquestio .

data-science machine-learning pandas python

Last synced: 21 Dec 2024

https://github.com/run-house/runhouse

Dispatch and distribute your ML training to "serverless" clusters in Python, like PyTorch for ML infra. Iterable, debuggable, multi-cloud/on-prem, identical across research and production.

api artificial-intelligence aws azure collaboration data-science deployment distributed fastapi gcp infrastructure machine-learning middleware observability python pytorch ray sagemaker serverless

Last synced: 25 Dec 2024

https://github.com/mybridge/machine-learning-open-source

Monthly Series - Machine Learning Top 10 Open Source Projects

ai algorithm artificial-intelligence data-science machine-learning neural-network

Last synced: 07 Nov 2024

https://github.com/Mybridge/machine-learning-open-source

Monthly Series - Machine Learning Top 10 Open Source Projects

ai algorithm artificial-intelligence data-science machine-learning neural-network

Last synced: 28 Oct 2024

https://github.com/sematic-ai/sematic

An open-source ML pipeline development platform

ai data-science machine-learning ml ml-ops ml-pipeline ml-pipelines mlops pipeline python python3

Last synced: 15 Nov 2024

https://github.com/zama-ai/concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

data-science fhe fully-homomorphic-encryption homomorphic-encryption machine-learning ppml privacy python scikit-learn tfhe torch

Last synced: 25 Dec 2024

https://github.com/WenjieDu/PyPOTS

A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

classification clustering data-mining data-science deep-learning forecasting healthcare imputation incomplete industrial interpolation machine-learning missing-values missingness neural-network partially-observed-time-series pytorch science-research time-series time-series-analysis

Last synced: 02 Nov 2024

https://github.com/tidyverse/datascience-box

Data Science Course in a Box

data-science education r rstats teaching

Last synced: 26 Dec 2024

https://github.com/grailbio/reflow

A language and runtime for distributed, incremental data processing in the cloud

analysis-pipeline aws bioinformatics-pipeline cloud-computing data-science golang language runtime scientific-computing

Last synced: 26 Oct 2024

https://github.com/caserec/Datasets-for-Recommender-Systems

A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)

data-science database datasets public-data recommender-systems

Last synced: 28 Nov 2024

https://github.com/mlr-org/mlr3

mlr3: Machine Learning in R - next generation

classification data-science machine-learning mlr3 r r-package regression

Last synced: 25 Dec 2024

https://github.com/ramiawar/dataline

Chat with your data - AI data analysis and visualization on CSV, Postgres, MySQL, Snowflake, SQLite...

ai chart data-science data-visualization llm sql

Last synced: 20 Dec 2024

https://github.com/egbertbouman/youtube-comment-downloader

Simple script for downloading YouTube comments without using the YouTube API

data-science data-scraper python youtube youtube-comments

Last synced: 25 Dec 2024

https://github.com/iamaziz/pydataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 21 Dec 2024

https://github.com/iamaziz/PyDataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 27 Nov 2024

https://github.com/fraunhoferportugal/tsfel

An intuitive library to extract features from time series.

classification colab-notebook data-science feature-engineering feature-extraction time-series

Last synced: 26 Oct 2024

https://github.com/probcomp/bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.

automatic-data-modeling data-science databases machine-learning probabilistic-programming

Last synced: 24 Dec 2024

https://github.com/youssefhosni/practical-machine-learning

Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.

data-science machine-learning

Last synced: 23 Dec 2024

https://github.com/RamiAwar/dataline

Chat with your data - AI data analysis and visualization on CSV, Postgres, MySQL, Snowflake, SQLite...

ai chart data-science data-visualization llm sql

Last synced: 30 Nov 2024

https://github.com/firmai/data-science-career

Career resources for Data Science, Machine Learning, Big Data, and Business Analytics

analytics big-data business-analytics business-intelligence career data-science machine-learning resources

Last synced: 27 Nov 2024

https://github.com/youssefHosni/Practical-Machine-Learning

Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.

data-science machine-learning

Last synced: 27 Oct 2024

https://github.com/epsilla-cloud/vectordb

Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Last synced: 26 Dec 2024

https://github.com/google/lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.

bayesian data-science econometrics marketing-science mmm

Last synced: 13 Nov 2024

https://github.com/webartifex/intro-to-python

An intro to Python & programming for wanna-be data scientists

data-science introduction-to-programming jupyter python tutorial

Last synced: 17 Nov 2024

https://github.com/turicas/rows

A common, beautiful interface to tabular data, no matter the format

convert-data csv data data-science excel hacktoberfest python table tabular-data xls xlsx

Last synced: 26 Dec 2024

https://github.com/stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix

Last synced: 26 Sep 2024

https://github.com/empathy87/the-elements-of-statistical-learning-python-notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 25 Dec 2024

https://github.com/GoogleCloudPlatform/DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 12 Nov 2024

https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 12 Nov 2024

https://github.com/fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

automation data-pipelines data-science machine-learning mlflow mlops pandera pydantic python

Last synced: 20 Dec 2024

https://github.com/microsoft/RD-Agent

Research and development (R&D) is crucial for enhancing industrial productivity, especially in the AI era, where the core aspects of R&D focus mainly on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.

agent ai automation data-mining data-science development llm research

Last synced: 10 Oct 2024

https://github.com/dswah/pyGAM

[HELP REQUESTED] Generalized Additive Models in Python

data-science gams interpretable-machine-learning machine-learning python scientific-computing

Last synced: 30 Oct 2024

https://github.com/Kotlin/dataframe

Structured data processing in Kotlin

data-analysis data-science dataframe kotlin

Last synced: 07 Nov 2024

https://github.com/aloctavodia/statistical-rethinking-with-python-and-pymc3

Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

bayesian-data-analysis data-science pymc python statistics

Last synced: 25 Dec 2024

https://github.com/hazyresearch/meerkat

Creative interactive views of any dataset.

data-science foundation-models machine-learning ml pandas

Last synced: 22 Dec 2024

https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3

Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

bayesian-data-analysis data-science pymc python statistics

Last synced: 27 Nov 2024

https://github.com/HazyResearch/meerkat

Creative interactive views of any dataset.

data-science foundation-models machine-learning ml pandas

Last synced: 29 Oct 2024

https://github.com/chawlaavi/daily-dose-of-data-science

A collection of code snippets from the publication Daily Dose of Data Science on Substack: http://www.dailydoseofds.com/

data-analysis data-science data-science-tips data-visualization jupyter jupyter-notebook jupyter-tips matplotlib matplotlib-tips numpy pandas pandas-tips python python-tips sklearn

Last synced: 24 Dec 2024

https://github.com/thealgorithms/jupyter

The repository contains scripts and notebooks related to statistics, machine learning, neural networks, deep learning, NLP, numerical methods, and automation.

algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network

Last synced: 21 Dec 2024

https://github.com/TheAlgorithms/Jupyter

The repository contains scripts and notebooks related to statistics, machine learning, neural networks, deep learning, NLP, numerical methods, and automation.

algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network

Last synced: 13 Nov 2024

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we feed third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 01 Nov 2024

https://github.com/yoshoku/rumale

Rumale is a machine learning library in Ruby

artificial-intelligence data-analysis data-science machine-learning ml ruby rubyml

Last synced: 23 Dec 2024

https://github.com/compdemocracy/polis

🌌 Open Source AI for large-scale, open-ended feedback

civic-tech data-science deliberative-democracy participatory-democracy

Last synced: 01 Nov 2024

https://github.com/scrapinghub/python-crfsuite

A Python binding for CRFsuite

crf crfsuite data-science

Last synced: 22 Dec 2024

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 28 Oct 2024

https://github.com/glue-viz/glue

Linked Data Visualizations Across Multiple Files

data-science linked-data python visualization

Last synced: 21 Dec 2024

https://github.com/williamfalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter search for neural networks

caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow

Last synced: 25 Dec 2024