Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/probcomp/bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable implications of data as easily as SQL databases query the data itself.

automatic-data-modeling data-science databases machine-learning probabilistic-programming

Last synced: 01 Aug 2024

https://github.com/dataquestio/project-walkthroughs

Data science, machine learning, and web development project code for https://www.youtube.com/c/Dataquestio

data-science machine-learning pandas python

Last synced: 08 Aug 2024

https://github.com/tidyverse/datascience-box

Data Science Course in a Box

data-science education r rstats teaching

Last synced: 30 Jul 2024

https://github.com/youssefHosni/Practical-Machine-Learning

Practical machine learning notebooks and articles covering the end-to-end machine learning life cycle.

data-science machine-learning

Last synced: 31 Jul 2024

https://github.com/firmai/data-science-career

Career resources repository for data science, machine learning, big data, and business analytics

analytics big-data business-analytics business-intelligence career data-science machine-learning resources

Last synced: 07 Aug 2024

https://github.com/webartifex/intro-to-python

An intro to Python & programming for wanna-be data scientists

data-science introduction-to-programming jupyter python tutorial

Last synced: 03 Aug 2024

https://github.com/mlr-org/mlr3

mlr3: Machine Learning in R - next generation

classification data-science machine-learning mlr3 r r-package regression

Last synced: 30 Jul 2024

https://github.com/fraunhoferportugal/tsfel

An intuitive library to extract features from time series.

classification colab-notebook data-science feature-engineering feature-extraction time-series

Last synced: 30 Jul 2024

https://github.com/epsilla-cloud/vectordb

Epsilla is a high-performance vector database management system. Try out hosted Epsilla at https://cloud.epsilla.com/

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Last synced: 01 Aug 2024

https://github.com/stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

dag data-engineering data-platform data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hamilton hamiltonian machine-learning numpy pandas python software-engineering stitch-fix

Last synced: 02 Aug 2024

https://github.com/WenjieDu/PyPOTS

A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation, classification, clustering, forecasting, & anomaly detection on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

classification clustering data-mining data-science deep-learning forecasting healthcare imputation incomplete industrial interpolation machine-learning missing-values missingness neural-network partially-observed-time-series pytorch science-research time-series time-series-analysis

Last synced: 01 Aug 2024

https://github.com/turicas/rows

A common, beautiful interface to tabular data, no matter the format

convert-data csv data data-science excel hacktoberfest python table tabular-data xls xlsx

Last synced: 31 Jul 2024

https://github.com/GoogleCloudPlatform/DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

big-data data-analysis data-mining data-processing data-science google-cloud-dataflow

Last synced: 02 Aug 2024

https://github.com/egbertbouman/youtube-comment-downloader

Simple script for downloading YouTube comments without using the YouTube API

data-science data-scraper python youtube youtube-comments

Last synced: 01 Aug 2024

https://github.com/google/lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.

bayesian data-science econometrics marketing-science mmm

Last synced: 02 Aug 2024

https://github.com/dswah/pyGAM

[HELP REQUESTED] Generalized Additive Models in Python

data-science gams interpretable-machine-learning machine-learning python scientific-computing

Last synced: 31 Jul 2024

https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3

Python/PyMC3 port of the examples in "Statistical Rethinking: A Bayesian Course with Examples in R and Stan" by Richard McElreath

bayesian-data-analysis data-science pymc python statistics

Last synced: 07 Aug 2024

https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks

A series of Python Jupyter notebooks that help you better understand "The Elements of Statistical Learning" book

data-analysis data-science machine-learning python sklearn statistical-learning tensorflow tutorials

Last synced: 02 Aug 2024

https://github.com/HazyResearch/meerkat

Creative interactive views of any dataset.

data-science foundation-models machine-learning ml pandas

Last synced: 31 Jul 2024

https://github.com/zama-ai/concrete-ml

Concrete ML: Privacy Preserving ML framework built on top of Concrete, with bindings to traditional ML frameworks.

data-science fhe homomorphic-encryption machine-learning ppml privacy python scikit-learn tfhe torch

Last synced: 31 Jul 2024

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we feed third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 01 Aug 2024

https://github.com/yoshoku/rumale

Rumale is a machine learning library in Ruby

artificial-intelligence data-analysis data-science machine-learning ml ruby rubyml

Last synced: 30 Jul 2024

https://github.com/TheAlgorithms/Jupyter

This repository contains scripts and notebooks related to statistics, machine learning, neural networks, deep learning, NLP, numerical methods, and automation.

algorithms data-science data-structures deep-learning hacktoberfest machine-learning neural-network

Last synced: 02 Aug 2024

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 31 Jul 2024

https://github.com/williamFalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter search for neural networks

caffe caffe2 chainer data-science deep-learning grid-search hyperparameter-optimization keras machine-learning neural-networks pytorch random-search tensorflow

Last synced: 31 Jul 2024

https://github.com/target/matrixprofile-ts

A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

data-science matrix-profile motif motif-discovery pip pip3 pypi pypi-packages python python3 time-series timeseries-analysis timeseries-segmentation

Last synced: 31 Jul 2024

https://github.com/compdemocracy/polis

🌌 Open Source AI for large scale open ended feedback

civic-tech data-science deliberative-democracy participatory-democracy

Last synced: 01 Aug 2024

https://github.com/glue-viz/glue

Linked Data Visualizations Across Multiple Files

data-science linked-data python visualization

Last synced: 05 Aug 2024

https://github.com/iterative/mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day 🤞

cli data-science deployment developer-tools git machine-learning mlem model-registry python

Last synced: 31 Jul 2024

https://github.com/erikaduan/r_tips

A repository of R usage tips for data cleaning, data mining, data visualisation, statistical inference and machine learning

data-science data-visualization machine-learning r rstats statistics

Last synced: 13 Aug 2024

https://github.com/pdpipe/pdpipe

Easy pipelines for pandas DataFrames.

data data-science dataframe dataframes pandas pandas-dataframe pipeline

Last synced: 01 Aug 2024

https://github.com/ipython-books/cookbook-2nd-code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization

Last synced: 02 Aug 2024

https://github.com/run-house/runhouse

The fastest way to iterate and deploy AI workloads on your own infra. Unobtrusive, debuggable, PyTorch-like APIs.

api artificial-intelligence aws azure collaboration data-science deployment distributed fastapi gcp infrastructure machine-learning middleware observability python pytorch ray sagemaker serverless

Last synced: 31 Jul 2024

https://github.com/Kotlin/dataframe

Structured data processing in Kotlin

data-analysis data-science dataframe kotlin

Last synced: 01 Aug 2024

https://github.com/litaotao/IPython-Dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data scientists and data analysts. It aims to provide interactive visualization, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 15 Aug 2024

https://github.com/BiomedSciAI/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 31 Jul 2024

https://github.com/nicolaskruchten/jupyter_pivottablejs

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables

Last synced: 01 Aug 2024

https://github.com/pm4py/pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.

data-mining data-science machine-learning process-mining python

Last synced: 02 Aug 2024

https://github.com/pm4py/pm4py-source

Public repository for the PM4Py (Process Mining for Python) project.

data-mining data-science machine-learning process-mining python

Last synced: 07 Aug 2024

https://github.com/arvkevi/kneed

Knee point detection in Python 📈

data-analysis data-science elbow-method knee-point python scientific-computing systems

Last synced: 31 Jul 2024

https://github.com/fastai/fastai2

Temporary home for fastai v2 while it's being developed

data-science deep-learning fastai jupyter machine-learning nbdev python pytorch

Last synced: 07 Aug 2024

https://github.com/aeturrell/coding-for-economists

This repository hosts the code behind the online book, Coding for Economists.

book data-science econometrics economics economics-models jupyter-notebook learning python research vscode

Last synced: 01 Aug 2024

https://github.com/rstojnic/lazydata

Lazydata: Scalable data dependencies for Python projects

data-science datamanagement machine-learning python

Last synced: 31 Jul 2024

https://github.com/TrainingByPackt/Data-Science-Projects-with-Python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 01 Aug 2024

https://github.com/cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow

Last synced: 03 Aug 2024

https://github.com/Squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 30 Jul 2024

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 01 Aug 2024

https://github.com/rnorm/book_sample

another book on data science

book data-science python r

Last synced: 07 Aug 2024

https://github.com/blue-yonder/turbodbc

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup

Last synced: 01 Aug 2024

https://github.com/erezsh/Preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 31 Jul 2024

https://github.com/alegonz/baikal

A graph-based functional API for building complex scikit-learn pipelines.

data-science graph-based machine-learning python scikit-learn

Last synced: 03 Aug 2024

https://github.com/DiskFrame/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 30 Jul 2024

https://github.com/jadianes/data-science-your-way

Ways of doing Data Science Engineering and Machine Learning in R and Python

data-frame data-science data-science-engineering exploratory-data-analysis jupyter machine-learning notebook python r tutorial

Last synced: 31 Jul 2024

https://github.com/gesistsa/rio

🐟 A Swiss-Army Knife for Data I/O

cran csv csvy data data-science excel io r rio sas spss stata

Last synced: 13 Aug 2024

https://github.com/JacksonWuxs/DaPy

Easy-to-use data analysis / manipulation framework for humans

analysis data-analysis data-science efficiency pypi python statistical-reports

Last synced: 31 Jul 2024

https://github.com/JuliaStats/GLM.jl

Generalized linear models in Julia

data-science glm julia regression statistical-models statistics

Last synced: 02 Aug 2024