Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/thoughtworks/mlops-platforms

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

azureml data-science databricks dataiku datarobot google-ai-platform h2oai iguazio knime kubeflow machine-learning mlflow mlops pachyderm sagemaker seldon

Last synced: 02 Aug 2024

https://github.com/mehmetkahya0/ai-catalog

Huge AI models catalog. A curated list of AI tools, platforms, and resources across various domains.

3d ai art artificial-intelligence awesome awesome-list chatbot chatgpt code data-science education image image-processing midjourney openai search search-engine stable-diffusion summarizer text-to

Last synced: 02 Aug 2024

https://github.com/aunum/goro

A High-level Machine Learning Library for Go

data-science go golang machine-learning machinelearning

Last synced: 31 Jul 2024

https://github.com/aaronpenne/data_visualization

A collection of my data visualizations, mostly in Python.

data-science data-visualization python3 visualization

Last synced: 30 Jul 2024

https://github.com/MaxHalford/xam

🎯 Personal data science and machine learning toolbox

data-science machine-learning preprocessing python stacking

Last synced: 03 Aug 2024

https://github.com/operatorai/modelstore

🏬 modelstore is a Python library that allows you to version, export, and save a machine learning model to your filesystem or a cloud storage provider.

data-science keras machine-learning mlops modelstore python-library pytorch s3-storage scikit-learn tensorflow transformer

Last synced: 02 Aug 2024

https://github.com/jkrumbiegel/Chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 04 Aug 2024

https://github.com/wilsonrljr/sysidentpy

A Python Package For System Identification Using NARMAX Models

data-science dynamical-systems machine-learning narmax narx system-identification time-series

Last synced: 02 Aug 2024

https://github.com/matrix-profile-foundation/matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

algorithms anomaly-detection clustering data-mining data-science hacktoberfest matrixprofile motif-discovery python python2 python3 segmentation time-series time-series-analysis

Last synced: 01 Aug 2024

https://github.com/aeturrell/skimpy

skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.

data-science eda exploratory-data-analysis pandas statistics summary-statistics

Last synced: 03 Aug 2024

https://github.com/BlackHC/toma

Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory

data-science gpu machine-learning python pytorch

Last synced: 03 Aug 2024

https://github.com/tellery/tellery

Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

analytics bigquery business-intelligence collaboration dashboard data-analytics data-modeling data-science data-visualization database dbt notebook self-hosted sql

Last synced: 13 Aug 2024

https://github.com/Niketkumardheeryan/ML-CaPsule

ML-capsule is a project for beginners and experienced data science enthusiasts who lack a mentor or guidance and wish to learn machine learning. Using this repo, they can learn ML, DL, and many related technologies through real-world projects and become interview-ready.

analytics data-analysis data-science data-visualization datascience deep-learning deep-neural-networks deployment flask heroku-deployment machine-learning python r statistics streamlit-webapp

Last synced: 02 Aug 2024

https://github.com/meteostat/meteostat-python

Access and analyze historical weather and climate data with Python.

climate climate-change climate-data data-science meteostat open-data statistics weather weather-data weather-station

Last synced: 08 Aug 2024

https://github.com/xoolive/traffic

A toolbox for processing and analysing air traffic data

adsb air-traffic-data data-analytics data-science data-visualisation declarative-pipeline mode-s trajectory

Last synced: 04 Aug 2024

https://github.com/InseeFrLab/onyxia

🔬 Data science environment for k8s

bluehats data-science datalab helm insee kubernetes onyxia

Last synced: 03 Sep 2024

https://github.com/souzatharsis/open-quant-live-book

An open source, hands-on and fully reproducible book in quantitative finance, data science and econophysics. Join us and help Make Wall Street Great Again!

algo-trading altdata data-science econophysics financial-analysis financial-markets machine-learning open-source quantitative-finance

Last synced: 02 Aug 2024

https://github.com/KiranGershenfeld/VisualizingTwitchCommunities

Graphing communities on Twitch.tv in a visually intuitive way

community data-science python twitch visualization

Last synced: 30 Jul 2024

https://github.com/datmo/datmo

Open source production model management tool for data scientists

artificial-intelligence data-science deep-learning machine-learning reproducibility version-control

Last synced: 02 Aug 2024

https://github.com/olavolav/uniplot

Lightweight plotting to the terminal. 4x resolution via Unicode.

data-analysis data-science plot python

Last synced: 31 Jul 2024

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 09 Aug 2024

https://github.com/larswaechter/voici.js

A Node.js library for pretty printing your data on the terminal🎨

console data-science javascript shell terminal tty typescript

Last synced: 31 Jul 2024

https://github.com/Anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 01 Aug 2024

https://github.com/machine-learning-apps/Issue-Label-Bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 30 Jul 2024

https://github.com/AnotherSamWilson/miceforest

Multiple Imputation with LightGBM in Python

data-science imputed-values mice-algorithm python random-forest

Last synced: 05 Aug 2024

https://github.com/profjsb/python-seminar

Python for Data Science (Seminar Course at UC Berkeley; AY 250)

data-science distributed-computing machine-learning python visualization

Last synced: 07 Aug 2024

https://github.com/gdsbook/book

This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.

data-analysis-python data-science geographic-data geographical-information-system spatial-analysis spatial-data-analysis spatial-statistics statistics

Last synced: 31 Jul 2024

https://github.com/autonlab/auton-survival

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events

causal-inference counterfactual-inference data-science deep-learning graphical-models machine-learning python regression reliability-analysis survival-analysis time-to-event

Last synced: 02 Aug 2024

https://github.com/upgini/upgini

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

automated-feature-engineering automl automl-pipeline chatgpt data-enrichment data-science feature-engineering feature-extraction feature-selection features kaggle kaggle-solution large-language-models llm machine-learning open-data open-datasets public-data python-library scikit-learn

Last synced: 01 Aug 2024

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 01 Aug 2024

https://github.com/databrickslabs/tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation

data-analysis data-science pandas python scala time-series timeseries timeseries-analysis timeseries-data

Last synced: 02 Aug 2024

https://github.com/ShawhinT/YouTube-Blog

Code to complement YouTube videos and blog posts on Medium.

data-science example-code machine-learning medium-articles youtube

Last synced: 06 Aug 2024

https://github.com/CJWorkbench/cjworkbench

The data journalism platform with built-in training

data-analysis data-journalism data-science data-visualization journalism notebook

Last synced: 06 Aug 2024

https://github.com/ml-tooling/ml-hub

🧰 Multi-user development platform for machine learning teams. Simple to set up within minutes.

data-science docker jupyter jupyterhub machine-learning python

Last synced: 01 Aug 2024

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 01 Aug 2024

https://github.com/tommyod/Efficient-Apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 31 Jul 2024

https://github.com/maxhumber/redframes

General Purpose Data Manipulation Library

data-science pandas python

Last synced: 01 Aug 2024

https://github.com/mljar/plotai

PlotAI - Your Ultimate Plotting Assistant! 📊🤖 Use ChatGPT-3.5 to create plots in Python and Matplotlib directly in your Python script or notebook.

charts chatgpt data-science llm matplotlib plots python visualization

Last synced: 02 Aug 2024

https://github.com/PPshrimpGo/BDCI2018-ChinauUicom-1st-solution

The first-place solution to the China Unicom challenge at BDCI 2018

competition data-science

Last synced: 08 Aug 2024

https://github.com/greenelab/scihub

Source code and data analyses for the Sci-Hub Coverage Study

crossref data-science doi journals libgen open-data sci-hub scimag scopus

Last synced: 03 Aug 2024

https://github.com/Dyakonov/PZAD

Course "Applied Problems in Data Analysis" (Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University)

data-mining data-science data-visualization education lectures machine-learning ml russian slides

Last synced: 07 Aug 2024

https://github.com/Ibotta/sk-dist

Distributed scikit-learn meta-estimators in PySpark

data-science machine-learning ml scikit-learn spark

Last synced: 06 Aug 2024

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed jax machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 01 Aug 2024

https://github.com/glm-tools/pyglmnet

Python implementation of elastic-net regularized generalized linear models

data-science elastic-net glm lasso machine-learning python

Last synced: 02 Aug 2024

https://github.com/Mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 31 Jul 2024

https://github.com/Giorgi/DuckDB.NET

Bindings and ADO.NET Provider for DuckDB

ado-net data-science duckdb duckdb-database hacktoberfest hacktoberfest2023

Last synced: 31 Jul 2024

https://github.com/kamu-data/kamu-cli

New generation decentralized data lake and a streaming data pipeline

blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql

Last synced: 17 Aug 2024

https://github.com/jupyter-naas/naas

Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)

ai binder data data-science data-transformation engine etl integration jupyter jupyterlab notebooks open-source pipeline

Last synced: 01 Aug 2024

https://github.com/yamafaktory/hypergraph

Hypergraph is a data structure library for creating a directed hypergraph in which a hyperedge can join any number of vertices.

data data-science data-structure data-structures hypergraph hypergraphs rust rust-lang rustlang

Last synced: 02 Aug 2024

https://github.com/senseyeio/roger

Golang RServe client. Use R from Go

data-science go r rserve scientific-computing

Last synced: 02 Aug 2024

https://github.com/rasgointelligence/RasgoQL

Write python locally, execute SQL in your data warehouse

data-analysis data-science pandas python sql

Last synced: 08 Aug 2024

https://github.com/datalayer/jupyter-ui

⚛️ React.js components 💯% compatible with 🪐 Jupyter. https://jupyter-ui-storybook.datalayer.tech

data data-product data-science data-visualisation datalayer ipywidgets jupyter jupyterlab lumino notebook reactjs ui

Last synced: 01 Aug 2024

https://github.com/vopani/datatableton

100 exercises to learn Python Datatable

data-science datatable pydatatable python tutorial-exercises

Last synced: 03 Aug 2024

https://github.com/svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

apache-spark data-processing data-science python

Last synced: 06 Aug 2024

https://github.com/carloocchiena/the_statistics_handbook

The Statistics Handbook open source repository

data-science latex mathematics statistics

Last synced: 09 Aug 2024

https://github.com/uclatommy/tweetfeels

Real-time sentiment analysis in Python using Twitter's streaming API

data-mining data-science python-3-6 sentiment-analysis twitter

Last synced: 06 Aug 2024

https://github.com/dwhitena/gophernet

A simple from-scratch neural net written in Go

artificial-intelligence data-science go golang machine-learning neural-network

Last synced: 02 Aug 2024

https://github.com/ropensci/elastic

R client for the Elasticsearch HTTP API

data-science database database-wrapper elasticsearch etl http json r r-package rstats

Last synced: 03 Aug 2024