Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/aunum/goro

A High-level Machine Learning Library for Go

data-science go golang machine-learning machinelearning

Last synced: 28 Oct 2024

https://github.com/jkrumbiegel/chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 13 Nov 2024

https://github.com/adicherlavenkatasai/ml-workspace

Machine Learning (Beginners Hub): information (courses, books, cheat sheets, live sessions) related to machine learning, data science, and Python.

cheat-sheets convolutional-networks data-science deep-learning deep-neural-networks gans harvard-edx interview-questions machine-learning python

Last synced: 31 Oct 2024

https://github.com/solegalli/feature-engineering-for-machine-learning

Code repository for the online course Feature Engineering for Machine Learning

data-science feature-engineering feature-extraction machine-learning python

Last synced: 13 Nov 2024

https://github.com/xoolive/traffic

A toolbox for processing and analysing air traffic data

adsb air-traffic-data data-analytics data-science data-visualisation declarative-pipeline mode-s trajectory

Last synced: 13 Nov 2024

https://github.com/aaronpenne/data_visualization

A collection of my data visualizations, mostly in Python.

data-science data-visualization python3 visualization

Last synced: 25 Oct 2024

https://github.com/maxhalford/xam

🎯 Personal data science and machine learning toolbox

data-science machine-learning preprocessing python stacking

Last synced: 14 Nov 2024

https://github.com/matrix-profile-foundation/matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

algorithms anomaly-detection clustering data-mining data-science hacktoberfest matrixprofile motif-discovery python python2 python3 segmentation time-series time-series-analysis

Last synced: 12 Oct 2024

https://github.com/aeturrell/skimpy

skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.

data-science eda exploratory-data-analysis pandas statistics summary-statistics

Last synced: 14 Nov 2024

https://github.com/olavolav/uniplot

Lightweight plotting to the terminal. 4x resolution via Unicode.

data-analysis data-science plot python

Last synced: 31 Oct 2024

https://github.com/BlackHC/toma

Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory

data-science gpu machine-learning python pytorch

Last synced: 03 Aug 2024

https://github.com/tellery/tellery

Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

analytics bigquery business-intelligence collaboration dashboard data-analytics data-modeling data-science data-visualization database dbt notebook self-hosted sql

Last synced: 29 Oct 2024

https://github.com/joaquinamatrodrigo/estadistica-con-r

Personal notes on statistics, machine learning, and the R programming language

bioestadistica data-mining data-science estadistica machine-learning mineria-de-datos r

Last synced: 14 Nov 2024

https://github.com/anothersamwilson/miceforest

Multiple Imputation with LightGBM in Python

data-science imputed-values mice-algorithm python random-forest

Last synced: 10 Nov 2024

https://github.com/meteostat/meteostat-python

Access and analyze historical weather and climate data with Python.

climate climate-change climate-data data-science meteostat open-data statistics weather weather-data weather-station

Last synced: 08 Aug 2024

https://github.com/souzatharsis/open-quant-live-book

An open source, hands-on and fully reproducible book in quantitative finance, data science and econophysics. Join us and help Make Wall Street Great Again!

algo-trading altdata data-science econophysics financial-analysis financial-markets machine-learning open-source quantitative-finance

Last synced: 11 Nov 2024

https://github.com/InseeFrLab/onyxia

🔬 Data science environment for k8s

bluehats data-science datalab helm insee kubernetes onyxia

Last synced: 03 Sep 2024

https://github.com/KiranGershenfeld/VisualizingTwitchCommunities

Graphing communities on Twitch.tv in a visually intuitive way

community data-science python twitch visualization

Last synced: 25 Oct 2024

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 13 Oct 2024

https://github.com/datmo/datmo

Open source production model management tool for data scientists

artificial-intelligence data-science deep-learning machine-learning reproducibility version-control

Last synced: 12 Nov 2024

https://github.com/larswaechter/voici.js

A Node.js library for pretty-printing your data on the terminal 🎨

console data-science javascript shell terminal tty typescript

Last synced: 31 Oct 2024

https://github.com/yzhao062/data-mining-conferences

Ranking, acceptance rate, deadline, and publication tips

data-mining data-science research

Last synced: 26 Oct 2024

https://github.com/Anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 01 Nov 2024

https://github.com/machine-learning-apps/issue-label-bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 29 Sep 2024

https://github.com/jovianhq/opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

data-science datasets machine-learning python

Last synced: 09 Nov 2024

https://github.com/profjsb/python-seminar

Python for Data Science (Seminar Course at UC Berkeley; AY 250)

data-science distributed-computing machine-learning python visualization

Last synced: 07 Aug 2024

https://github.com/tommyod/efficient-apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 11 Nov 2024

https://github.com/gdsbook/book

This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.

data-analysis-python data-science geographic-data geographical-information-system spatial-analysis spatial-data-analysis spatial-statistics statistics

Last synced: 27 Oct 2024

https://github.com/autonlab/auton-survival

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events

causal-inference counterfactual-inference data-science deep-learning graphical-models machine-learning python regression reliability-analysis survival-analysis time-to-event

Last synced: 12 Nov 2024

https://github.com/upgini/upgini

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

automated-feature-engineering automl automl-pipeline chatgpt data-enrichment data-science feature-engineering feature-extraction feature-selection features kaggle kaggle-solution large-language-models llm machine-learning open-data open-datasets public-data python-library scikit-learn

Last synced: 13 Oct 2024

https://github.com/microsoft/genalog

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

data-generation data-science machine-learning ner ocr-recognition python synthetic-data synthetic-data-generation synthetic-images text-alignment

Last synced: 13 Nov 2024

https://github.com/databrickslabs/tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation

data-analysis data-science pandas python scala time-series timeseries timeseries-analysis timeseries-data

Last synced: 11 Nov 2024

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 04 Nov 2024

https://github.com/ShawhinT/YouTube-Blog

Code to complement YouTube videos and blog posts on Medium.

data-science example-code machine-learning medium-articles youtube

Last synced: 06 Aug 2024

https://github.com/mljar/plotai

PlotAI - Your Ultimate Plotting Assistant! 📊🤖 Use ChatGPT-3.5 to create plots in Python and Matplotlib directly in your Python script or notebook.

charts chatgpt data-science llm matplotlib plots python visualization

Last synced: 09 Nov 2024

https://github.com/CJWorkbench/cjworkbench

The data journalism platform with built-in training

data-analysis data-journalism data-science data-visualization journalism notebook

Last synced: 06 Aug 2024

https://github.com/ml-tooling/ml-hub

🧰 Multi-user development platform for machine learning teams. Simple to set up within minutes.

data-science docker jupyter jupyterhub machine-learning python

Last synced: 10 Nov 2024

https://github.com/maxhumber/redframes

General Purpose Data Manipulation Library

data-science pandas python

Last synced: 07 Nov 2024

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 05 Nov 2024

https://github.com/PPshrimpGo/BDCI2018-ChinauUicom-1st-solution

First-place solution to the China Unicom challenge at BDCI 2018

competition data-science

Last synced: 08 Aug 2024

https://github.com/solegalli/feature-selection-for-machine-learning

Code repository for the online course Feature Selection for Machine Learning

data-science feature-selection machine-learning python

Last synced: 12 Nov 2024

https://github.com/data-describe/data-describe

data⎰describe: Pythonic EDA Accelerator for Data Science

analysis data-science eda exploratory-data-analysis pypi

Last synced: 07 Nov 2024

https://github.com/drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.

data-ethics data-science ethics machine-learning

Last synced: 14 Nov 2024

https://github.com/greenelab/scihub

Source code and data analyses for the Sci-Hub Coverage Study

crossref data-science doi journals libgen open-data sci-hub scimag scopus

Last synced: 13 Nov 2024

https://github.com/ibotta/sk-dist

Distributed scikit-learn meta-estimators in PySpark

data-science machine-learning ml scikit-learn spark

Last synced: 13 Oct 2024

https://github.com/Dyakonov/PZAD

Course "Applied Problems of Data Analysis" (Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University)

data-mining data-science data-visualization education lectures machine-learning ml russian slides

Last synced: 07 Aug 2024

https://github.com/microsoft/NimbusML

Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

data-science machine-learning ml mlnet nimbusml python scikit-learn

Last synced: 09 Nov 2024

https://github.com/yamafaktory/hypergraph

Hypergraph is a data structure library for creating a directed hypergraph in which a hyperedge can join any number of vertices.

data data-science data-structure data-structures hypergraph hypergraphs rust rust-lang rustlang

Last synced: 12 Oct 2024

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed internal machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 02 Nov 2024

https://github.com/glm-tools/pyglmnet

Python implementation of elastic-net regularized generalized linear models

data-science elastic-net glm lasso machine-learning python

Last synced: 12 Nov 2024

https://github.com/Giorgi/DuckDB.NET

Bindings and ADO.NET Provider for DuckDB

ado-net data-science duckdb duckdb-database hacktoberfest hacktoberfest2023

Last synced: 28 Oct 2024

https://github.com/Mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 28 Oct 2024

https://github.com/kamu-data/kamu-cli

New generation decentralized data lake and a streaming data pipeline

blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql

Last synced: 14 Oct 2024