Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/epistasislab/scikit-rebate

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

data-science feature-selection python

Last synced: 22 Dec 2024

https://github.com/EpistasisLab/scikit-rebate

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

data-science feature-selection python

Last synced: 30 Oct 2024

https://github.com/basedosdados/sdk

⚙️ Datalake maintenance code (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/mais/

bigquery dados-abertos data-science govtech hacktoberfest hacktoberfest2022 open-data python r sql transparencia

Last synced: 22 Dec 2024

https://github.com/basedosdados/mais

⚙️ Datalake maintenance code (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/mais/

bigquery dados-abertos data-science govtech hacktoberfest hacktoberfest2022 open-data python r sql transparencia

Last synced: 13 Oct 2024

https://github.com/dagshub/fds

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

data-science dvc git

Last synced: 20 Dec 2024

https://github.com/plotly/dashR

Create data science and AI web apps in R

dash data-science data-visualization plotly plotly-dash python r react web-application

Last synced: 27 Oct 2024

https://github.com/DagsHub/fds

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

data-science dvc git

Last synced: 15 Nov 2024

https://github.com/finos/jupyterlab_templates

Support for jupyter notebook templates in jupyterlab

data-science dataviz jupyter jupyterlab jupyterlab-extension machine-learning notebook

Last synced: 07 Nov 2024

https://github.com/wilsonrljr/sysidentpy

A Python Package For System Identification Using NARMAX Models

data-science dynamical-systems machine-learning narmax narx system-identification time-series

Last synced: 12 Nov 2024

https://github.com/operatorai/modelstore

🏬 modelstore is a Python library that allows you to version, export, and save a machine learning model to your filesystem or a cloud storage provider.

data-science keras machine-learning mlops modelstore python-library pytorch s3-storage scikit-learn tensorflow transformer

Last synced: 22 Dec 2024

https://github.com/solegalli/feature-engineering-for-machine-learning

Code repository for the online course Feature Engineering for Machine Learning

data-science feature-engineering feature-extraction machine-learning python

Last synced: 20 Dec 2024

https://github.com/thoughtworks/mlops-platforms

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

azureml data-science databricks dataiku datarobot google-ai-platform h2oai iguazio knime kubeflow machine-learning mlflow mlops pachyderm sagemaker seldon

Last synced: 12 Nov 2024

https://github.com/jkrumbiegel/chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 21 Dec 2024

https://github.com/ptyadana/data-science-and-machine-learning-projects-dojo

A collection of data science, machine learning, and data visualization projects using pandas, sklearn, matplotlib, TensorFlow 2, Keras, and various ML algorithms such as random forest classifiers and boosting.

boosting-algorithms data-analysis data-science data-visualization deep-learning keras machine-learning machine-learning-algorithms natural-language-processing pandas probability-statistics scikit-learn seaborn tensorflow

Last synced: 23 Dec 2024

https://github.com/aunum/goro

A High-level Machine Learning Library for Go

data-science go golang machine-learning machinelearning

Last synced: 28 Oct 2024

https://github.com/jkrumbiegel/Chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 19 Nov 2024

https://github.com/adicherlavenkatasai/ml-workspace

Machine Learning (Beginners Hub): resources (courses, books, cheat sheets, live sessions) on machine learning, data science, and Python.

cheat-sheets convolutional-networks data-science deep-learning deep-neural-networks gans harvard-edx interview-questions machine-learning python

Last synced: 31 Oct 2024

https://github.com/aaronpenne/data_visualization

A collection of my data visualizations, mostly in Python.

data-science data-visualization python3 visualization

Last synced: 25 Oct 2024

https://github.com/xoolive/traffic

A toolbox for processing and analysing air traffic data

adsb air-traffic-data data-analytics data-science data-visualisation declarative-pipeline mode-s trajectory

Last synced: 27 Dec 2024

https://github.com/weijie-chen/econometrics-with-python

Econometrics tutorials featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics. The theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.

data-analysis data-science econometrics economics python statistics time-series

Last synced: 22 Dec 2024

https://github.com/MaxHalford/xam

🎯 Personal data science and machine learning toolbox

data-science machine-learning preprocessing python stacking

Last synced: 15 Nov 2024

https://github.com/maxhalford/xam

🎯 Personal data science and machine learning toolbox

data-science machine-learning preprocessing python stacking

Last synced: 24 Dec 2024

https://github.com/matrix-profile-foundation/matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

algorithms anomaly-detection clustering data-mining data-science hacktoberfest matrixprofile motif-discovery python python2 python3 segmentation time-series time-series-analysis

Last synced: 22 Dec 2024

https://github.com/anothersamwilson/miceforest

Multiple Imputation with LightGBM in Python

data-science imputed-values mice-algorithm python random-forest

Last synced: 23 Dec 2024

https://github.com/aeturrell/skimpy

skimpy is a lightweight tool that prints summary statistics about the variables in a data frame to the console.

data-science eda exploratory-data-analysis pandas statistics summary-statistics

Last synced: 14 Nov 2024

https://github.com/olavolav/uniplot

Lightweight plotting to the terminal. 4x resolution via Unicode.

data-analysis data-science plot python

Last synced: 31 Oct 2024

https://github.com/tellery/tellery

Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

analytics bigquery business-intelligence collaboration dashboard data-analytics data-modeling data-science data-visualization database dbt notebook self-hosted sql

Last synced: 22 Dec 2024

https://github.com/joaquinamatrodrigo/estadistica-con-r

Personal notes on statistics, machine learning, and the R programming language

bioestadistica data-mining data-science estadistica machine-learning mineria-de-datos r

Last synced: 22 Dec 2024

https://github.com/AnotherSamWilson/miceforest

Multiple Imputation with LightGBM in Python

data-science imputed-values mice-algorithm python random-forest

Last synced: 22 Nov 2024

https://github.com/meteostat/meteostat-python

Access and analyze historical weather and climate data with Python.

climate climate-change climate-data data-science meteostat open-data statistics weather weather-data weather-station

Last synced: 27 Nov 2024

https://github.com/souzatharsis/open-quant-live-book

An open source, hands-on and fully reproducible book in quantitative finance, data science and econophysics. Join us and help Make Wall Street Great Again!

algo-trading altdata data-science econophysics financial-analysis financial-markets machine-learning open-source quantitative-finance

Last synced: 11 Nov 2024

https://github.com/finlay-liu/kaggle_public

A-Shui's open-source branch for data competitions

data-science kaggle-competition

Last synced: 24 Dec 2024

https://github.com/InseeFrLab/onyxia

🔬 Data science environment for k8s

bluehats data-science datalab helm insee kubernetes onyxia

Last synced: 27 Dec 2024

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 27 Dec 2024

https://github.com/KiranGershenfeld/VisualizingTwitchCommunities

Graphing communities on Twitch.tv in a visually intuitive way

community data-science python twitch visualization

Last synced: 25 Oct 2024

https://github.com/datmo/datmo

Open source production model management tool for data scientists

artificial-intelligence data-science deep-learning machine-learning reproducibility version-control

Last synced: 12 Nov 2024

https://github.com/datalayer/jupyter-ui

⚛️ React.js components 💯% compatible with 🪐 Jupyter - Storybook on https://jupyter-ui-storybook.datalayer.tech

data data-product data-science data-visualisation datalayer ipywidgets jupyter jupyterlab lumino notebook reactjs ui

Last synced: 20 Dec 2024

https://github.com/larswaechter/voici.js

A Node.js library for pretty printing your data on the terminal 🎨

console data-science javascript shell terminal tty typescript

Last synced: 31 Oct 2024

https://github.com/yzhao062/data-mining-conferences

Ranking, acceptance rate, deadline, and publication tips

data-mining data-science research

Last synced: 27 Dec 2024

https://github.com/jovianhq/opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

data-science datasets machine-learning python

Last synced: 21 Dec 2024

https://github.com/anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 25 Dec 2024

https://github.com/Anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 01 Nov 2024

https://github.com/machine-learning-apps/Issue-Label-Bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 25 Oct 2024

https://github.com/machine-learning-apps/issue-label-bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 29 Sep 2024

https://github.com/profjsb/python-seminar

Python for Data Science (Seminar Course at UC Berkeley; AY 250)

data-science distributed-computing machine-learning python visualization

Last synced: 27 Nov 2024

https://github.com/maxhumber/redframes

General Purpose Data Manipulation Library

data-science pandas python

Last synced: 26 Dec 2024

https://github.com/tommyod/efficient-apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 26 Dec 2024

https://github.com/upgini/upgini

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

automated-feature-engineering automl automl-pipeline chatgpt data-enrichment data-science feature-engineering feature-extraction feature-selection features kaggle kaggle-solution large-language-models llm machine-learning open-data open-datasets public-data python-library scikit-learn

Last synced: 20 Dec 2024

https://github.com/gdsbook/book

This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.

data-analysis-python data-science geographic-data geographical-information-system spatial-analysis spatial-data-analysis spatial-statistics statistics

Last synced: 27 Oct 2024

https://github.com/autonlab/auton-survival

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events

causal-inference counterfactual-inference data-science deep-learning graphical-models machine-learning python regression reliability-analysis survival-analysis time-to-event

Last synced: 12 Nov 2024

https://github.com/microsoft/genalog

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

data-generation data-science machine-learning ner ocr-recognition python synthetic-data synthetic-data-generation synthetic-images text-alignment

Last synced: 21 Dec 2024

https://github.com/databrickslabs/tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation

data-analysis data-science pandas python scala time-series timeseries timeseries-analysis timeseries-data

Last synced: 11 Nov 2024

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 04 Nov 2024

https://github.com/solegalli/feature-selection-for-machine-learning

Code repository for the online course Feature Selection for Machine Learning

data-science feature-selection machine-learning python

Last synced: 21 Dec 2024

https://github.com/ml-tooling/ml-hub

🧰 Multi-user development platform for machine learning teams. Simple to set up within minutes.

data-science docker jupyter jupyterhub machine-learning python

Last synced: 20 Dec 2024

https://github.com/kamu-data/kamu-cli

Next-generation decentralized data lakehouse and a multi-party stream processing network

blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql

Last synced: 20 Dec 2024

https://github.com/mljar/plotai

PlotAI - Your Ultimate Plotting Assistant! 📊🤖 Use ChatGPT-3.5 to create plots in Python and Matplotlib directly in your Python script or notebook.

charts chatgpt data-science llm matplotlib plots python visualization

Last synced: 09 Nov 2024

https://github.com/CJWorkbench/cjworkbench

The data journalism platform with built-in training

data-analysis data-journalism data-science data-visualization journalism notebook

Last synced: 24 Nov 2024

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 05 Nov 2024

https://github.com/tommyod/Efficient-Apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 30 Oct 2024