Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/aws-samples/aws-ml-jp

A collection of notebooks and teaching materials for learning how to build, train, and deploy machine learning models with SageMaker.

aws data-science deep-learning jupyter-notebook machine-learning mlops sagemaker

Last synced: 08 Nov 2024

https://github.com/arabacibahadir/sup-res

A great companion for finding key support and resistance levels on financial and cryptocurrency charts.

algotrade analysis binance binance-api bitcoin cryptocurrency data-science finance pandas pinescript python stock telegram telegram-bot tradingview

Last synced: 27 Oct 2024

https://github.com/apache/incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

ai airflow big-data data-science machine-learning ml workflows

Last synced: 01 Oct 2024

https://github.com/dlab-berkeley/R-Fundamentals-Legacy

D-Lab's 12-hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

automation data-science data-visualization data-wrangling r

Last synced: 11 Nov 2024

https://github.com/h2oai/wave-apps

Sample AI Apps built with H2O Wave.

data-science h2oai hacktoberfest low-code machine-learning python3

Last synced: 06 Nov 2024

https://github.com/jupyterhub/repo2docker-action

A GitHub action to build data science environment images with repo2docker and push them to registries.

actions binder data-science datascience docker jupyter jupyter-notebook repo2docker repo2docker-action

Last synced: 08 Nov 2024

https://github.com/rk2900/drsa

Deep Recurrent Survival Analysis, an auto-regressive deep model for time-to-event data analysis with censorship handling. An implementation of our AAAI 2019 paper and a benchmark for several (Python) implemented survival analysis methods.

data-science deep-learning machine-learning survival-analysis

Last synced: 07 Nov 2024

https://rivasiker.github.io/ggHoriPlot/

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 02 Aug 2024

https://github.com/rivasiker/ggHoriPlot

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 02 Aug 2024

https://github.com/picnicml/doddle-model

:cake: doddle-model: machine learning in Scala.

breeze data-science doddle-model machine-learning scala

Last synced: 04 Aug 2024

https://github.com/hamelsmu/seq2seq_tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 27 Oct 2024

https://github.com/hamelsmu/Seq2Seq_Tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 29 Oct 2024

https://github.com/gzuidhof/zarr.js

JavaScript implementation of Zarr

array data-science gehlenborglab javascript typescript zarr

Last synced: 30 Oct 2024

https://github.com/ing-bank/probatus

Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.

binary-classifiers data-analysis data-science feature-elimination machine-learning multi-class-classification recursive-feature-elimination regressors shap statistics tree-model

Last synced: 08 Nov 2024

https://github.com/jacobgil/confidenceinterval

The long-missing library for Python confidence intervals

data-science machine-learning metrics statistics

Last synced: 30 Oct 2024

https://github.com/morganjwilliams/pyrolite

A set of tools for getting the most from your geochemical data.

chemistry data-science geochemical-data geochemistry geoscience pyrolite ternary-diagrams

Last synced: 25 Oct 2024

https://github.com/njtierney/rmd4sci

Rmarkdown for Scientists

book bookdown data-science r rmarkdown rstats science

Last synced: 27 Oct 2024

https://github.com/machine-learning-apps/ml-template-azure

Template for getting started with automated ML Ops on Azure Machine Learning

aml azure azure-machine-learning data-science machine-learning machine-learning-lifecycle mlops

Last synced: 02 Nov 2024

https://github.com/RamiKrispin/Introduction-to-Docker

(WIP) Getting started with Docker - An introduction to Docker with data science and engineering applications

data-engineering data-science docker dockerfile

Last synced: 25 Oct 2024

https://github.com/scitime/scitime

Training time estimation for scikit-learn algorithms

data-science machine-learning python scikit-learn timer

Last synced: 01 Nov 2024

https://github.com/suji04/normalizednerd

Code for the videos on my YouTube channel

data-science machine-learning python tutorial youtube

Last synced: 10 Nov 2024

https://github.com/romanmichaelpaolucci/AI_Stock_Trading

Design pattern for critical stages in the development process of an AI Stock Trading Bot

artificial-intelligence data-science machine-learning neural-network python trading trading-algorithms trading-bot trading-strategies

Last synced: 07 Nov 2024

https://github.com/scrapinghub/python-simhash

An efficient simhash implementation for Python

data-science

Last synced: 10 Nov 2024

https://github.com/vkoul/Econ-Data-Science

Articles, journals, and videos related to Economics :chart_with_upwards_trend: and Data Science :bar_chart:

casual-inference data-science econometrics economics economist machine-learning social-sciences

Last synced: 02 Aug 2024

https://github.com/winvector/pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.

data-science machine-learning pydata python

Last synced: 07 Nov 2024

https://github.com/autoviml/deep_autoviml

Build TensorFlow Keras model pipelines in a single line of code. Now with MLflow tracking. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.

autokeras automl data-science deep-learning gcp keras machine-learning mlflow mljar pycaret python tensorflow tensorflow2 tpot

Last synced: 10 Oct 2024

https://github.com/jadianes/spark-r-notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science exploratory-data-analysis jupyter jupyter-notebook notebook r sparkr

Last synced: 09 Nov 2024

https://github.com/napjon/krisk

Statistical interactive visualization with pandas and Jupyter integration, built on top of ECharts.

dashboard data-science data-visualization echarts interactive-charts jupyter-notebook python

Last synced: 31 Oct 2024

https://github.com/autoviml/pandas_dq

Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.

data data-science dataquality dataqualitycheck machine-learning pandas python scikit-learn

Last synced: 31 Oct 2024

https://github.com/yandexdataschool/roc_comparison

The fast version of DeLong's method for computing the covariance of unadjusted AUC.

data-science statistics

Last synced: 06 Nov 2024

https://github.com/diffusionkinetics/open

DiffusionKinetics open-source monorepo

data-science haskell

Last synced: 11 Nov 2024

https://github.com/WinVector/pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.

data-science machine-learning pydata python

Last synced: 05 Aug 2024

https://github.com/mybridge/learn-machine-learning

Learn to Build a Machine Learning Application from Top Articles

computer-vision data-science deep-learning machine-learning neural-networks

Last synced: 07 Nov 2024

https://github.com/winvector/data_algebra

Codd method-chained SQL generator and Pandas data processing in Python.

data-analysis data-science pandas python

Last synced: 07 Nov 2024

https://github.com/medtagger/MedTagger

A collaborative framework for annotating medical datasets using crowdsourcing.

crowdsourcing data-science data-validation deep-learning labeling medical-imaging

Last synced: 03 Aug 2024

https://github.com/LankyCyril/pyvenn

Python module for plotting Venn diagrams of 2..6 sets

data-science matplotlib matplotlib-venn venn venn-diagram venndiagram visualization

Last synced: 03 Aug 2024

https://github.com/ColtAllen/btyd

Buy Till You Die and Customer Lifetime Value statistical models in Python.

bayesian buy-til-you-die customer-lifetime-value data-science python

Last synced: 02 Aug 2024

https://github.com/ujjwalkarn/xda

R package for exploratory data analysis

data-analysis data-science exploratory-data-analysis r

Last synced: 11 Nov 2024

https://github.com/alexandervnikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets)

augmentations data-augmentation data-science datasets deep-learning generative-model keras machine-learning python synthetic-data synthetic-time-series tensorflow2 time-series vae

Last synced: 13 Oct 2024

https://github.com/solegalli/hyperparameter-optimization

Code repository for the online course Hyperparameter Optimization for Machine Learning

data-science hyperopt hyperparameter-optimization machine-learning optuna python scikit-optimize

Last synced: 30 Oct 2024

https://github.com/JovianHQ/jovian-py

Collaboration platform for data science projects & Jupyter notebooks

data-science deep-learning jupyter-notebook machine-learning ml

Last synced: 11 Oct 2024

https://github.com/jovianhq/jovian-py

Collaboration platform for data science projects & Jupyter notebooks

data-science deep-learning jupyter-notebook machine-learning ml

Last synced: 02 Nov 2024

https://github.com/lawmurray/Birch

A probabilistic programming language that combines automatic differentiation, automatic marginalization, and automatic conditioning within Monte Carlo methods.

autodiff bayesian bayesian-inference bayesian-methods bayesian-statistics data-science machine-learning machine-learning-algorithms machine-learning-projects monte-carlo-methods monte-carlo-sampling probabilistic-programming-languages statistics

Last synced: 30 Oct 2024

https://github.com/lsys/forestplot

A Python package to make publication-ready but customizable coefficient plots.

coefficientplot data-science data-visualization dataviz forestplot matplotlib python visualization

Last synced: 02 Nov 2024

https://github.com/innat/ML-Resource

A concise resource repository for machine learning

data-analysis data-science deep-learning kaggle machine-learning python spark

Last synced: 02 Aug 2024

https://github.com/scrapinghub/mdr

A Python library to detect and extract listing data from HTML pages.

data-science

Last synced: 10 Nov 2024

https://github.com/imsanjoykb/data-science-regular-bootcamp

Regular practice in data science, machine learning, and deep learning, solving ML project problems and analytical issues to steadily build up my knowledge. The goal is to help learners with learning resources in the data science field.

artificial-intelligence data-analysis data-science data-science-notebook data-science-projects data-visualization database-connection deep-learning etl-pipeline etl-process feature-engineering machine-learning mysql-database neural-network numpy pandas postgresql python python-automation sqlite

Last synced: 12 Oct 2024

https://github.com/nicholasmamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 31 Oct 2024

https://github.com/takuti/flurs

:ocean: FluRS: A Python library for streaming recommendation algorithms

data-science factorization-machines machine-learning matrix-factorization python recommender-system

Last synced: 07 Nov 2024

https://github.com/NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories—A Python library to create and annotate beautiful network graph visualizations, text visualizations and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 07 Aug 2024

https://github.com/clipperhouse/jargon

Tokenizers and lemmatizers for Go

data-science go lemmatizer nlp tokenizer

Last synced: 30 Oct 2024

https://github.com/alexioannides/pymc-example-project

Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.

bayesian-data-analysis bayesian-inference data-science machine-learning numpy pandas probabilistic-programming pymc3 python scikit-learn

Last synced: 27 Oct 2024

https://github.com/olow304/data-science-machine-learning

The overall objective of this toolkit is to offer a free collection of data analysis and machine learning material specifically suited to data science. Its purpose is to get you started in a matter of minutes. You can run this collection either in a Jupyter notebook or with Python alone.

all best-practices cheatsheet cheatsheets data-science data-science-toolkit deep-learning jupyter-notebook machine-learning machine-learning-algorithms machine-learning-tutorials matplotlib mindmap numpy pandas popular-posts python roadmap sklearn toolkit

Last synced: 10 Oct 2024

https://github.com/thomasnield/oreilly_reactive_python_for_data

Resources for the O'Reilly online video "Reactive Python for Data"

data-science database python reactivex rxpy sqlalchemy tweepy twitter

Last synced: 30 Oct 2024

https://github.com/formlio/forml

ForML - A development framework and MLOps platform for the lifecycle management of data science projects

ai data-science machine-learning ml mlops portability python reproducibility

Last synced: 03 Aug 2024

https://github.com/ome/ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.

bioimaging cloud data-science file-formats spec

Last synced: 03 Aug 2024

https://github.com/dssg/MLforPublicPolicy

Class resources for CAPP 30254 (Machine Learning for Public Policy)

data-science machine-learning public-policy

Last synced: 27 Oct 2024

https://github.com/senderle/topic-modeling-tool

A point-and-click tool for creating and analyzing topic models produced by MALLET.

data-science digital-humanities mallet text-analytics topic-modeling

Last synced: 02 Aug 2024