Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/hamelsmu/docker_tutorial

Code and helper scripts for the Medium article "How Docker Can Help You Become A More Effective Data Scientist"

data-science docker docker-tutorial medium medium-article

Last synced: 27 Oct 2024

https://github.com/phillipdupuis/dtale-desktop

Build a data visualization dashboard with simple snippets of Python code

data-analysis data-science data-visualization fastapi pandas python react typescript visualization

Last synced: 17 Nov 2024

https://github.com/curiousily/Machine-Learning-from-Scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

artificial-intelligence book classification data-science machine-learning machine-learning-algorithms neural-networks notebook recommender-systems regression reinforcement-learning sentiment-analysis

Last synced: 08 Aug 2024

https://github.com/jgoerner/beyond-jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset

Last synced: 27 Oct 2024

https://github.com/risenw/datasist

A Python library for easy data analysis, visualization, exploration and modeling

data-analysis data-science data-visualization feature-engineering machine-learning python-3

Last synced: 14 Nov 2024

https://github.com/pyscaffold/pyscaffoldext-dsproject

💫 PyScaffold extension for data-science projects

data-science pyscaffold pyscaffold-extension python

Last synced: 15 Nov 2024

https://github.com/heidelbergcement/hcrystalball

A library that unifies the API for most commonly used libraries and modeling techniques for time-series forecasting in the Python ecosystem.

cross-validation data-science fbprophet model-selection pmdarima sarimax sklearn sklearn-api sklearn-compatible sklearn-library sktime statsmodels tbats time-series time-series-forecasting transformer wrapper

Last synced: 10 Oct 2024

https://github.com/thebabylonai/babylog

A lightweight logger for machine learning teams to log images and predictions in production.

computer-vision cvops data-science logger logging-library machine-learning ml mlops python python3

Last synced: 14 Nov 2024

https://github.com/oxinabox/datadeps.jl

reproducible data setup for reproducible science

data data-science open-science

Last synced: 18 Oct 2024

https://github.com/minusxai/minusx

MinusX is an AI Data Scientist for Analytics Apps you already use and love. Currently it supports Jupyter, Metabase, & Posthog.

artificial-intelligence data-analytics data-science jupyter metabase

Last synced: 11 Oct 2024

https://github.com/whitews/flowkit

A Python toolkit for flow cytometry analysis supporting GatingML and FlowJo workspaces

cytometry data-science fcs fcs-files flow-cytometry flow-cytometry-analysis flowjo gatingml immunology python

Last synced: 11 Nov 2024

https://github.com/emilhvitfeldt/r-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 30 Oct 2024

https://github.com/voila-dashboards/voici

Voici turns any Jupyter Notebook into a static web application

dashboards data-science emscripten jupyter jupyterlite voila-dashboard wasm

Last synced: 04 Sep 2024

https://github.com/EmilHvitfeldt/R-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 05 Aug 2024

https://github.com/mybridge/learn-python

Python Top 45 Articles of 2017

algorithm data-science machine-learning python python3

Last synced: 07 Nov 2024

https://rivasiker.github.io/ggHoriPlot/

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 13 Nov 2024

https://github.com/rivasiker/ggHoriPlot

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 12 Nov 2024

https://github.com/aws-samples/aws-ml-jp

A collection of notebooks and teaching materials for learning how to build, train, and deploy machine learning models with SageMaker

aws data-science deep-learning jupyter-notebook machine-learning mlops sagemaker

Last synced: 08 Nov 2024

https://github.com/arabacibahadir/sup-res

A great companion for finding key support and resistance levels on financial and cryptocurrency charts.

algotrade analysis binance binance-api bitcoin cryptocurrency data-science finance pandas pinescript python stock telegram telegram-bot tradingview

Last synced: 27 Oct 2024

https://github.com/apache/incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

ai airflow big-data data-science machine-learning ml workflows

Last synced: 01 Oct 2024

https://github.com/dlab-berkeley/R-Fundamentals-Legacy

D-Lab's 12-hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

automation data-science data-visualization data-wrangling r

Last synced: 11 Nov 2024

https://github.com/jupyterhub/repo2docker-action

A GitHub action to build data science environment images with repo2docker and push them to registries.

actions binder data-science datascience docker jupyter jupyter-notebook repo2docker repo2docker-action

Last synced: 16 Nov 2024

https://github.com/h2oai/wave-apps

Sample AI Apps built with H2O Wave.

data-science h2oai hacktoberfest low-code machine-learning python3

Last synced: 06 Nov 2024

https://github.com/rk2900/drsa

Deep Recurrent Survival Analysis, an auto-regressive deep model for time-to-event data analysis with censorship handling. An implementation of our AAAI 2019 paper and a benchmark for several (Python) implemented survival analysis methods.

data-science deep-learning machine-learning survival-analysis

Last synced: 07 Nov 2024

https://github.com/hamelsmu/seq2seq_tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 27 Oct 2024

https://github.com/hamelsmu/Seq2Seq_Tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 29 Oct 2024

https://github.com/picnicml/doddle-model

🍰 doddle-model: machine learning in Scala.

breeze data-science doddle-model machine-learning scala

Last synced: 18 Nov 2024

https://github.com/gzuidhof/zarr.js

JavaScript implementation of Zarr

array data-science gehlenborglab javascript typescript zarr

Last synced: 13 Nov 2024

https://github.com/ing-bank/probatus

Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.

binary-classifiers data-analysis data-science feature-elimination machine-learning multi-class-classification recursive-feature-elimination regressors shap statistics tree-model

Last synced: 15 Nov 2024

https://github.com/jacobgil/confidenceinterval

The long-missing library for Python confidence intervals

data-science machine-learning metrics statistics

Last synced: 12 Nov 2024

https://github.com/morganjwilliams/pyrolite

A set of tools for getting the most from your geochemical data.

chemistry data-science geochemical-data geochemistry geoscience pyrolite ternary-diagrams

Last synced: 25 Oct 2024

https://github.com/njtierney/rmd4sci

Rmarkdown for Scientists

book bookdown data-science r rmarkdown rstats science

Last synced: 27 Oct 2024

https://github.com/machine-learning-apps/ml-template-azure

Template for getting started with automated ML Ops on Azure Machine Learning

aml azure azure-machine-learning data-science machine-learning machine-learning-lifecycle mlops

Last synced: 02 Nov 2024

https://github.com/RamiKrispin/Introduction-to-Docker

(WIP) Getting started with Docker - An introduction to Docker with data science and engineering applications

data-engineering data-science docker dockerfile

Last synced: 25 Oct 2024

https://github.com/suji04/normalizednerd

Code for the videos on my YouTube channel

data-science machine-learning python tutorial youtube

Last synced: 17 Nov 2024

https://github.com/bcg-x-official/artkit

Automated prompt-based testing and evaluation of Gen AI applications

asyncio data-science gen-ai genai python red-teaming test-automation

Last synced: 15 Nov 2024

https://github.com/scitime/scitime

Training time estimation for scikit-learn algorithms

data-science machine-learning python scikit-learn timer

Last synced: 01 Nov 2024

https://github.com/romanmichaelpaolucci/AI_Stock_Trading

Design pattern for critical stages in the development process of an AI Stock Trading Bot

artificial-intelligence data-science machine-learning neural-network python trading trading-algorithms trading-bot trading-strategies

Last synced: 07 Nov 2024

https://github.com/scrapinghub/python-simhash

An efficient simhash implementation for Python

data-science

Last synced: 10 Nov 2024

https://github.com/datacamp/viewflow

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

airflow apache-airflow data-engineering data-science packages python workflow

Last synced: 15 Nov 2024

https://github.com/winvector/pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.

data-science machine-learning pydata python

Last synced: 14 Nov 2024

https://github.com/autoviml/deep_autoviml

Build TensorFlow Keras model pipelines in a single line of code. Now with MLflow tracking. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.

autokeras automl data-science deep-learning gcp keras machine-learning mlflow mljar pycaret python tensorflow tensorflow2 tpot

Last synced: 10 Oct 2024

https://github.com/vkoul/Econ-Data-Science

Articles, journals, and videos related to Economics 📈 and Data Science 📊

casual-inference data-science econometrics economics economist machine-learning social-sciences

Last synced: 13 Nov 2024

https://github.com/jadianes/spark-r-notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science exploratory-data-analysis jupyter jupyter-notebook notebook r sparkr

Last synced: 09 Nov 2024

https://github.com/napjon/krisk

Statistical Interactive Visualization with pandas+Jupyter integration on top of Echarts.

dashboard data-science data-visualization echarts interactive-charts jupyter-notebook python

Last synced: 15 Nov 2024

https://github.com/ome/ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.

bioimaging cloud data-science file-formats spec

Last synced: 16 Nov 2024

https://github.com/yandexdataschool/roc_comparison

The fast version of DeLong's method for computing the covariance of unadjusted AUC.

data-science statistics

Last synced: 06 Nov 2024