Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/davendw49/k2

Code and datasets for the paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" (WSDM 2024)

ai4science data-science geoai geoscience kg large-language-models llm

Last synced: 01 Aug 2024

https://github.com/paddymul/buckaroo

Buckaroo: the data wrangling assistant for pandas. Quickly explore dataframes and run pandas commands via a GUI. Works inside Jupyter notebooks.

buckaroo data-science jupyter paddy pandas

Last synced: 03 Aug 2024

https://github.com/EmilHvitfeldt/R-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 05 Aug 2024

https://github.com/voila-dashboards/voici

Voici turns any Jupyter Notebook into a static web application

dashboards data-science emscripten jupyter jupyterlite voila-dashboard wasm

Last synced: 04 Sep 2024

https://github.com/aws-samples/aws-ml-jp

A collection of notebooks and teaching materials for learning how to build, train, and deploy machine learning models with SageMaker

aws data-science deep-learning jupyter-notebook machine-learning mlops sagemaker

Last synced: 01 Aug 2024

https://github.com/dlab-berkeley/R-Fundamentals-Legacy

D-Lab's 12-hour introduction to R fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

automation data-science data-visualization data-wrangling r

Last synced: 02 Aug 2024

https://github.com/jupyterhub/repo2docker-action

A GitHub action to build data science environment images with repo2docker and push them to registries.

actions binder data-science datascience docker jupyter jupyter-notebook repo2docker repo2docker-action

Last synced: 01 Aug 2024

https://github.com/picnicml/doddle-model

doddle-model: machine learning in Scala.

breeze data-science doddle-model machine-learning scala

Last synced: 04 Aug 2024

https://github.com/rivasiker/ggHoriPlot

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 02 Aug 2024

https://github.com/hamelsmu/Seq2Seq_Tutorial

Code for the Medium article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 31 Jul 2024

https://rivasiker.github.io/ggHoriPlot/

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 02 Aug 2024

https://github.com/morganjwilliams/pyrolite

A set of tools for getting the most from your geochemical data.

chemistry data-science geochemical-data geochemistry geoscience pyrolite ternary-diagrams

Last synced: 30 Jul 2024

https://github.com/RamiKrispin/Introduction-to-Docker

(WIP) Getting started with Docker - An introduction to Docker with data science and engineering applications

data-engineering data-science docker dockerfile

Last synced: 30 Jul 2024

https://github.com/njtierney/rmd4sci

Rmarkdown for Scientists

book bookdown data-science r rmarkdown rstats science

Last synced: 02 Aug 2024

https://github.com/machine-learning-apps/ml-template-azure

Template for getting started with automated MLOps on Azure Machine Learning

aml azure azure-machine-learning data-science machine-learning machine-learning-lifecycle mlops

Last synced: 01 Aug 2024

https://github.com/jacobgil/confidenceinterval

The long-missing library for confidence intervals in Python

data-science machine-learning metrics statistics

Last synced: 01 Aug 2024

https://github.com/vkoul/Econ-Data-Science

Articles, journals, and videos related to economics and data science

casual-inference data-science econometrics economics economist machine-learning social-sciences

Last synced: 02 Aug 2024

https://github.com/romanmichaelpaolucci/AI_Stock_Trading

Design pattern for critical stages in the development process of an AI Stock Trading Bot

artificial-intelligence data-science machine-learning neural-network python trading trading-algorithms trading-bot trading-strategies

Last synced: 01 Aug 2024

https://github.com/napjon/krisk

Statistical Interactive Visualization with pandas+Jupyter integration on top of Echarts.

dashboard data-science data-visualization echarts interactive-charts jupyter-notebook python

Last synced: 01 Aug 2024

https://github.com/WinVector/pyvtreat

vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.

data-science machine-learning pydata python

Last synced: 05 Aug 2024

https://github.com/LankyCyril/pyvenn

Python module for plotting Venn diagrams of 2 to 6 sets

data-science matplotlib matplotlib-venn venn venn-diagram venndiagram visualization

Last synced: 03 Aug 2024

https://github.com/medtagger/MedTagger

A collaborative framework for annotating medical datasets using crowdsourcing.

crowdsourcing data-science data-validation deep-learning labeling medical-imaging

Last synced: 03 Aug 2024

https://github.com/ColtAllen/btyd

Buy Till You Die and Customer Lifetime Value statistical models in Python.

bayesian buy-til-you-die customer-lifetime-value data-science python

Last synced: 02 Aug 2024

https://github.com/alexandervnikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets)

augmentations data-augmentation data-science datasets deep-learning generative-model keras machine-learning python synthetic-data synthetic-time-series tensorflow2 time-series vae

Last synced: 01 Aug 2024

https://github.com/innat/ML-Resource

A concise resource repository for machine learning

data-analysis data-science deep-learning kaggle machine-learning python spark

Last synced: 02 Aug 2024

https://github.com/NicholasMamo/multiplex-plot

Multiplex: visualizations that tell stories. A Python library to create and annotate beautiful network graph visualizations, text visualizations, and more.

data-science data-visualisation graph-visualization graphs information-retrieval matplotlib natural-language-processing network-visualization python text-mining text-visualisation text-visualization visualisation visualizations viz vizualisation

Last synced: 07 Aug 2024

https://github.com/lawmurray/Birch

A probabilistic programming language that combines automatic differentiation, automatic marginalization, and automatic conditioning within Monte Carlo methods.

autodiff bayesian bayesian-inference bayesian-methods bayesian-statistics data-science machine-learning machine-learning-algorithms machine-learning-projects monte-carlo-methods monte-carlo-sampling probabilistic-programming-languages statistics

Last synced: 31 Jul 2024

https://github.com/formlio/forml

ForML - A development framework and MLOps platform for the lifecycle management of data science projects

ai data-science machine-learning ml mlops portability python reproducibility

Last synced: 03 Aug 2024

https://github.com/ome/ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.

bioimaging cloud data-science file-formats spec

Last synced: 03 Aug 2024

https://github.com/senderle/topic-modeling-tool

A point-and-click tool for creating and analyzing topic models produced by MALLET.

data-science digital-humanities mallet text-analytics topic-modeling

Last synced: 02 Aug 2024

https://github.com/dssg/MLforPublicPolicy

Class resources for CAPP 30254 (Machine Learning for Public Policy)

data-science machine-learning public-policy

Last synced: 31 Jul 2024

https://github.com/nischalshrestha/Unravel

A fluent code explorer for R. 🔍

data-science datawrangling dplyr r rstats shiny tidyr tidyverse

Last synced: 13 Aug 2024

https://github.com/mc2-project/secure-xgboost

Secure collaborative training and inference for XGBoost.

collaborative-learning data-science enclave machine-learning privacy security xgboost

Last synced: 02 Aug 2024

https://github.com/oracle/macest

Model Agnostic Confidence Estimator (MACEST): a Python library for calibrating the confidence scores of machine learning models

confidence-estimation data-science machine-learning python

Last synced: 03 Aug 2024

https://github.com/wyattowalsh/data-science-notes

Open-source project hosted at https://makeuseofdata.com to crowdsource a robust collection of notes related to data science (math, visualization, modeling, etc.)

calculus classification compilation crowdsourcing data-science first-timers first-timers-only jupyter-book linear-algebra machine-learning modeling probability regression simulation statistics up-for-grabs visualization

Last synced: 03 Aug 2024

https://github.com/AlexIoannides/pymc-example-project

Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.

bayesian-data-analysis bayesian-inference data-science machine-learning numpy pandas probabilistic-programming pymc3 python scikit-learn

Last synced: 07 Aug 2024

https://github.com/jay-johnson/sci-pype

A machine learning API with native Redis caching and export/import using S3. Analyze entire datasets using an API for building, training, testing, analyzing, extracting, importing, and archiving. It can run from a Docker container or directly from the repository.

data-science devops-for-data-science docker docker-compose ipython ipython-notebook jupyter jupyter-notebook jupyter-themes machine-learning machine-learning-api predictive python red10 redis s3 seaborn stock-price-prediction xgb xgboost

Last synced: 07 Aug 2024

https://github.com/target/data-validator

A tool to validate data, built around Apache Spark.

data-science data-validation hacktoberfest

Last synced: 01 Aug 2024

https://github.com/tlverse/sl3

💪 🤔 Modern Super Learning with Machine Learning Pipelines

data-science ensemble-learning ensemble-model machine-learning model-selection r r-package regression stacking statistics

Last synced: 02 Aug 2024

https://github.com/TexteaInc/funix

Building web apps without manually creating widgets

app-builder data-science frontend machine-learning

Last synced: 13 Aug 2024

https://github.com/IlyaGusev/tgcontest

Telegram Data Clustering contest solution by Mindful Squirrel

classification clustering cpp data-science document-similarity fasttext machine-learning nlp

Last synced: 01 Aug 2024

https://github.com/jkoutsikakis/pytorch-wrapper

Provides a systematic and extensible way to build, train, evaluate, and tune deep learning models using PyTorch.

data-science deep-learning machine-learning neural-network python pytorch pytorch-wrapper tensor

Last synced: 07 Aug 2024

https://github.com/giswqs/manjaro-linux

Shell scripts for setting up Manjaro Linux for Python programming and deep learning

data-science deep-learning gis kde manjaro manjaro-linux notebook-jupyter python r remote-sensing shell-scripts tensorflow

Last synced: 06 Aug 2024

https://github.com/scottshambaugh/monaco

Quantify uncertainty and sensitivities in your computer models with an industry-grade Monte Carlo library.

data-science monaco monte-carlo python scientific-computing sensitivity-analysis simulation statistics uncertainty-analysis uncertainty-quantification

Last synced: 01 Aug 2024

https://github.com/asavinov/prosto

Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations on functions, offering an alternative to map-reduce and join-groupby.

business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow

Last synced: 01 Aug 2024

https://github.com/talegari/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse

Last synced: 12 Aug 2024

https://github.com/tidypyverse/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse

Last synced: 01 Aug 2024

https://github.com/firmai/business-analytics-and-mathematics-python-book

Advanced Business Analytics and Mathematics with Python (by @firmai)

analytics business data-analysis data-science mathematics python

Last synced: 04 Aug 2024

https://github.com/synthesized-io/fairlens

Identify bias and measure fairness of your data

bias data data-analysis data-science fairness pandas python statistics

Last synced: 03 Aug 2024