Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/sicara/sicarator

Instant Setup & Best Quality for Data Projects!

data-science generator machine-learning python

Last synced: 28 Dec 2024

https://github.com/t04glovern/selfie2anime

Anime2Selfie Backend Services - Lambda, Queue, API Gateway and traffic processing

aws aws-lambda data-science selfie2anime serverless

Last synced: 19 Dec 2024

https://github.com/capeprivacy/cape-dataframes

Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.

collaboration data-science hacktoberfest machine-learning pandas policy privacy python spark

Last synced: 14 Nov 2024

https://github.com/Oxen-AI/Oxen

Oxen.ai's core rust library, server, and CLI

artificial-intelligence data-science database machine-learning version-control

Last synced: 09 Dec 2024

https://github.com/kdr-aus/ogma

Scripting language focused on processing tabular data.

data-science language rust scripting-language table-data

Last synced: 30 Oct 2024

https://github.com/curiousily/machine-learning-from-scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

artificial-intelligence book classification data-science machine-learning machine-learning-algorithms neural-networks notebook recommender-systems regression reinforcement-learning sentiment-analysis

Last synced: 26 Dec 2024

https://github.com/fedora-infra/fedmsg

Federated Messaging with ZeroMQ

data-science fedora-project message-bus python zeromq

Last synced: 25 Dec 2024

https://github.com/pydatablog/python-for-data-science

A blog for data analytics using data science technologies

data-science finance python

Last synced: 19 Dec 2024

https://github.com/apachecn/ds-ai-tech-notes

:book: [Translation] Technical notes on data science and artificial intelligence

ai data-science matplotlib notes numpy python sklearn

Last synced: 18 Dec 2024

https://github.com/tirthajyoti/ds-with-pysimplegui

Data science and machine learning GUI programs and desktop apps built with the PySimpleGUI package

analytics application artificial-intelligence data-science desktop-app gui machine-learning python windows

Last synced: 19 Dec 2024

https://github.com/dlab-berkeley/Python-Fundamentals-Legacy

D-Lab's 12-hour introduction to Python. Learn how to create variables and functions, use control flow structures, use libraries, import data, and more, using Python and Jupyter Notebooks.

data-science introduction-to-python jupyter python

Last synced: 11 Nov 2024

https://github.com/hugohadfield/kalmangrad

Automated, smooth, Nth-order derivatives of non-uniformly sampled time series data

data-science derivatives kalman-filter signal-processing smoothing

Last synced: 23 Oct 2024

https://github.com/lamastex/scalable-data-science

Scalable Data Science: course sets on big data using Apache Spark on Databricks, and on their mathematical, statistical, and computational foundations using SageMath.

apache-spark data-science databricks scala

Last synced: 23 Dec 2024

https://github.com/Automunge/AutoMunge

Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbations.

data-science machine-learning

Last synced: 27 Oct 2024

https://github.com/unnati-xyz/scalable-data-science-platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

data-engineer data-pipeline data-science luigi machine-learning rest-api spark

Last synced: 27 Nov 2024

https://github.com/google/starthinker

Reference framework for building data workflows, provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows

Last synced: 29 Sep 2024

https://github.com/solegalli/machine-learning-imbalanced-data

Code repository for the online course Machine Learning with Imbalanced Data

data-science imbalanced-classification imbalanced-data imbalanced-learning machine-learning python

Last synced: 22 Dec 2024

https://github.com/ahammadmejbah/machine-learning-book-collections

Machine learning is the study and development of data-driven methods that improve task performance. It is a subfield of AI.

data-science deep-learning machine-learning

Last synced: 11 Nov 2024

https://github.com/robb/rbbjson

Flexible JSON traversal for rapid prototyping.

data-science json jsonpath prototyping swift

Last synced: 27 Oct 2024

https://github.com/davendw49/k2

Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024

ai4science data-science geoai geoscience kg large-language-models llm

Last synced: 02 Nov 2024

https://github.com/hamelsmu/docker_tutorial

Code and helper scripts for article on Medium "How Docker Can Help You Become A More Effective Data Scientist"

data-science docker docker-tutorial medium medium-article

Last synced: 27 Oct 2024

https://github.com/phillipdupuis/dtale-desktop

Build a data visualization dashboard with simple snippets of Python code

data-analysis data-science data-visualization fastapi pandas python react typescript visualization

Last synced: 27 Dec 2024

https://github.com/risenw/datasist

A Python library for easy data analysis, visualization, exploration and modeling

data-analysis data-science data-visualization feature-engineering machine-learning python-3

Last synced: 29 Dec 2024

https://github.com/pyscaffold/pyscaffoldext-dsproject

💫 PyScaffold extension for data-science projects

data-science pyscaffold pyscaffold-extension python

Last synced: 29 Dec 2024

https://github.com/anthdm/ml-email-clustering

Email clustering with machine learning

clustering data-science machine-learning scikit-learn

Last synced: 19 Nov 2024

https://github.com/jgoerner/beyond-jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset

Last synced: 27 Oct 2024

https://github.com/thebabylonai/babylog

A lightweight logger for machine learning teams to log images and predictions in production.

computer-vision cvops data-science logger logging-library machine-learning ml mlops python python3

Last synced: 25 Dec 2024

https://github.com/heidelbergcement/hcrystalball

A library that unifies the API for most commonly used libraries and modeling techniques for time-series forecasting in the Python ecosystem.

cross-validation data-science fbprophet model-selection pmdarima sarimax sklearn sklearn-api sklearn-compatible sklearn-library sktime statsmodels tbats time-series time-series-forecasting transformer wrapper

Last synced: 28 Dec 2024

https://github.com/oxinabox/datadeps.jl

reproducible data setup for reproducible science

data data-science open-science

Last synced: 20 Nov 2024

https://github.com/h2oai/wave-apps

Sample AI Apps built with H2O Wave.

data-science h2oai hacktoberfest low-code machine-learning python3

Last synced: 28 Dec 2024

https://github.com/whitews/flowkit

A Python toolkit for flow cytometry analysis supporting GatingML and FlowJo workspaces

cytometry data-science fcs fcs-files flow-cytometry flow-cytometry-analysis flowjo gatingml immunology python

Last synced: 22 Dec 2024

https://github.com/emilhvitfeldt/r-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 18 Dec 2024

https://github.com/voila-dashboards/voici

Voici turns any Jupyter Notebook into a static web application

dashboards data-science emscripten jupyter jupyterlite voila-dashboard wasm

Last synced: 29 Dec 2024

https://github.com/mybridge/learn-python

Python Top 45 Articles of 2017

algorithm data-science machine-learning python python3

Last synced: 07 Nov 2024

https://rivasiker.github.io/ggHoriPlot/

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 13 Nov 2024

https://github.com/rivasiker/ggHoriPlot

A user-friendly, highly customizable R package for building horizon plots in ggplot2

data-science data-visualization ggplot2 horizon-plots r r-package

Last synced: 12 Nov 2024

https://github.com/aws-samples/aws-ml-jp

A collection of notebooks and teaching materials for learning how to build, train, and deploy machine learning models with SageMaker

aws data-science deep-learning jupyter-notebook machine-learning mlops sagemaker

Last synced: 08 Nov 2024

https://github.com/alexandervnikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets)

augmentations data-augmentation data-science datasets deep-learning generative-model keras machine-learning python synthetic-data synthetic-time-series tensorflow2 time-series vae

Last synced: 25 Dec 2024

https://github.com/apache/incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

ai airflow big-data data-science machine-learning ml workflows

Last synced: 01 Oct 2024

https://github.com/arabacibahadir/sup-res

A great companion for finding key support and resistance levels on financial charts, cryptocurrencies.

algotrade analysis binance binance-api bitcoin cryptocurrency data-science finance pandas pinescript python stock telegram telegram-bot tradingview

Last synced: 27 Oct 2024

https://github.com/dlab-berkeley/R-Fundamentals-Legacy

D-Lab's 12-hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

automation data-science data-visualization data-wrangling r

Last synced: 11 Nov 2024

https://github.com/jupyterhub/repo2docker-action

A GitHub action to build data science environment images with repo2docker and push them to registries.

actions binder data-science datascience docker jupyter jupyter-notebook repo2docker repo2docker-action

Last synced: 23 Dec 2024

https://github.com/rk2900/drsa

Deep Recurrent Survival Analysis, an auto-regressive deep model for time-to-event data analysis with censorship handling. An implementation of our AAAI 2019 paper and a benchmark for several (Python) implemented survival analysis methods.

data-science deep-learning machine-learning survival-analysis

Last synced: 07 Nov 2024

https://github.com/celebi-pkg/flight-analysis

Python package that scrapes flight data from Google Flights and analyzes prices. It can determine the optimal flight from date, place, and price.

data-science google pandas planes prediction price-tracker python

Last synced: 28 Dec 2024

https://github.com/hamelsmu/Seq2Seq_Tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 29 Oct 2024

https://github.com/gzuidhof/zarr.js

JavaScript implementation of Zarr

array data-science gehlenborglab javascript typescript zarr

Last synced: 29 Dec 2024

https://github.com/picnicml/doddle-model

:cake: doddle-model: machine learning in Scala.

breeze data-science doddle-model machine-learning scala

Last synced: 18 Nov 2024

https://github.com/jacobgil/confidenceinterval

The long-missing library for Python confidence intervals

data-science machine-learning metrics statistics

Last synced: 23 Dec 2024

https://github.com/ing-bank/probatus

Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.

binary-classifiers data-analysis data-science feature-elimination machine-learning multi-class-classification recursive-feature-elimination regressors shap statistics tree-model

Last synced: 28 Dec 2024