Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/red-data-tools/unicode_plot.rb

Plot your data by Unicode characters

data-science data-visualization ruby

Last synced: 02 Aug 2024

https://github.com/analysiscenter/cardio

CardIO is a library for data science research of heart signals

data-science deep-learning deep-neural-networks healthcare machine-learning python

Last synced: 02 Aug 2024

https://github.com/Griperis/BlenderDataVis

Data visualisation addon for Blender

blender blender-addon chart data-science data-visualisation

Last synced: 03 Aug 2024

https://github.com/PKU-DAIR/Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

artificial-intelligence autograd data-science deep-learning deep-neural-networks distributed-systems distributed-training embeddings gpu high-dimensional machine-learning python state-of-the-art

Last synced: 31 Jul 2024

https://github.com/justmarkham/trump-lies

Tutorial: Web scraping in Python with Beautiful Soup

beautiful-soup data-science dataset pandas python requests tutorial web-scraping

Last synced: 02 Aug 2024

https://github.com/dialnd/imbalanced-algorithms

Python-based implementations of algorithms for learning on imbalanced data.

data-science imbalanced-data machine-learning notre-dame python

Last synced: 01 Aug 2024

https://github.com/voxel51/voxelgpt

AI assistant that can query visual datasets, search the FiftyOne docs, and answer general computer vision questions

artificial-intelligence chatgpt computer-vision data-science deep-learning fiftyone langchain llm machine-learning openai python

Last synced: 02 Aug 2024

https://github.com/Minyus/pipelinex

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more

data-engineering data-science deep-learning experimentation machine-learning pipeline

Last synced: 31 Jul 2024

https://github.com/bgruening/docker-galaxy-stable

🐳📊📚 Docker images tracking the stable Galaxy releases.

data-science docker-image galaxy galaxyproject science

Last synced: 01 Aug 2024

https://github.com/koalaverse/homlr

Supplementary material for Hands-On Machine Learning with R, an applied book covering the fundamentals of machine learning with R.

data-science machine-learning r supervised-learning unsupervised-learning

Last synced: 31 Jul 2024

https://github.com/nickslevine/zebras

Data analysis library for JavaScript built with Ramda

data-analysis data-science functional-programming javascript pandas ramda

Last synced: 01 Aug 2024

https://github.com/analysiscenter/radio

RadIO is a library for data science research of computed tomography imaging

computed-tomography data-science deep-learning machine-learning medical-imaging neural-networks tensorflow

Last synced: 07 Aug 2024

https://github.com/vertica/VerticaPy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

big-data data-science data-visualization machine-learning preparation python python-library vertica

Last synced: 02 Aug 2024

https://github.com/project-codeflare/codeflare

Simplifying the definition, execution, scaling, and deployment of pipelines on the cloud.

automl data-science hyperparameter-optimization machine-learning pipelines ray sklearn workflows

Last synced: 02 Aug 2024

https://github.com/anki-code/xonsh-cheatsheet

Cheat sheet for the xonsh shell with copy-pastable examples. The best doc for new users.

awesome awesome-cheatsheet cheat-sheet cheat-sheets cheatsheet cheatsheets console data-science devops devops-scripts hacking shell terminal xonsh xontrib

Last synced: 31 Jul 2024

https://github.com/mukeshmithrakumar/Book_List

Python, Machine Learning, Deep Learning and Data Science Books

algorithms books data-science deep-learning free machine-learning python

Last synced: 02 Aug 2024

https://github.com/neurodata/hyppo

Python package for multivariate hypothesis testing

data-science hacktoberfest hypothesis-testing independence ksample-testing python

Last synced: 02 Aug 2024

https://github.com/zeno-ml/zeno

AI Data Management & Evaluation Platform

ai data-science evaluation evaluation-framework machine-learning python

Last synced: 01 Aug 2024

https://github.com/fastverse/fastverse

An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R

c cpp data-aggregation data-manipulation data-science data-transformation high-performance low-dependency panel-data r rstats statistical-computing time-series weights

Last synced: 05 Aug 2024

https://github.com/ocademy-ai/machine-learning

Learn AI together, for free. AI learning and teaching resources for everyone.

ai data-engineering data-science deep-learning jupyter jupyter-notebook machine-learning ml mlops python scikit-learn visualization

Last synced: 01 Aug 2024

https://github.com/shaildeliwala/delbot

It understands your voice commands, searches news and knowledge sources, and summarizes and reads out content to you.

ai bot bots chatbot data-science flask natural-language-processing python

Last synced: 04 Aug 2024

https://github.com/google-aai/sc17

SuperComputing 2017 Deep Learning Tutorial

data-science deep-learning google-cloud-platform machine-learning tutorial

Last synced: 07 Aug 2024

https://github.com/data-dot-all/dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws aws-glue aws-lake-formation aws-s3 data data-science etl-framework lakeformation lakehouse redshift

Last synced: 13 Aug 2024

https://github.com/saimadhu-polamuri/DataAspirant_codes

Complete machine learning model codes

data-mining data-science machine-learning python

Last synced: 05 Aug 2024

https://github.com/xlang-ai/DS-1000

[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".

benchmark code-generation data-science large-language-models semantic-parsing

Last synced: 09 Aug 2024

https://github.com/Fixy-TR/fixy

Our goal is to build an open source spelling assistant/checker that tackles many different problems in the Turkish NLP literature together, proposes unique approaches, and fills the gaps in existing work. It aims to fix spelling mistakes in users' text with a deep learning approach while also performing semantic analysis of the text, so that errors arising in that context can be detected and corrected as well.

acikhack2 ai artificial-intelligence bert data-science deep-learning deeplearning keras natural-language-processing neural-network neural-networks nlp python

Last synced: 02 Aug 2024

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 02 Aug 2024

https://github.com/jgoerner/data-science-stack-cookiecutter

🐳📊🤓 Cookiecutter template to launch an awesome dockerized Data Science toolstack (incl. Jupyter, Superset, Postgres, Minio, Airflow & API Star)

airflow apistar cookiecutter data-science docker docker-image jupyter minio postgres python superset

Last synced: 31 Jul 2024

https://github.com/blobcity/python-for-data-science

A collection of Jupyter Notebooks for learning Python for Data Science.

data-science jupyter jupyter-notebook jupyter-notebooks learn-python python

Last synced: 02 Aug 2024

https://github.com/Laurae2/Laurae

Advanced High Performance Data Science Toolbox for R by Laurae

data-science laurae machine-learning r supervised-learning xgboost

Last synced: 07 Aug 2024

https://github.com/Speedml/speedml

Speedml is a Python package to speed start machine learning projects.

data-science machine-learning python

Last synced: 03 Aug 2024

https://github.com/danaugrs/go-tsne

t-Distributed Stochastic Neighbor Embedding (t-SNE) in Go

3d data-science dimensionality-reduction go machine-learning tsne unsupervised-learning visualization

Last synced: 02 Aug 2024

https://github.com/nteract/bookstore

📚 Notebook storage and publishing workflows for the masses

data-science notebook nteract scheduling storage versioned-buckets

Last synced: 01 Aug 2024

https://github.com/mvlearn/mvlearn

Python package for multi-view machine learning

data-science machine-learning multiview-learning python

Last synced: 02 Aug 2024

https://github.com/flyteorg/flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started with, easy to learn, and highly extensible.

automation data data-science extensible flyte flyte-tasks hacktoberfest mlops pypi python sdk spark workflows

Last synced: 02 Aug 2024

https://github.com/agilescientific/striplog

Lithology and stratigraphic logs for wells or outcrop.

data-mining data-science geology petrophysics sedimentology swung-stack

Last synced: 30 Jul 2024

https://github.com/ideonate/cdsdashboards

JupyterHub extension for ContainDS Dashboards

bokeh data-science jupyter jupyterhub panel plotly-dash rshiny streamlit visualization

Last synced: 01 Aug 2024

https://github.com/PecanProject/pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants r

Last synced: 03 Aug 2024

https://github.com/aws/amazon-redshift-python-driver

Redshift Python Connector. It supports Python Database API Specification v2.0.

amazon-redshift aws-redshift data-analysis data-science

Last synced: 01 Aug 2024
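
The entry above mentions the Python Database API Specification v2.0 (PEP 249). As a rough illustration of what that interface looks like in practice, here is a minimal sketch using the standard library's `sqlite3` module, which also follows PEP 249. It is a stand-in, not the Redshift connector: the `events` table, its columns, and the helper names `top_events` and `demo` are all hypothetical, and the placeholder style (`?` here) varies by driver according to its `paramstyle` attribute.

```python
import sqlite3


def top_events(conn, min_hits):
    """Return (name, hits) rows with hits >= min_hits, largest first."""
    cur = conn.cursor()  # PEP 249: cursors are created from a connection
    try:
        # sqlite3 uses the "qmark" paramstyle; other drivers may differ.
        cur.execute(
            "SELECT name, hits FROM events WHERE hits >= ? ORDER BY hits DESC",
            (min_hits,),
        )
        return cur.fetchall()
    finally:
        cur.close()


def demo():
    """Build a throwaway in-memory table (hypothetical data) and query it."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE events (name TEXT, hits INTEGER)")
        cur.executemany(
            "INSERT INTO events VALUES (?, ?)",
            [("click", 3), ("view", 10), ("share", 1)],
        )
        conn.commit()  # PEP 249: changes are committed on the connection
        return top_events(conn, 2)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

Because the connect / cursor / execute / fetch / commit pattern is defined by the specification, code written this way usually ports between compliant drivers with changes limited to the connection call and the placeholder style.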

https://github.com/analysiscenter/batchflow

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

data-science machine-learning pipeline pipeline-framework python python3 workflow workflow-engine

Last synced: 01 Aug 2024

https://github.com/alan-turing-institute/skpro

A unified framework for tabular probabilistic regression and probability distributions in Python

ai data-science framework machine-learning prediction probabilistic-models probability-distributions python regression sklearn

Last synced: 29 Jul 2024

https://github.com/Toloka/crowd-kit

Control the quality of your labeled data with the Python tools you already know.

aggregations annotation crowd crowdsourcing data-mining data-science labeling python quality-control toloka truth-inference

Last synced: 31 Jul 2024

https://github.com/launchflow/buildflow

BuildFlow is an open source framework for building large-scale systems using Python. All you need to do is describe where your input is coming from and where your output should be written, and BuildFlow handles the rest. No configuration outside of the code is required.

batch data-science pipeline python streaming

Last synced: 06 Aug 2024

https://github.com/ActivitySim/activitysim

An Open Platform for Activity-Based Travel Modeling

activitysim bsd-3-clause data-science microsimulation python travel-modeling

Last synced: 31 Jul 2024

https://github.com/TMiguelT/PandasSchema

A validation library for Pandas data frames using user-friendly schemas

data-science pandas schema validation

Last synced: 07 Aug 2024

https://github.com/seg/2016-ml-contest

Machine learning contest - October 2016 TLE

contest data-science fun geophysics geoscience machine-learning

Last synced: 07 Aug 2024

https://github.com/multimeric/PandasSchema

A validation library for Pandas data frames using user-friendly schemas

data-science pandas schema validation

Last synced: 02 Aug 2024

https://github.com/nshiab/simple-data-analysis.js

Easy-to-use and high-performance JavaScript library for data analysis.

data data-analysis data-science duckdb javascript nodejs typescript

Last synced: 12 Aug 2024

https://github.com/robmarkcole/HASS-data-detective

Explore and analyse your Home Assistant data

data data-science home home-assistant home-automation

Last synced: 01 Aug 2024

https://github.com/nshiab/simple-data-analysis

Easy-to-use and high-performance JavaScript library for data analysis.

data data-analysis data-science duckdb javascript nodejs typescript

Last synced: 31 Jul 2024

https://github.com/microsoft/finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.

business data-science feature-selection finance finnts forecasting machine-learning microsoft r r-package rstats time-series

Last synced: 13 Aug 2024

https://github.com/swoop-inc/spark-alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

data-engineering data-science scala spark

Last synced: 06 Aug 2024

https://github.com/coqui-ai/Trainer

🐸 - A general purpose model trainer, as flexible as it gets

ai data-science deep-learning machine-learning pytorch

Last synced: 07 Aug 2024

https://github.com/Azure/DataScienceVM

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

ai azure big-data data-analysis data-science deep-learning dsvm machine-learning ml python r sqlserver

Last synced: 08 Aug 2024

https://github.com/capeprivacy/cape-python

Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.

collaboration data-science hacktoberfest machine-learning pandas policy privacy python spark

Last synced: 03 Aug 2024

https://github.com/Oxen-AI/Oxen

Oxen.ai's core Rust library, server, and CLI

artificial-intelligence data-science database machine-learning version-control

Last synced: 17 Aug 2024

https://github.com/fedora-infra/fedmsg

Federated Messaging with ZeroMQ

data-science fedora-project message-bus python zeromq

Last synced: 20 Aug 2024

https://github.com/kdr-aus/ogma

Scripting language focused on processing tabular data.

data-science language rust scripting-language table-data

Last synced: 31 Jul 2024

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 31 Jul 2024

https://github.com/dlab-berkeley/Python-Fundamentals-Legacy

D-Lab's 12-hour introduction to Python. Learn how to create variables and functions, use control flow structures, use libraries, import data, and more, using Python and Jupyter Notebooks.

data-science introduction-to-python jupyter python

Last synced: 02 Aug 2024

https://github.com/google/starthinker

Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows

Last synced: 04 Aug 2024

https://github.com/Automunge/AutoMunge

Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbations.

data-science machine-learning

Last synced: 31 Jul 2024

https://github.com/unnati-xyz/scalable-data-science-platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

data-engineer data-pipeline data-science luigi machine-learning rest-api spark

Last synced: 07 Aug 2024

https://github.com/jgoerner/beyond-jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset

Last synced: 31 Jul 2024

https://github.com/curiousily/Machine-Learning-from-Scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

artificial-intelligence book classification data-science machine-learning machine-learning-algorithms neural-networks notebook recommender-systems regression reinforcement-learning sentiment-analysis

Last synced: 08 Aug 2024