Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/kamu-data/kamu-cli

Next-generation decentralized data lake and streaming data pipeline

blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql

Last synced: 14 Oct 2024

https://github.com/mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 15 Nov 2024

https://github.com/jupyter-naas/naas

Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)

ai binder data data-science data-transformation engine etl integration jupyter jupyterlab notebooks open-source pipeline

Last synced: 04 Nov 2024

https://github.com/senseyeio/roger

Golang Rserve client. Use R from Go.

data-science go r rserve scientific-computing

Last synced: 13 Nov 2024

https://github.com/rasgointelligence/RasgoQL

Write Python locally, execute SQL in your data warehouse

data-analysis data-science pandas python sql

Last synced: 08 Aug 2024

https://github.com/swanhubx/swanlab

⚡️SwanLab: your ML experiment notebook. Track and visualize your entire machine learning workflow.

data-science deep-learning fastapi jax machine-learning mlops model-versioning python pytorch tensorboard tensorflow tracking transformers visualization

Last synced: 11 Oct 2024

https://github.com/weijie-chen/econometrics-with-python

Tutorials on econometrics featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics; the theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.

data-analysis data-science econometrics economics python statistics time-series

Last synced: 12 Nov 2024

https://github.com/datalayer/jupyter-ui

⚛️ React.js components 💯% compatible with 🪐 Jupyter. https://jupyter-ui-storybook.datalayer.tech

data data-product data-science data-visualisation datalayer ipywidgets jupyter jupyterlab lumino notebook reactjs ui

Last synced: 11 Oct 2024

https://github.com/vopani/datatableton

100 exercises to learn Python Datatable

data-science datatable pydatatable python tutorial-exercises

Last synced: 03 Aug 2024

https://github.com/svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

apache-spark data-processing data-science python

Last synced: 12 Oct 2024

https://github.com/carloocchiena/the_statistics_handbook

The Statistics Handbook open source repository

data-science latex mathematics statistics

Last synced: 01 Nov 2024

https://github.com/wizardforcel/data-science-notebook

:book: Every great thought and action has an insignificant beginning

data-analysis data-science machine-learning notebook numpy pandas sklearn tensorflow

Last synced: 10 Oct 2024

https://github.com/kde/labplot

LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.

data-analysis data-science data-visualization fitting graph graph2d plotting scientific-plotting scientific-visualization

Last synced: 11 Nov 2024

https://github.com/scrapinghub/webstruct

NER toolkit for HTML data

crfsuite data-science ner

Last synced: 10 Nov 2024

https://github.com/uclatommy/tweetfeels

Real-time sentiment analysis in Python using Twitter's streaming API

data-mining data-science python-3-6 sentiment-analysis twitter

Last synced: 12 Oct 2024

https://github.com/dwhitena/gophernet

A simple from-scratch neural net written in Go

artificial-intelligence data-science go golang machine-learning neural-network

Last synced: 11 Nov 2024

https://github.com/khanhnamle1994/statistical-learning

Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course

data-mining data-science r regression statistical-learning

Last synced: 10 Nov 2024

https://github.com/cartodb/cartoframes

CARTO Python package for data scientists

carto data-science jupyter-notebook maps python spatial-data-analysis

Last synced: 14 Nov 2024

https://github.com/empower-ai/dsensei

AI-powered key driver analysis tool that pinpoints the root cause behind metric fluctuations in one minute.

analytics business-analytics business-intelligence data data-analytics data-insights data-science

Last synced: 14 Nov 2024

https://github.com/tirthajyoti/uci-ml-api

Simple API for UCI Machine Learning Dataset Repository (search, download, analyze)

api classification clustering data-science learning machine-learning python regression statistics uci-machine-learning

Last synced: 14 Nov 2024

https://github.com/red-data-tools/unicode_plot.rb

Plot your data with Unicode characters

data-science data-visualization ruby

Last synced: 12 Nov 2024

https://github.com/analysiscenter/cardio

CardIO is a library for data science research of heart signals

data-science deep-learning deep-neural-networks healthcare machine-learning python

Last synced: 13 Nov 2024

https://github.com/ropensci/elastic

R client for the Elasticsearch HTTP API

data-science database database-wrapper elasticsearch etl http json r r-package rstats

Last synced: 15 Nov 2024

https://github.com/bears-r-us/arkouda

Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale :bear:

chapel data data-analysis data-science distributed-computing eda hpc python

Last synced: 14 Nov 2024

https://github.com/PKU-DAIR/Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

artificial-intelligence autograd data-science deep-learning deep-neural-networks distributed-systems distributed-training embeddings gpu high-dimensional machine-learning python state-of-the-art

Last synced: 28 Oct 2024

https://github.com/justmarkham/trump-lies

Tutorial: Web scraping in Python with Beautiful Soup

beautiful-soup data-science dataset pandas python requests tutorial web-scraping

Last synced: 14 Nov 2024

https://github.com/Griperis/BlenderDataVis

Data visualisation addon for Blender

blender blender-addon chart data-science data-visualisation

Last synced: 03 Aug 2024

https://github.com/durgeshsamariya/data-science-roadmap

Roadmap to learn Data Science and related areas.

data-science data-science-resources learn-data-science roadmap

Last synced: 08 Nov 2024

https://github.com/shreyashankar/datasets-for-good

List of datasets to apply stats/machine learning/technology to the world of social good.

data-science dataset education environment government health machine-learning social-good

Last synced: 13 Nov 2024

https://github.com/dialnd/imbalanced-algorithms

Python-based implementations of algorithms for learning on imbalanced data.

data-science imbalanced-data machine-learning notre-dame python

Last synced: 07 Nov 2024

https://github.com/recodehive/stackoverflow-analysis

Stack Overflow is a professional community for developers. This repo analyzes three years of the Stack Overflow Developer Survey, visualizes the results, and predicts future data scientist salaries.

canva collaborate data-analysis data-science data-visualization ghdesktop github github-pages machine-learning stack-overflow student-vscode survey-analysis vscode

Last synced: 15 Nov 2024

https://github.com/voxel51/voxelgpt

AI assistant that can query visual datasets, search the FiftyOne docs, and answer general computer vision questions

artificial-intelligence chatgpt computer-vision data-science deep-learning fiftyone langchain llm machine-learning openai python

Last synced: 09 Nov 2024

https://github.com/bgruening/docker-galaxy-stable

:whale::bar_chart::books: Docker Images tracking the stable Galaxy releases.

data-science docker-image galaxy galaxyproject science

Last synced: 09 Nov 2024

https://github.com/koalaverse/homlr

Supplementary material for Hands-On Machine Learning with R, an applied book covering the fundamentals of machine learning with R.

data-science machine-learning r supervised-learning unsupervised-learning

Last synced: 26 Oct 2024

https://github.com/Minyus/pipelinex

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more

data-engineering data-science deep-learning experimentation machine-learning pipeline

Last synced: 29 Oct 2024

https://github.com/nickslevine/zebras

Data analysis library for JavaScript built with Ramda

data-analysis data-science functional-programming javascript pandas ramda

Last synced: 07 Nov 2024

https://github.com/vertica/VerticaPy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

big-data data-science data-visualization machine-learning preparation python python-library vertica

Last synced: 13 Nov 2024

https://github.com/neurodata/hyppo

Python package for multivariate hypothesis testing

data-science hacktoberfest hypothesis-testing independence ksample-testing python

Last synced: 10 Nov 2024

https://github.com/analysiscenter/radio

RadIO is a library for data science research of computed tomography imaging

computed-tomography data-science deep-learning machine-learning medical-imaging neural-networks tensorflow

Last synced: 07 Aug 2024

https://github.com/project-codeflare/codeflare

Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.

automl data-science hyperparameter-optimization machine-learning pipelines ray sklearn workflows

Last synced: 13 Oct 2024

https://github.com/vertica/verticapy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

big-data data-science data-visualization machine-learning preparation python python-library vertica

Last synced: 12 Nov 2024

https://github.com/xlang-ai/ds-1000

[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".

benchmark code-generation data-science large-language-models semantic-parsing

Last synced: 13 Nov 2024

https://github.com/anki-code/xonsh-cheatsheet

Cheat sheet for the xonsh shell with copy-pastable examples. The best doc for new users.

awesome awesome-cheatsheet cheat-sheet cheat-sheets cheatsheet cheatsheets console data-science devops devops-scripts hacking shell terminal xonsh xontrib

Last synced: 11 Oct 2024

https://github.com/mukeshmithrakumar/Book_List

Python, Machine Learning, Deep Learning and Data Science Books

algorithms books data-science deep-learning free machine-learning python

Last synced: 13 Nov 2024

https://github.com/zeno-ml/zeno

AI Data Management & Evaluation Platform

ai data-science evaluation evaluation-framework machine-learning python

Last synced: 09 Nov 2024

https://github.com/ocademy-ai/machine-learning

Learn AI together, for free. AI learning and teaching resources for everyone.

ai data-engineering data-science deep-learning jupyter jupyter-notebook machine-learning ml mlops python scikit-learn visualization

Last synced: 08 Nov 2024

https://github.com/fastverse/fastverse

An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R

c cpp data-aggregation data-manipulation data-science data-transformation high-performance low-dependency panel-data r rstats statistical-computing time-series weights

Last synced: 13 Nov 2024

https://github.com/jldbc/coffee-quality-database

Building the Coffee Quality Institute Database

agriculture coffee data data-science dataset

Last synced: 09 Nov 2024

https://github.com/shaildeliwala/delbot

It understands your voice commands, searches news and knowledge sources, and summarizes and reads out content to you.

ai bot bots chatbot data-science flask natural-language-processing python

Last synced: 04 Aug 2024

https://github.com/google-aai/sc17

SuperComputing 2017 Deep Learning Tutorial

data-science deep-learning google-cloud-platform machine-learning tutorial

Last synced: 07 Aug 2024

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 12 Nov 2024

https://github.com/tmthyjames/achoo

Achoo uses a Raspberry Pi to predict whether my son will need his inhaler on any given day using weather, pollen, and air quality data. If the prediction for a given day is above a specified threshold, the Pi will email his school nurse and me, notifying us that he may need preemptive treatment. Community-sourced health monitoring!

air-quality-data data-science diy pollen prediction python r raspberry-pi weather

Last synced: 26 Oct 2024

https://github.com/data-dot-all/dataall

A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws aws-glue aws-lake-formation aws-s3 data data-science etl-framework lakeformation lakehouse redshift

Last synced: 13 Aug 2024

https://github.com/ashutosh1919/truvisory

This project provides resources for users who want to access good LinkedIn posts that contain resources for learning about technology, design, self-branding, motivation, etc. You can visit the project by:

career data-science design full-stack linkedin linkedin-posts linkedin-profile marketing motivation opensource react react-template reactjs self-branding stacks ui-design website-design website-template

Last synced: 26 Oct 2024

https://github.com/Fixy-TR/fixy

Our goal is to build an open source spelling assistant/checker that solves many different problems in the Turkish NLP literature together, proposes unique approaches, and fills the gaps in existing work. It corrects spelling mistakes in users' texts with a deep learning approach and also performs semantic analysis of the texts, so that errors arising in that context can be detected and corrected as well.

acikhack2 ai artificial-intelligence bert data-science deep-learning deeplearning keras natural-language-processing neural-network neural-networks nlp python

Last synced: 12 Nov 2024

https://github.com/saimadhu-polamuri/DataAspirant_codes

Complete machine learning model code

data-mining data-science machine-learning python

Last synced: 05 Aug 2024

https://github.com/xlang-ai/DS-1000

[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".

benchmark code-generation data-science large-language-models semantic-parsing

Last synced: 09 Aug 2024

https://github.com/jgoerner/data-science-stack-cookiecutter

🐳📊🤓 Cookiecutter template to launch an awesome dockerized Data Science toolstack (incl. Jupyter, Superset, Postgres, Minio, Airflow & API Star)

airflow apistar cookiecutter data-science docker docker-image jupyter minio postgres python superset

Last synced: 31 Oct 2024

https://github.com/blobcity/python-for-data-science

A collection of Jupyter Notebooks for learning Python for Data Science.

data-science jupyter jupyter-notebook jupyter-notebooks learn-python python

Last synced: 13 Nov 2024

https://github.com/mvlearn/mvlearn

Python package for multi-view machine learning

data-science machine-learning multiview-learning python

Last synced: 12 Nov 2024

https://github.com/Laurae2/Laurae

Advanced High Performance Data Science Toolbox for R by Laurae

data-science laurae machine-learning r supervised-learning xgboost

Last synced: 07 Aug 2024

https://github.com/Speedml/speedml

Speedml is a Python package to speed start machine learning projects.

data-science machine-learning python

Last synced: 03 Aug 2024

https://github.com/PecanProject/pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants r

Last synced: 14 Nov 2024

https://github.com/danaugrs/go-tsne

t-Distributed Stochastic Neighbor Embedding (t-SNE) in Go

3d data-science dimensionality-reduction go machine-learning tsne unsupervised-learning visualization

Last synced: 12 Nov 2024