Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/tommyod/Efficient-Apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 30 Oct 2024

https://github.com/greenelab/scihub

Source code and data analyses for the Sci-Hub Coverage Study

crossref data-science doi journals libgen open-data sci-hub scimag scopus

Last synced: 25 Dec 2024

https://github.com/data-describe/data-describe

data⎰describe: Pythonic EDA Accelerator for Data Science

analysis data-science eda exploratory-data-analysis pypi

Last synced: 07 Nov 2024

https://github.com/petrobras/3w

Promotes development of ML algorithms for early detection and classification of undesirable events in offshore oil wells.

anomaly-detection data-science machine-learning multivariate-time-series-analysis oil-well-monitoring

Last synced: 23 Dec 2024

https://github.com/drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.

data-ethics data-science ethics machine-learning

Last synced: 21 Dec 2024

https://github.com/Dyakonov/PZAD

Course "Applied Problems of Data Analysis" (CMC Faculty, Lomonosov Moscow State University)

data-mining data-science data-visualization education lectures machine-learning ml russian slides

Last synced: 27 Nov 2024

https://github.com/yamafaktory/hypergraph

Hypergraph is a data structure library to create a directed hypergraph in which a hyperedge can join any number of vertices.

data data-science data-structure data-structures hypergraph hypergraphs rust rust-lang rustlang

Last synced: 27 Dec 2024

https://github.com/ibotta/sk-dist

Distributed scikit-learn meta-estimators in PySpark

data-science machine-learning ml scikit-learn spark

Last synced: 21 Dec 2024

https://github.com/Ibotta/sk-dist

Distributed scikit-learn meta-estimators in PySpark

data-science machine-learning ml scikit-learn spark

Last synced: 25 Nov 2024

https://github.com/microsoft/NimbusML

Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

data-science machine-learning ml mlnet nimbusml python scikit-learn

Last synced: 09 Nov 2024

https://github.com/microsoft/nimbusml

Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

data-science machine-learning ml mlnet nimbusml python scikit-learn

Last synced: 30 Sep 2024

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed internal machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 02 Nov 2024

https://github.com/glm-tools/pyglmnet

Python implementation of elastic-net regularized generalized linear models

data-science elastic-net glm lasso machine-learning python

Last synced: 12 Nov 2024

https://github.com/Giorgi/DuckDB.NET

Bindings and ADO.NET Provider for DuckDB

ado-net data-science duckdb duckdb-database hacktoberfest hacktoberfest2023

Last synced: 28 Oct 2024

https://github.com/Mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 28 Oct 2024

https://github.com/mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 27 Dec 2024

https://github.com/jupyter-naas/naas

Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)

ai binder data data-science data-transformation engine etl integration jupyter jupyterlab notebooks open-source pipeline

Last synced: 04 Nov 2024

https://github.com/senseyeio/roger

Golang RServe client. Use R from Go

data-science go r rserve scientific-computing

Last synced: 13 Nov 2024

https://github.com/rasgointelligence/RasgoQL

Write python locally, execute SQL in your data warehouse

data-analysis data-science pandas python sql

Last synced: 27 Nov 2024

https://github.com/vopani/datatableton

100 exercises to learn Python Datatable

data-science datatable pydatatable python tutorial-exercises

Last synced: 18 Nov 2024

https://github.com/svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

apache-spark data-processing data-science python

Last synced: 27 Dec 2024

https://github.com/wizardforcel/data-science-notebook

:book: Every great thought and action has a humble beginning

data-analysis data-science machine-learning notebook numpy pandas sklearn tensorflow

Last synced: 25 Dec 2024

https://github.com/griperis/blenderdatavis

Data visualisation addon for Blender

blender blender-addon chart data-science data-visualisation

Last synced: 24 Dec 2024

https://github.com/empower-ai/dsensei

AI-powered key driver analysis tool that pinpoints the root cause behind metric fluctuations in one minute.

analytics business-analytics business-intelligence data data-analytics data-insights data-science

Last synced: 25 Dec 2024

https://github.com/khanhnamle1994/statistical-learning

Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course

data-mining data-science r regression statistical-learning

Last synced: 25 Dec 2024

https://github.com/carloocchiena/the_statistics_handbook

The Statistics Handbook open source repository

data-science latex mathematics statistics

Last synced: 24 Dec 2024

https://github.com/kde/labplot

LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.

data-analysis data-science data-visualization fitting graph graph2d plotting scientific-plotting scientific-visualization

Last synced: 25 Dec 2024

https://github.com/scrapinghub/webstruct

NER toolkit for HTML data

crfsuite data-science ner

Last synced: 10 Nov 2024

https://github.com/flyteorg/flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.

automation data data-science extensible flyte flyte-tasks hacktoberfest mlops pypi python sdk spark workflows

Last synced: 25 Dec 2024

https://github.com/uclatommy/tweetfeels

Real-time sentiment analysis in Python using Twitter's streaming API

data-mining data-science python-3-6 sentiment-analysis twitter

Last synced: 22 Dec 2024

https://github.com/dwhitena/gophernet

A simple from-scratch neural net written in Go

artificial-intelligence data-science go golang machine-learning neural-network

Last synced: 11 Nov 2024

https://github.com/tirthajyoti/uci-ml-api

Simple API for UCI Machine Learning Dataset Repository (search, download, analyze)

api classification clustering data-science learning machine-learning python regression statistics uci-machine-learning

Last synced: 25 Dec 2024

https://github.com/cartodb/cartoframes

CARTO Python package for data scientists

carto data-science jupyter-notebook maps python spatial-data-analysis

Last synced: 27 Dec 2024

https://github.com/red-data-tools/unicode_plot.rb

Plot your data by Unicode characters

data-science data-visualization ruby

Last synced: 24 Dec 2024

https://github.com/analysiscenter/cardio

CardIO is a library for data science research of heart signals

data-science deep-learning deep-neural-networks healthcare machine-learning python

Last synced: 13 Nov 2024

https://github.com/Griperis/BlenderDataVis

Data visualisation addon for Blender

blender blender-addon chart data-science data-visualisation

Last synced: 16 Nov 2024

https://github.com/dgerlanc/programming-with-data

🐍 Learn Python and Pandas from the ground up

dangerlanc data-science pandas pandas-tutorial python workshop

Last synced: 23 Dec 2024

https://github.com/ropensci/elastic

R client for the Elasticsearch HTTP API

data-science database database-wrapper elasticsearch etl http json r r-package rstats

Last synced: 24 Dec 2024

https://github.com/justmarkham/trump-lies

Tutorial: Web scraping in Python with Beautiful Soup

beautiful-soup data-science dataset pandas python requests tutorial web-scraping

Last synced: 24 Dec 2024

https://github.com/PKU-DAIR/Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

artificial-intelligence autograd data-science deep-learning deep-neural-networks distributed-systems distributed-training embeddings gpu high-dimensional machine-learning python state-of-the-art

Last synced: 28 Oct 2024

https://github.com/Bears-R-Us/arkouda

Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale :bear:

chapel data data-analysis data-science distributed-computing eda hpc python

Last synced: 20 Nov 2024

https://github.com/bears-r-us/arkouda

Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale :bear:

chapel data data-analysis data-science distributed-computing eda hpc python

Last synced: 23 Dec 2024

https://github.com/jldbc/coffee-quality-database

Building the Coffee Quality Institute Database

agriculture coffee data data-science dataset

Last synced: 25 Dec 2024

https://github.com/durgeshsamariya/data-science-roadmap

Roadmap to learn Data Science and related areas.

data-science data-science-resources learn-data-science roadmap

Last synced: 08 Nov 2024

https://github.com/recodehive/stackoverflow-analysis

Stack Overflow is a professional community for developers. This repo analyzes three years of the Stack Overflow Developer Survey, visualizes the results, and predicts future salaries of data scientists.

canva collaborate data-analysis data-science data-visualization ghdesktop github github-pages machine-learning stack-overflow student-vscode survey-analysis vscode

Last synced: 21 Dec 2024

https://github.com/data-dot-all/dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws aws-glue aws-lake-formation aws-s3 data data-science etl-framework lakeformation lakehouse redshift

Last synced: 04 Dec 2024

https://github.com/shreyashankar/datasets-for-good

List of datasets to apply stats/machine learning/technology to the world of social good.

data-science dataset education environment government health machine-learning social-good

Last synced: 13 Nov 2024

https://github.com/dialnd/imbalanced-algorithms

Python-based implementations of algorithms for learning on imbalanced data.

data-science imbalanced-data machine-learning notre-dame python

Last synced: 07 Nov 2024

https://github.com/koalaverse/homlr

Supplementary material for Hands-On Machine Learning with R, an applied book covering the fundamentals of machine learning with R.

data-science machine-learning r supervised-learning unsupervised-learning

Last synced: 25 Dec 2024

https://github.com/paddymul/buckaroo

Buckaroo - the data wrangling assistant for pandas. Quickly explore dataframes and run pandas commands via a GUI. Works inside the Jupyter notebook.

buckaroo data-science jupyter paddy pandas

Last synced: 21 Dec 2024

https://github.com/voxel51/voxelgpt

AI assistant that can query visual datasets, search the FiftyOne docs, and answer general computer vision questions

artificial-intelligence chatgpt computer-vision data-science deep-learning fiftyone langchain llm machine-learning openai python

Last synced: 09 Nov 2024

https://github.com/bgruening/docker-galaxy-stable

:whale::bar_chart::books: Docker Images tracking the stable Galaxy releases.

data-science docker-image galaxy galaxyproject science

Last synced: 09 Nov 2024

https://github.com/xlang-ai/ds-1000

[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".

benchmark code-generation data-science large-language-models semantic-parsing

Last synced: 25 Dec 2024

https://github.com/project-codeflare/codeflare

Simplifying the definition and execution, scaling and deployment of pipelines on the cloud.

automl data-science hyperparameter-optimization machine-learning pipelines ray sklearn workflows

Last synced: 21 Dec 2024

https://github.com/bgruening/docker-galaxy

:whale::bar_chart::books: Docker Images tracking the stable Galaxy releases.

data-science docker-image galaxy galaxyproject science

Last synced: 23 Dec 2024

https://github.com/mukeshmithrakumar/book_list

Python, Machine Learning, Deep Learning and Data Science Books

algorithms books data-science deep-learning free machine-learning python

Last synced: 25 Dec 2024

https://github.com/xlang-ai/DS-1000

[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".

benchmark code-generation data-science large-language-models semantic-parsing

Last synced: 29 Nov 2024

https://github.com/Minyus/pipelinex

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more

data-engineering data-science deep-learning experimentation machine-learning pipeline

Last synced: 29 Oct 2024

https://github.com/neurodata/hyppo

Python package for multivariate hypothesis testing

data-science hacktoberfest hypothesis-testing independence ksample-testing python

Last synced: 22 Dec 2024

https://github.com/nickslevine/zebras

Data analysis library for JavaScript built with Ramda

data-analysis data-science functional-programming javascript pandas ramda

Last synced: 07 Nov 2024

https://github.com/analysiscenter/radio

RadIO is a library for data science research of computed tomography imaging

computed-tomography data-science deep-learning machine-learning medical-imaging neural-networks tensorflow

Last synced: 27 Nov 2024

https://github.com/vertica/VerticaPy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

big-data data-science data-visualization machine-learning preparation python python-library vertica

Last synced: 13 Nov 2024