Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/nischalshrestha/Unravel

A fluent code explorer for R. 🔍

data-science datawrangling dplyr r rstats shiny tidyr tidyverse

Last synced: 13 Aug 2024

https://github.com/AlexIoannides/pymc-example-project

Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.

bayesian-data-analysis bayesian-inference data-science machine-learning numpy pandas probabilistic-programming pymc3 python scikit-learn

Last synced: 07 Aug 2024

https://github.com/target/data-validator

A tool to validate data, built around Apache Spark.

data-science data-validation hacktoberfest

Last synced: 05 Nov 2024

https://github.com/wyattowalsh/data-science-notes

Open-source project hosted at https://makeuseofdata.com to crowdsource a robust collection of notes related to data science (math, visualization, modeling, etc.)

calculus classification compilation crowdsourcing data-science first-timers first-timers-only jupyter-book linear-algebra machine-learning modeling probability regression simulation statistics up-for-grabs visualization

Last synced: 03 Aug 2024

https://github.com/jay-johnson/sci-pype

A machine learning API with native Redis caching and export and import using S3. Analyze entire datasets using an API for building, training, testing, analyzing, extracting, importing, and archiving. It can run from a Docker container or directly from the repository.

data-science devops-for-data-science docker docker-compose ipython ipython-notebook jupyter jupyter-notebook jupyter-themes machine-learning machine-learning-api predictive python red10 redis s3 seaborn stock-price-prediction xgb xgboost

Last synced: 11 Oct 2024

https://github.com/facultyai/lens

Summarise and explore Pandas DataFrames

dask data-exploration data-science data-visualisation dataframe pandas

Last synced: 08 Nov 2024

https://github.com/oracle/macest

Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

confidence-estimation data-science machine-learning python

Last synced: 06 Nov 2024

https://github.com/scottshambaugh/monaco

Quantify uncertainty and sensitivities in your computer models with an industry-grade Monte Carlo library.

data-science monaco monte-carlo python scientific-computing sensitivity-analysis simulation statistics uncertainty-analysis uncertainty-quantification

Last synced: 05 Nov 2024

https://github.com/tlverse/sl3

💪 🤔 Modern Super Learning with Machine Learning Pipelines

data-science ensemble-learning ensemble-model machine-learning model-selection r r-package regression stacking statistics

Last synced: 11 Nov 2024

https://github.com/danmorales/cursods_profdanilo

Python code with different applications such as machine learning and deep learning techniques, fundamentals of statistics, and regression and classification problems. Videos with the theoretical explanations are available on my YouTube channel.

aprendizado-de-maquina ciencia-de-dados data-science deep-learning keras-classification-models keras-layer keras-models keras-neural-networks machine-learning machine-learning-algorithms numpy pandas-dataframe pandas-python scikit-learn scikitlearn-machine-learning scipy tensorflow tensorflow-tutorials

Last synced: 10 Oct 2024

https://github.com/contextlab/storytelling-with-data

Course materials for Dartmouth Course: Storytelling with Data (PSYC 81.09).

course-materials data-science data-stories python tutorials

Last synced: 06 Nov 2024

https://github.com/ieshreya/data-science-resources

Free self-study educational resources for Data Science! I'm currently learning Data Science and built this repository to help myself, but if it helps you in any way, feel free to star it!

computer-science data-science python resources

Last synced: 14 Oct 2024

https://github.com/TexteaInc/funix

Building web apps without manually creating widgets

app-builder data-science frontend machine-learning

Last synced: 13 Aug 2024

https://github.com/ideos/gloe

A general-purpose library designed to guide developers in expressing their code as a flow.

clean-code data-science flow functional-programming machine-learning python typing

Last synced: 30 Oct 2024

https://github.com/outerbounds/dsbook

Code samples for the Effective Data Science Infrastructure book

data-science infrastructure machine-learning

Last synced: 10 Nov 2024

https://github.com/IlyaGusev/tgcontest

Telegram Data Clustering contest solution by Mindful Squirrel

classification clustering cpp data-science document-similarity fasttext machine-learning nlp

Last synced: 04 Nov 2024

https://github.com/jkoutsikakis/pytorch-wrapper

Provides a systematic and extensible way to build, train, evaluate, and tune deep learning models using PyTorch.

data-science deep-learning machine-learning neural-network python pytorch pytorch-wrapper tensor

Last synced: 07 Aug 2024

https://github.com/tidypyverse/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse

Last synced: 07 Nov 2024

https://github.com/giswqs/manjaro-linux

Shell scripts for setting up Manjaro Linux for Python programming and deep learning

data-science deep-learning gis kde manjaro manjaro-linux notebook-jupyter python r remote-sensing shell-scripts tensorflow

Last synced: 31 Oct 2024

https://github.com/asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow

Last synced: 07 Nov 2024

https://github.com/lsys/lexicalrichness

😸 💬 A module to compute textual lexical richness (aka lexical diversity).

data-mining data-science information-retrieval lexical-analysis lexical-analyzer linguistic-analysis natural-language natural-language-processing nlp python

Last synced: 02 Nov 2024

https://github.com/zetane/zetaforge

Open source AI platform for rapid development of advanced AI and AGI pipelines.

agi ai claude data-science developer-tools gpt kubernetes llm machine-learning ml ml-pipelines mlops python workflow workflow-orchestration zetaforge

Last synced: 10 Nov 2024

https://github.com/eclipse-zenoh-flow/zenoh-flow

zenoh-flow aims at providing a zenoh-based data-flow programming framework for computations that span from the cloud to the device.

autonomous-vehicles data-science dataflow-programming machine-learning robotics ros2 rust-lang

Last synced: 31 Oct 2024

https://github.com/firmai/business-analytics-and-mathematics-python-book

Advanced Business Analytics and Mathematics with Python (by @firmai)

analytics business data-analysis data-science mathematics python

Last synced: 04 Aug 2024

https://github.com/talegari/tidypandas

A grammar of data manipulation for pandas inspired by tidyverse

data-analysis data-science dataframe dataframe-library dplyr pandas python tidyverse

Last synced: 12 Aug 2024

https://github.com/delsner/flask-angular-data-science

Repository for a data science starter app using Flask, Angular and Docker. https://medium.com/@dvelsner/deploying-a-simple-machine-learning-model-in-a-modern-web-application-flask-angular-docker-a657db075280

angular data-science docker flask machine-learning python sklearn typescript

Last synced: 10 Oct 2024

https://github.com/synthesized-io/fairlens

Identify bias and measure fairness of your data

bias data data-analysis data-science fairness pandas python statistics

Last synced: 03 Aug 2024

https://github.com/scrapinghub/aile

Automatic Item List Extraction

data-science

Last synced: 10 Nov 2024

https://github.com/tiledb-inc/tiledb-vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

bioinformatics data-science genomics gwas python spark tiledb variant-calling vcf

Last synced: 07 Nov 2024

https://github.com/akgold/do4ds

A book on DevOps for Data Scientists with CRC Press.

data-science devops it python r

Last synced: 10 Nov 2024

https://github.com/jeroenjanssens/python-polars-the-definitive-guide

Scripts and datasets for the O'Reilly book Python Polars: The Definitive Guide

data-science oreilly oreilly-books polars polars-dataframe python

Last synced: 27 Oct 2024

https://github.com/sangaline/reverse-engineering-the-hacker-news-ranking-algorithm

An analysis of historical Hacker News data to determine the ranking algorithm

analysis data-science hacker-news

Last synced: 07 Nov 2024

https://github.com/uc-r/uc-r.github.io

Main repository for R programming courses @ University of Cincinnati, courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.

classroom data-science data-wrangling machine-learning r tutorial tutorial-code visualization

Last synced: 30 Oct 2024

https://github.com/rogerfitz/tutorials

Git repo for articles on the Ergo Sum blog and the YouTube channel https://www.youtube.com/channel/UCiie9CN--dazA7iT2sry5FA

algorithmia data-science draft-kings fan-duel fivethirtyeight google-maps-api ocr python sports tech text-to-speech visualizations

Last synced: 08 Nov 2024

https://github.com/palashio/nylon

An intelligent, flexible grammar of machine learning.

auto-ml data-science grammar machine-learning

Last synced: 07 Nov 2024

https://github.com/svilupp/PromptingTools.jl

Streamline your life using PromptingTools.jl, the Julia package that simplifies interacting with large language models.

data-science generative-ai julia

Last synced: 28 Oct 2024

https://github.com/nuclio/nuclio-jupyter

Nuclio Function Automation for Python and Jupyter

data-science jupyter kubernetes nuclio python

Last synced: 04 Nov 2024

https://github.com/seandavi/geoquery

The bridge between the NCBI Gene Expression Omnibus and Bioconductor

bioconductor bioinformatics data-science genomics ncbi-geo r rstats

Last synced: 05 Nov 2024

https://github.com/Dumbris/trunklucator

Python module for data scientists for quickly creating annotation projects.

active-learning annotation annotation-tool data-science machine-learning nlp

Last synced: 04 Nov 2024

https://github.com/bcgov/bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue

bcdc citz data-science env r r-package rstats

Last synced: 13 Aug 2024

https://github.com/dspinellis/alexandria3k

Local relational access to openly-available publication data sets

bibliometric-analysis crossref data-science orcid scientometrics

Last synced: 01 Nov 2024

https://github.com/XpressAI/xircuits

Simple visual programming environment for JupyterLab

data-science jupyterlab python

Last synced: 10 Oct 2024

https://github.com/hemansnation/data-analyst-roadmap

Data-Analyst-Roadmap for Professionals. This roadmap contains 8 Chapters that can be completed in 8 weeks, whether you are a fresher in the field or an experienced professional who wants to transition into Data Analysis.

analytics data-analysis data-analysis-python data-analytics data-science numpy predictive-analytics project-based-learning python statistics tableau

Last synced: 08 Nov 2024

https://github.com/ropensci/gittargets

Data version control for reproducible analysis pipelines in R with {targets}.

data-science data-version-control data-versioning r r-package reproducibility reproducible-research rstats targets workflow

Last synced: 31 Oct 2024

https://github.com/andrea-ballatore/open-geo-data-education

Open Geospatial Datasets for GIS Education: This is a repository of open geospatial datasets to be used in an educational context. I created these files over years of teaching Geographic Data Science and GIS. All original datasets are freely available online with open data licenses (see the dataset attribution for details). All the datasets in this repository have been selected, cleaned, harmonised, and repackaged for GIS exercises in a higher-education context. This is a pretty time-intensive process that other educators can hopefully avoid by using these versions.

data-science geojson geospatial-data geospatial-datasets gis gis-data gis-education tsv

Last synced: 27 Oct 2024

https://github.com/great-expectations/great_expectations_action

A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.

actions continuous-integration data-integrity data-quality data-science mlops

Last synced: 06 Nov 2024

https://github.com/imdeepmind/neuralpy

NeuralPy: A Keras-like deep learning library that works on top of PyTorch

data-science deep-learning keras library machine-learning neural-network neuralpy neuralpy-torch python pytorch

Last synced: 11 Oct 2024

https://github.com/FlyRanch/figurefirst

A layout-first approach to figure making

data-science inkscape inkscape-extensions matplotlib plotting python svg

Last synced: 03 Aug 2024

https://github.com/OpenSTEF/openstef

Automated Machine Learning pipelines. Builds the Open Short Term Energy Forecasting package.

data-science energy energy-forecasting forecasting machine-learning python time-series

Last synced: 03 Aug 2024

https://github.com/datakitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake

Last synced: 12 Oct 2024

https://github.com/woz-u/DS-Student-Resources

Data Science Student Companion Notebooks and Data Lake

data-analysis data-science data-visualization machine-learning nosql python r sql statistics

Last synced: 08 Aug 2024

https://github.com/5agado/conversation-analyzer

Analyzer and statistics generator for text-based conversations. Includes Facebook scraper and parser

data-science facebook quantified-self scraper

Last synced: 08 Nov 2024

https://github.com/zjuearthdata/geochemistrypi

An open-source, highly automated machine learning Python framework for data-driven geochemistry discovery

dash data-science fastapi flaml geochemistry mlflow nodejs ray reactjs scikit-learn typer

Last synced: 10 Oct 2024

https://github.com/Invictify/Jupter-Notebook-REST-API

Run your Jupyter notebooks as a REST API endpoint. This isn't a Jupyter server but rather just a way to run your notebooks as a REST API endpoint.

data-science data-science-pipelines docker dockerfile fastapi jupyter python rest-api

Last synced: 26 Oct 2024

https://github.com/dominodatalab/domino-research

Projects developed by Domino's R&D team

data-science mlflow mlops python sagemaker

Last synced: 13 Aug 2024