Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/build-on-aws/cloud-clubs-learner-library

A library for learners! Whether or not you're part of AWS Cloud Clubs, browse this library for free, open, leveled content for students 18+ worldwide.

ai aws containers data-analytics data-science databases iot kubernetes ml mobile-development security serverless web web-development

Last synced: 30 Oct 2024

https://github.com/ideonate/cdsdashboards

JupyterHub extension for ContainDS Dashboards

bokeh data-science jupyter jupyterhub panel plotly-dash rshiny streamlit visualization

Last synced: 13 Oct 2024

https://github.com/agilescientific/striplog

Lithology and stratigraphic logs for wells or outcrop.

data-mining data-science geology petrophysics sedimentology swung-stack

Last synced: 25 Oct 2024

https://github.com/flyteorg/flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started with and learn, and highly extensible.

automation data data-science extensible flyte flyte-tasks hacktoberfest mlops pypi python sdk spark workflows

Last synced: 11 Oct 2024

https://github.com/analysiscenter/batchflow

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

data-science machine-learning pipeline pipeline-framework python python3 workflow workflow-engine

Last synced: 02 Nov 2024
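
BatchFlow's own API is not reproduced here. As a rough, library-agnostic illustration of the idea in the description, the sketch below yields random or sequential batches of indices so that each batch's data can be loaded on demand rather than held in memory all at once. `iter_batches` and `load_item` are hypothetical names.

```python
import random
from typing import Iterator, List, Optional


def iter_batches(n_items: int, batch_size: int, shuffle: bool = False,
                 seed: Optional[int] = None) -> Iterator[List[int]]:
    """Yield lists of item indices, sequentially or in random order.

    Only indices are produced, so the caller can load each batch's data
    lazily (e.g. from disk) instead of holding the whole dataset in memory.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = list(range(n_items))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, n_items, batch_size):
        yield order[start:start + batch_size]


def load_item(i: int) -> int:
    # Hypothetical loader: stands in for reading record i from storage.
    return i * i


for batch_indices in iter_batches(10, batch_size=4):
    batch = [load_item(i) for i in batch_indices]
    print(batch_indices, batch)
```

Passing `shuffle=True` gives random batches; a fixed `seed` makes the order reproducible across runs.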

https://github.com/paddymul/buckaroo

Buckaroo - the data wrangling assistant for pandas. Quickly explore dataframes and run pandas commands via a GUI. Works inside the Jupyter notebook.

buckaroo data-science jupyter paddy pandas

Last synced: 29 Oct 2024

https://github.com/PecanProject/pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants r

Last synced: 03 Aug 2024

https://github.com/aws/amazon-redshift-python-driver

Redshift Python Connector. It supports Python Database API Specification v2.0.

amazon-redshift aws-redshift data-analysis data-science

Last synced: 07 Oct 2024
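
Because the connector follows the Python Database API Specification v2.0 (PEP 249), the usual connect, cursor, execute, fetch pattern applies. The sketch below uses the standard-library `sqlite3` module as a stand-in, since it implements the same specification and needs no cluster; with a real Redshift connection the connection arguments, SQL dialect and placeholder style (`paramstyle`) differ, so treat this as an illustration of the DB-API flow only.

```python
import sqlite3

# sqlite3 stands in for a Redshift connection here; both follow PEP 249
# (Python DB API 2.0), so the connect -> cursor -> execute -> fetch flow
# is the same. Connection arguments and placeholder style differ.
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    cur.executemany(
        "INSERT INTO sales VALUES (?, ?)",  # sqlite3 uses the 'qmark' style
        [("east", 120.0), ("west", 80.5), ("east", 30.0)],
    )
    conn.commit()
    cur.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    rows = cur.fetchall()
    print(rows)  # [('east', 150.0), ('west', 80.5)]
finally:
    conn.close()
```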

https://github.com/rhenanbartels/hrv

A Python package for heart rate variability analysis

data-science hacktoberfest hrv python signal-processing

Last synced: 13 Nov 2024
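
To make "heart rate variability analysis" concrete, here are two standard time-domain HRV metrics computed from RR intervals: SDNN (standard deviation of the intervals) and RMSSD (root mean square of successive differences). This is a plain standard-library sketch of the textbook formulas, not this package's API; the RR values are made up for illustration, and SDNN is taken here as the sample standard deviation.

```python
import math
import statistics
from typing import Sequence


def sdnn(rri_ms: Sequence[float]) -> float:
    """Standard deviation of RR (NN) intervals, in ms (sample std, ddof=1)."""
    return statistics.stdev(rri_ms)


def rmssd(rri_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between RR intervals, in ms."""
    diffs = [b - a for a, b in zip(rri_ms, rri_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_hr_bpm(rri_ms: Sequence[float]) -> float:
    """Mean heart rate implied by the RR intervals, in beats per minute."""
    return 60_000.0 / statistics.mean(rri_ms)


rri = [800, 810, 790, 805, 795]  # hypothetical RR intervals in ms
print(f"SDNN={sdnn(rri):.2f} ms, RMSSD={rmssd(rri):.2f} ms, "
      f"HR={mean_hr_bpm(rri):.1f} bpm")
# SDNN=7.91 ms, RMSSD=14.36 ms, HR=75.0 bpm
```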

https://github.com/slowkow/harmonypy

🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.

bioinformatics data-integration data-science single-cell-analysis

Last synced: 15 Oct 2024

https://github.com/coqui-ai/trainer

🐸 - A general-purpose model trainer, as flexible as it gets

ai data-science deep-learning machine-learning pytorch

Last synced: 01 Nov 2024

https://github.com/ActivitySim/activitysim

An Open Platform for Activity-Based Travel Modeling

activitysim bsd-3-clause data-science microsimulation python travel-modeling

Last synced: 27 Oct 2024

https://github.com/Toloka/crowd-kit

Control the quality of your labeled data with the Python tools you already know.

aggregations annotation crowd crowdsourcing data-mining data-science labeling python quality-control toloka truth-inference

Last synced: 30 Oct 2024
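
The simplest truth-inference baseline for crowdsourced labels is majority vote: for each task, keep the label most workers gave. The plain-Python sketch below illustrates that idea only; it is not Crowd-Kit's API, and the deterministic tie-breaking rule (smallest label wins) is an assumption made for reproducibility.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def majority_vote(labels: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Aggregate (task, worker, label) triples by majority vote per task.

    Ties are broken by choosing the lexicographically smallest label so the
    result is deterministic.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for task, _worker, label in labels:
        votes[task][label] += 1
    result = {}
    for task, counter in votes.items():
        top = max(counter.values())
        result[task] = min(lbl for lbl, n in counter.items() if n == top)
    return result


annotations = [
    ("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
    ("img2", "w1", "dog"), ("img2", "w2", "cat"),
]
print(majority_vote(annotations))  # {'img1': 'cat', 'img2': 'cat'}
```

More elaborate aggregation methods weight workers by estimated reliability instead of counting every vote equally.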

https://github.com/sktime/skpro

A unified framework for tabular probabilistic regression and probability distributions in Python

ai data-science framework machine-learning prediction probabilistic-models probability-distributions python regression sklearn

Last synced: 10 Oct 2024
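
Probabilistic regression returns a predictive distribution rather than a single number. As a minimal standard-library sketch of that idea (not skpro's interface), the example swaps in the simplest possible model: a least-squares line with Gaussian residuals, from which a central prediction interval is read off. It deliberately ignores uncertainty in the fitted coefficients, so its intervals are somewhat too narrow; the data points are illustrative.

```python
import math
import statistics
from typing import List, Tuple


def fit_gaussian_linear(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Fit y ~ N(a + b*x, sigma^2) by least squares; return (a, b, sigma)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual std with n - 2 degrees of freedom (two fitted parameters).
    sigma = math.sqrt(sum(r * r for r in resid) / (len(x) - 2))
    return a, b, sigma


def predict_interval(a: float, b: float, sigma: float, x_new: float,
                     level: float = 0.9):
    """Return the predictive normal distribution and a central interval at x_new."""
    dist = statistics.NormalDist(mu=a + b * x_new, sigma=sigma)
    lo = dist.inv_cdf((1 - level) / 2)
    hi = dist.inv_cdf(1 - (1 - level) / 2)
    return dist, (lo, hi)


x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b, sigma = fit_gaussian_linear(x, y)
dist, (lo, hi) = predict_interval(a, b, sigma, 5.0)
print(f"mean={dist.mean:.2f}, 90% interval=({lo:.2f}, {hi:.2f})")
```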

https://github.com/launchflow/buildflow

BuildFlow is an open-source framework for building large-scale systems using Python. All you need to do is describe where your input is coming from and where your output should be written, and BuildFlow handles the rest. No configuration outside of the code is required.

batch data-science pipeline python streaming

Last synced: 06 Aug 2024

https://github.com/trainingbypackt/data-science-for-marketing-analytics

Achieve your marketing goals with the data analytics power of Python

data-science data-visualization matplotlib numpy pandas python seaborn

Last synced: 14 Nov 2024

https://github.com/microsoft/finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.

business data-science feature-selection finance finnts forecasting machine-learning microsoft r r-package rstats time-series

Last synced: 13 Nov 2024

https://github.com/rapidsai/node

GPU-accelerated data science and visualization in Node.js

cuda data-science data-visualization gpgpu gpu nodejs

Last synced: 13 Nov 2024

https://github.com/multimeric/pandasschema

A validation library for Pandas data frames using user-friendly schemas

data-science pandas schema validation

Last synced: 28 Oct 2024
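
Schema-based validation means declaring per-column rules once and collecting every violation rather than failing on the first. The dependency-free sketch below illustrates that pattern on a table stored as a dict of column lists; `ColumnRule`, `validate` and the check functions are hypothetical and are not PandasSchema's classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ColumnRule:
    """A hypothetical per-column rule: a name plus value-level checks."""
    name: str
    checks: List[Callable[[Any], bool]] = field(default_factory=list)
    allow_empty: bool = False


def validate(table: Dict[str, List[Any]], schema: List[ColumnRule]) -> List[str]:
    """Return human-readable errors; an empty list means the table passed."""
    errors = []
    for rule in schema:
        if rule.name not in table:
            errors.append(f"missing column {rule.name!r}")
            continue
        for row, value in enumerate(table[rule.name]):
            if value is None:  # None marks a missing value in this sketch
                if not rule.allow_empty:
                    errors.append(f"{rule.name!r} row {row}: empty value")
                continue
            for check in rule.checks:
                if not check(value):
                    errors.append(
                        f"{rule.name!r} row {row}: {value!r} failed {check.__name__}"
                    )
    return errors


def is_positive(v):
    return v > 0


def is_two_letter_code(v):
    return isinstance(v, str) and len(v) == 2 and v.isalpha()


schema = [
    ColumnRule("age", [is_positive]),
    ColumnRule("country", [is_two_letter_code]),
]
table = {"age": [34, -1, None], "country": ["AU", "USA", "NZ"]}
for err in validate(table, schema):
    print(err)
```

A pandas DataFrame can be turned into this shape with `df.to_dict(orient="list")`; note that pandas usually represents missing values as `NaN` rather than `None`, which this sketch does not treat as empty.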

https://github.com/seg/2016-ml-contest

Machine learning contest - October 2016 TLE

contest data-science fun geophysics geoscience machine-learning

Last synced: 07 Aug 2024

https://github.com/TMiguelT/PandasSchema

A validation library for Pandas data frames using user-friendly schemas

data-science pandas schema validation

Last synced: 07 Aug 2024

https://github.com/robmarkcole/HASS-data-detective

Explore and analyse your Home Assistant data

data data-science home home-assistant home-automation

Last synced: 05 Nov 2024

https://github.com/drakearch/kaggle-courses

Kaggle courses and tutorials to get you started in the Data Science world.

data-science deep-learning machine-learning pandas python

Last synced: 08 Nov 2024

https://github.com/swoop-inc/spark-alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

data-engineering data-science scala spark

Last synced: 12 Oct 2024

https://github.com/stocknear/backend

Backend of stocknear - Open Source Stock Analysis

data data-science fastapi fastify finance javascript machine-learning nodejs pocketbase python redis

Last synced: 13 Nov 2024

https://github.com/nshiab/simple-data-analysis

Easy-to-use and high-performance JavaScript library for data analysis.

data data-analysis data-science duckdb javascript nodejs typescript

Last synced: 28 Oct 2024

https://github.com/nshiab/simple-data-analysis.js

Easy-to-use and high-performance JavaScript library for data analysis.

data data-analysis data-science duckdb javascript nodejs typescript

Last synced: 12 Aug 2024

https://github.com/cyb3r-monk/rita-j

Implementation of RITA (Real Intelligence Threat Analytics) in Jupyter Notebook with improved scoring algorithm.

cybersecurity data-science dfir jupyter-notebook threat-hunting

Last synced: 08 Nov 2024

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 26 Oct 2024
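
Tools that version data alongside source code commonly rely on content addressing: a large file is stored in a cache under the hash of its contents, and only the small hash is committed with the code. The sketch below illustrates that general technique with the standard library; it is not dud's implementation, and the cache layout (first two hex characters as a subdirectory) is an assumed convention.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def commit_to_cache(path: Path, cache_dir: Path) -> str:
    """Copy a data file into a content-addressed cache; return its digest.

    The small digest is what gets versioned alongside source code, while the
    bulky file lives in the cache under a name derived from its contents.
    """
    digest = file_digest(path)
    target = cache_dir / digest[:2] / digest[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_bytes(b"x,y\n1,2\n")
    digest = commit_to_cache(data, Path(tmp) / "cache")
    stored = Path(tmp) / "cache" / digest[:2] / digest[2:]
    cached_ok = stored.read_bytes() == data.read_bytes()
    print(digest)
```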

https://github.com/azure/datasciencevm

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

ai azure big-data data-analysis data-science deep-learning dsvm machine-learning ml python r sqlserver

Last synced: 07 Oct 2024

https://github.com/capeprivacy/cape-python

Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.

collaboration data-science hacktoberfest machine-learning pandas policy privacy python spark

Last synced: 03 Aug 2024
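
One common privacy transformation is keyed tokenization: each value is replaced by an HMAC of it, so equal inputs map to equal tokens (joins and group-bys keep working) while the raw value disappears. The sketch below shows that technique driven by a tiny column-to-action policy; Cape Python's actual policy language and transformation names are not reproduced, and `tokenize`, `apply_policy` and the `"tokenize"`/`"redact"` actions are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List


def tokenize(value: str, key: bytes, length: int = 16) -> str:
    """Replace a value with a deterministic, keyed token (HMAC-SHA256).

    The same input and key always yield the same token. Without the key,
    tokens cannot be recomputed or linked back to known values.
    """
    mac = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:length]


def apply_policy(rows: List[Dict[str, str]], policy: Dict[str, str],
                 key: bytes) -> List[Dict[str, str]]:
    """Apply a hypothetical column -> action policy to each row."""
    out = []
    for row in rows:
        new_row = dict(row)
        for column, action in policy.items():
            if action == "tokenize":
                new_row[column] = tokenize(row[column], key)
            elif action == "redact":
                new_row.pop(column, None)
            else:
                raise ValueError(f"unknown action {action!r}")
        out.append(new_row)
    return out


rows = [
    {"name": "Alice", "email": "a@example.com", "spend": "10"},
    {"name": "Alice", "email": "a2@example.com", "spend": "5"},
]
policy = {"name": "tokenize", "email": "redact"}
masked = apply_policy(rows, policy, key=b"demo-secret-key")
for r in masked:
    print(r)
```

In practice the key would come from a secrets store rather than being hard-coded as in this demo.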

https://github.com/oracle-samples/oci-data-science-ai-samples

This repo contains a series of tutorials and code examples highlighting different features of the OCI Data Science and AI services, along with a release vehicle for experimental programs.

ai conda data-science data-science-notebooks deep-learning jupyter-notebook machine-learning oci oracle-cloud-infrastructure python

Last synced: 13 Nov 2024

https://github.com/Oxen-AI/Oxen

Oxen.ai's core rust library, server, and CLI

artificial-intelligence data-science database machine-learning version-control

Last synced: 17 Aug 2024

https://github.com/fedora-infra/fedmsg

Federated Messaging with ZeroMQ

data-science fedora-project message-bus python zeromq

Last synced: 20 Aug 2024

https://github.com/kdr-aus/ogma

Scripting language focused on processing tabular data.

data-science language rust scripting-language table-data

Last synced: 30 Oct 2024

https://github.com/dlab-berkeley/Python-Fundamentals-Legacy

D-Lab's 12-hour introduction to Python. Learn how to create variables and functions, use control flow structures, use libraries, import data, and more, using Python and Jupyter Notebooks.

data-science introduction-to-python jupyter python

Last synced: 11 Nov 2024

https://github.com/tirthajyoti/ds-with-pysimplegui

Data science and machine learning GUI programs/desktop apps with the PySimpleGUI package

analytics application artificial-intelligence data-science desktop-app gui machine-learning python windows

Last synced: 01 Nov 2024

https://github.com/pydatablog/python-for-data-science

A blog for data analytics using data science technologies

data-science finance python

Last synced: 01 Nov 2024

https://github.com/hugohadfield/kalmangrad

Automated, smooth, Nth-order derivatives of non-uniformly sampled time-series data

data-science derivatives kalman-filter signal-processing smoothing

Last synced: 23 Oct 2024
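
The underlying idea can be illustrated with a standard constant-velocity Kalman filter whose state is [value, first derivative]; rebuilding the transition and process-noise matrices from each time gap is what lets it handle non-uniform sampling. This NumPy sketch covers only the first derivative, whereas the project targets arbitrary order, and it is not kalmangrad's API or exact model; `q` and `r` are tuning assumptions.

```python
import numpy as np


def kalman_derivative(t, y, q=1.0, r=0.01):
    """Estimate value and first derivative of y(t) with a constant-velocity
    Kalman filter that handles non-uniform time steps.

    t, y: 1-D arrays of strictly increasing sample times and noisy values.
    q: process-noise intensity (how quickly the slope may change).
    r: measurement-noise variance.
    Returns an (n, 2) array of filtered [value, derivative] estimates.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    H = np.array([[1.0, 0.0]])          # we observe the value only
    x = np.array([y[0], 0.0])           # initial state: first value, zero slope
    P = np.diag([r, 1e3])               # large initial uncertainty on the slope
    out = np.empty((len(t), 2))
    out[0] = x
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        F = np.array([[1.0, dt], [0.0, 1.0]])
        # White-noise-acceleration process noise for a gap of length dt.
        Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the scalar measurement y[k].
        S = H @ P @ H.T + r             # innovation variance, shape (1, 1)
        K = P @ H.T / S                 # Kalman gain, shape (2, 1)
        x = x + (K * (y[k] - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
        out[k] = x
    return out


rng = np.random.default_rng(0)
t = np.cumsum(rng.uniform(0.01, 0.1, 300))   # irregular sample times
y = np.sin(t) + rng.normal(0.0, 0.01, t.size)
est = kalman_derivative(t, y, q=1.0, r=1e-4)
print(est[-1], "true slope:", np.cos(t[-1]))
```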

https://github.com/Automunge/AutoMunge

Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbations.

data-science machine-learning

Last synced: 27 Oct 2024

https://github.com/apachecn/ds-ai-tech-notes

📖 [Translated] Data Science and AI Technical Notes

ai data-science matplotlib notes numpy python sklearn

Last synced: 10 Oct 2024

https://github.com/google/starthinker

Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows

Last synced: 29 Sep 2024

https://github.com/ahammadmejbah/machine-learning-book-collections

Machine learning, a subfield of AI, is the study and development of data-driven strategies to enhance task performance.

data-science deep-learning machine-learning

Last synced: 11 Nov 2024

https://github.com/robb/rbbjson

Flexible JSON traversal for rapid prototyping.

data-science json jsonpath prototyping swift

Last synced: 27 Oct 2024

https://github.com/lamastex/scalable-data-science

Scalable Data Science: course sets in big data using Apache Spark over Databricks, along with their mathematical, statistical, and computational foundations using SageMath.

apache-spark data-science databricks scala

Last synced: 12 Oct 2024

https://github.com/unnati-xyz/scalable-data-science-platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

data-engineer data-pipeline data-science luigi machine-learning rest-api spark

Last synced: 07 Aug 2024

https://github.com/davendw49/k2

Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024

ai4science data-science geoai geoscience kg large-language-models llm

Last synced: 02 Nov 2024

https://github.com/solegalli/machine-learning-imbalanced-data

Code repository for the online course Machine Learning with Imbalanced Data

data-science imbalanced-classification imbalanced-data imbalanced-learning machine-learning python

Last synced: 13 Nov 2024
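
A basic remedy for imbalanced classes is random oversampling: duplicate randomly chosen minority-class samples until every class matches the largest one. The plain-Python sketch below illustrates that baseline; it is not taken from the course material, and `random_oversample` is a hypothetical helper. Apply it to the training split only, so duplicated rows do not leak into evaluation data.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(X: Sequence[T], y: Sequence[str],
                      seed: int = 0) -> Tuple[List[T], List[str]]:
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lbl in enumerate(y) if lbl == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["neg", "neg", "neg", "neg", "pos"]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({'neg': 4, 'pos': 4})
```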

https://github.com/hamelsmu/docker_tutorial

Code and helper scripts for article on Medium "How Docker Can Help You Become A More Effective Data Scientist"

data-science docker docker-tutorial medium medium-article

Last synced: 27 Oct 2024

https://github.com/phillipdupuis/dtale-desktop

Build a data visualization dashboard with simple snippets of python code

data-analysis data-science data-visualization fastapi pandas python react typescript visualization

Last synced: 26 Oct 2024

https://github.com/curiousily/machine-learning-from-scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

artificial-intelligence book classification data-science machine-learning machine-learning-algorithms neural-networks notebook recommender-systems regression reinforcement-learning sentiment-analysis

Last synced: 11 Nov 2024

https://github.com/jgoerner/beyond-jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset

Last synced: 27 Oct 2024

https://github.com/risenw/datasist

A Python library for easy data analysis, visualization, exploration and modeling

data-analysis data-science data-visualization feature-engineering machine-learning python-3

Last synced: 14 Nov 2024

https://github.com/pyscaffold/pyscaffoldext-dsproject

💫 PyScaffold extension for data-science projects

data-science pyscaffold pyscaffold-extension python

Last synced: 08 Nov 2024

https://github.com/thebabylonai/babylog

A lightweight logger for machine learning teams to log images and predictions in production.

computer-vision cvops data-science logger logging-library machine-learning ml mlops python python3

Last synced: 31 Oct 2024

https://github.com/heidelbergcement/hcrystalball

A library that unifies the API for the most commonly used libraries and modeling techniques for time-series forecasting in the Python ecosystem.

cross-validation data-science fbprophet model-selection pmdarima sarimax sklearn sklearn-api sklearn-compatible sklearn-library sktime statsmodels tbats time-series time-series-forecasting transformer wrapper

Last synced: 10 Oct 2024
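
The benefit of a unified API is that model selection code can treat every forecaster the same way. The sketch below illustrates that with a structural `Forecaster` protocol, two baseline models and a holdout scorer; the class names and the `fit(y)` / `predict(horizon)` signatures are hypothetical and are not hcrystalball's own interface.

```python
from typing import List, Protocol, Sequence


class Forecaster(Protocol):
    """Hypothetical common interface: every model exposes fit() and predict()."""
    def fit(self, y: Sequence[float]) -> "Forecaster": ...
    def predict(self, horizon: int) -> List[float]: ...


class NaiveForecaster:
    """Repeats the last observed value."""
    def fit(self, y: Sequence[float]) -> "NaiveForecaster":
        self.last_ = y[-1]
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.last_] * horizon


class SeasonalNaiveForecaster:
    """Repeats the last full season of observations."""
    def __init__(self, season_length: int):
        self.season_length = season_length

    def fit(self, y: Sequence[float]) -> "SeasonalNaiveForecaster":
        if len(y) < self.season_length:
            raise ValueError("need at least one full season of history")
        self.season_ = list(y[-self.season_length:])
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.season_[i % self.season_length] for i in range(horizon)]


def holdout_mae(model: Forecaster, y: Sequence[float], horizon: int) -> float:
    """Fit on all but the last `horizon` points and score MAE on the rest."""
    train, test = y[:-horizon], y[-horizon:]
    preds = model.fit(train).predict(horizon)
    return sum(abs(p - a) for p, a in zip(preds, test)) / horizon


series = [10, 20, 30, 10, 20, 30, 10, 20, 30]
for model in (NaiveForecaster(), SeasonalNaiveForecaster(season_length=3)):
    print(type(model).__name__, holdout_mae(model, series, horizon=3))
# NaiveForecaster 10.0
# SeasonalNaiveForecaster 0.0
```

Because `holdout_mae` depends only on the shared interface, adding another model requires no changes to the evaluation code.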