Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/ActivitySim/activitysim

An Open Platform for Activity-Based Travel Modeling

activitysim bsd-3-clause data-science microsimulation python travel-modeling

Last synced: 27 Oct 2024

https://github.com/Toloka/crowd-kit

Control the quality of your labeled data with the Python tools you already know.

aggregations annotation crowd crowdsourcing data-mining data-science labeling python quality-control toloka truth-inference

Last synced: 30 Oct 2024
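Truth inference, as in crowd-kit's aggregation methods, turns several noisy labels per item into one. Below is a minimal plain-Python sketch of the simplest baseline, majority vote. It assumes labels arrive as (task, worker, label) triples and does not use crowd-kit's own API. The helper name majority_vote is hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Aggregate (task, worker, label) triples into one label per task.

    Ties are broken by the label that sorts first, so the result is deterministic.
    """
    votes = defaultdict(Counter)
    for task, _worker, label in annotations:
        votes[task][label] += 1
    result = {}
    for task, counter in votes.items():
        best = max(counter.values())
        # Among the labels with the top count, pick the smallest one.
        result[task] = min(label for label, n in counter.items() if n == best)
    return result


annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
]
print(majority_vote(annotations))  # {'t1': 'cat', 't2': 'dog'}
```

Majority vote treats every worker as equally reliable. Models that estimate per-worker reliability, such as Dawid-Skene, are the usual next step.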

https://github.com/sktime/skpro

A unified framework for tabular probabilistic regression and probability distributions in python

ai data-science framework machine-learning prediction probabilistic-models probability-distributions python regression sklearn

Last synced: 10 Oct 2024

https://github.com/launchflow/buildflow

BuildFlow is an open-source framework for building large-scale systems in Python. You describe where your input comes from and where your output should be written, and BuildFlow handles the rest. No configuration outside the code is required.

batch data-science pipeline python streaming

Last synced: 06 Aug 2024

https://github.com/microsoft/finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.

business data-science feature-selection finance finnts forecasting machine-learning microsoft r r-package rstats time-series

Last synced: 30 Oct 2024

https://github.com/multimeric/PandasSchema

A validation library for Pandas data frames using user-friendly schemas

data-science pandas schema validation

Last synced: 10 Nov 2024
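Schema-based validation means declaring one check per column and reporting every value that fails. The following is a minimal plain-Python sketch of that idea. It assumes a column-oriented dict stands in for a pandas DataFrame. The validate function and the Schema alias are hypothetical and are not PandasSchema's API.

```python
from typing import Callable, Dict, List, Tuple

# A column "schema": each column name maps to a predicate every value must satisfy.
Schema = Dict[str, Callable[[object], bool]]


def validate(columns: Dict[str, list], schema: Schema) -> List[Tuple[str, int, object]]:
    """Return (column, row index, bad value) for every value that fails its check.

    A column named in the schema but absent from the data is reported with row -1.
    """
    errors = []
    for name, check in schema.items():
        if name not in columns:
            errors.append((name, -1, "missing column"))
            continue
        for i, value in enumerate(columns[name]):
            if not check(value):
                errors.append((name, i, value))
    return errors


schema = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and "@" in v,
}
data = {"age": [34, -2, 57], "email": ["a@x.org", "b@y.org", "nope"]}
print(validate(data, schema))  # [('age', 1, -2), ('email', 2, 'nope')]
```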

https://github.com/seg/2016-ml-contest

Machine learning contest - October 2016 TLE

contest data-science fun geophysics geoscience machine-learning

Last synced: 07 Aug 2024

https://github.com/drakearch/kaggle-courses

Kaggle courses and tutorials to get you started in the Data Science world.

data-science deep-learning machine-learning pandas python

Last synced: 08 Nov 2024

https://github.com/robmarkcole/hass-data-detective

Explore and analyse your Home Assistant data

data data-science home home-assistant home-automation

Last synced: 30 Oct 2024

https://github.com/swoop-inc/spark-alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

data-engineering data-science scala spark

Last synced: 12 Oct 2024

https://github.com/nshiab/simple-data-analysis

Easy-to-use and high-performance JavaScript library for data analysis.

data data-analysis data-science duckdb javascript nodejs typescript

Last synced: 28 Oct 2024

https://github.com/cyb3r-monk/rita-j

Implementation of RITA (Real Intelligence Threat Analytics) in Jupyter Notebook with improved scoring algorithm.

cybersecurity data-science dfir jupyter-notebook threat-hunting

Last synced: 08 Nov 2024

https://github.com/coqui-ai/Trainer

🐸 A general-purpose model trainer, as flexible as it gets

ai data-science deep-learning machine-learning pytorch

Last synced: 07 Aug 2024

https://github.com/rapidsai/node

GPU-accelerated data science and visualization in node

cuda data-science data-visualization gpgpu gpu nodejs

Last synced: 30 Oct 2024

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 26 Oct 2024
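A common way to version data alongside source code is content addressing. The data is stored under a hash of its bytes, and only a small record holding that hash is committed with the code. The sketch below assumes an in-memory dict as the store and uses hypothetical helpers (commit, is_modified). It illustrates the general idea, not dud's file formats or commands.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash used as the storage key."""
    return hashlib.sha256(data).hexdigest()


def commit(store: dict, name: str, data: bytes) -> dict:
    """Store data under its content hash and return a small, committable lock record."""
    digest = fingerprint(data)
    store.setdefault(digest, data)  # identical content is stored only once
    return {"path": name, "checksum": digest}


def is_modified(record: dict, data: bytes) -> bool:
    """True if the current bytes no longer match the committed checksum."""
    return fingerprint(data) != record["checksum"]


store = {}
lock = commit(store, "data/train.csv", b"x,y\n1,2\n")
print(json.dumps(lock, indent=2))
print(is_modified(lock, b"x,y\n1,2\n"))  # False
print(is_modified(lock, b"x,y\n1,3\n"))  # True
```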

https://github.com/Azure/DataScienceVM

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

ai azure big-data data-analysis data-science deep-learning dsvm machine-learning ml python r sqlserver

Last synced: 08 Aug 2024

https://github.com/capeprivacy/cape-python

Privacy transformations on Spark and Pandas dataframes backed by a simple policy language.

collaboration data-science hacktoberfest machine-learning pandas policy privacy python spark

Last synced: 03 Aug 2024

https://github.com/oracle-samples/oci-data-science-ai-samples

This repo contains a series of tutorials and code examples highlighting different features of the OCI Data Science and AI services, along with a release vehicle for experimental programs.

ai conda data-science data-science-notebooks deep-learning jupyter-notebook machine-learning oci oracle-cloud-infrastructure python

Last synced: 30 Oct 2024

https://github.com/Oxen-AI/Oxen

Oxen.ai's core rust library, server, and CLI

artificial-intelligence data-science database machine-learning version-control

Last synced: 17 Aug 2024

https://github.com/fedora-infra/fedmsg

Federated Messaging with ZeroMQ

data-science fedora-project message-bus python zeromq

Last synced: 20 Aug 2024

https://github.com/kdr-aus/ogma

Scripting language focused on processing tabular data.

data-science language rust scripting-language table-data

Last synced: 30 Oct 2024

https://github.com/dlab-berkeley/Python-Fundamentals-Legacy

D-Lab's 12-hour introduction to Python. Learn how to create variables and functions, use control flow structures, use libraries, import data, and more, using Python and Jupyter Notebooks.

data-science introduction-to-python jupyter python

Last synced: 11 Nov 2024

https://github.com/pydatablog/python-for-data-science

A blog for data analytics using data science technologies

data-science finance python

Last synced: 01 Nov 2024

https://github.com/tirthajyoti/ds-with-pysimplegui

Data science and machine learning GUI programs (desktop apps) built with the PySimpleGUI package

analytics application artificial-intelligence data-science desktop-app gui machine-learning python windows

Last synced: 01 Nov 2024

https://github.com/hugohadfield/kalmangrad

Automated, smooth Nth-order derivatives of non-uniformly sampled time-series data

data-science derivatives kalman-filter signal-processing smoothing

Last synced: 23 Oct 2024
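kalmangrad estimates smooth derivatives with a Kalman-filter approach. For comparison, the sketch below uses a different and much simpler technique: finite differences on a non-uniform grid. It computes the first derivative only and does no smoothing, so noise in y passes straight through. The function name nonuniform_derivative is hypothetical.

```python
from typing import List


def nonuniform_derivative(t: List[float], y: List[float]) -> List[float]:
    """First derivative of samples y(t) on a non-uniform grid.

    Interior points use the three-point formula, which is exact for quadratics;
    the two endpoints fall back to one-sided first differences.
    """
    n = len(t)
    if n < 2 or len(y) != n:
        raise ValueError("need at least two samples and len(t) == len(y)")
    d = [0.0] * n
    d[0] = (y[1] - y[0]) / (t[1] - t[0])
    d[-1] = (y[-1] - y[-2]) / (t[-1] - t[-2])
    for i in range(1, n - 1):
        h1 = t[i] - t[i - 1]  # spacing to the left neighbour
        h2 = t[i + 1] - t[i]  # spacing to the right neighbour
        d[i] = (-h2 / (h1 * (h1 + h2)) * y[i - 1]
                + (h2 - h1) / (h1 * h2) * y[i]
                + h1 / (h2 * (h1 + h2)) * y[i + 1])
    return d


t = [0.0, 1.0, 3.0, 3.5, 6.0]
y = [ti ** 2 for ti in t]
print(nonuniform_derivative(t, y))  # interior values match 2*t for y = t**2
```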

https://github.com/Automunge/AutoMunge

Tabular feature encoding pipelines for machine learning with options for string parsing, missing data infill, and stochastic perturbations.

data-science machine-learning

Last synced: 27 Oct 2024
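Missing data infill replaces absent values before training. Here is a plain-Python sketch of one simple strategy: mean infill plus a marker column that flags imputed rows. It does not reproduce AutoMunge's API or its infill options, and mean_infill is a hypothetical name.

```python
import math
from typing import List, Optional, Tuple


def mean_infill(values: List[Optional[float]]) -> Tuple[List[float], List[int]]:
    """Replace missing entries (None or NaN) with the mean of the observed ones.

    Returns the filled column and a 0/1 marker column flagging imputed rows,
    so a downstream model can still see where data was missing.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not missing(v)]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    filled = [mean if missing(v) else float(v) for v in values]
    marker = [1 if missing(v) else 0 for v in values]
    return filled, marker


print(mean_infill([1.0, None, 3.0, float("nan")]))  # ([1.0, 2.0, 3.0, 2.0], [0, 1, 0, 1])
```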

https://github.com/google/starthinker

Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows

Last synced: 29 Sep 2024

https://github.com/robb/rbbjson

Flexible JSON traversal for rapid prototyping.

data-science json jsonpath prototyping swift

Last synced: 27 Oct 2024
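rbbjson is a Swift library. To illustrate forgiving JSON traversal in Python, here is a sketch that follows a path of keys and indices and returns a default instead of raising on a miss. get_path is a hypothetical helper, not part of rbbjson.

```python
import json
from typing import Any, Union


def get_path(doc: Any, *path: Union[str, int], default: Any = None) -> Any:
    """Follow keys (for objects) and indices (for arrays); return default on any miss."""
    node = doc
    for step in path:
        if isinstance(node, dict) and isinstance(step, str) and step in node:
            node = node[step]
        elif isinstance(node, list) and isinstance(step, int) and -len(node) <= step < len(node):
            node = node[step]
        else:
            return default
    return node


doc = json.loads('{"users": [{"name": "Ada", "langs": ["en", "fr"]}]}')
print(get_path(doc, "users", 0, "langs", 1))  # fr
print(get_path(doc, "users", 3, "name"))      # None
```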

https://github.com/apachecn/ds-ai-tech-notes

:book: [Translation] Data science and artificial intelligence technical notes

ai data-science matplotlib notes numpy python sklearn

Last synced: 10 Oct 2024

https://github.com/lamastex/scalable-data-science

Scalable Data Science: course sets in big data using Apache Spark over Databricks, with their mathematical, statistical, and computational foundations using SageMath.

apache-spark data-science databricks scala

Last synced: 12 Oct 2024

https://github.com/ahammadmejbah/machine-learning-book-collections

Machine learning, a subfield of AI, is the study and development of data-driven strategies to enhance task performance.

data-science deep-learning machine-learning

Last synced: 11 Nov 2024

https://github.com/unnati-xyz/scalable-data-science-platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

data-engineer data-pipeline data-science luigi machine-learning rest-api spark

Last synced: 07 Aug 2024

https://github.com/davendw49/k2

Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024

ai4science data-science geoai geoscience kg large-language-models llm

Last synced: 02 Nov 2024

https://github.com/solegalli/machine-learning-imbalanced-data

Code repository for the online course Machine Learning with Imbalanced Data

data-science imbalanced-classification imbalanced-data imbalanced-learning machine-learning python

Last synced: 30 Oct 2024

https://github.com/hamelsmu/docker_tutorial

Code and helper scripts for article on Medium "How Docker Can Help You Become A More Effective Data Scientist"

data-science docker docker-tutorial medium medium-article

Last synced: 27 Oct 2024

https://github.com/stocknear/backend

Backend of stocknear - Stock Analysis for Data Freaks ❤️

data data-science fastapi fastify finance javascript machine-learning nodejs pocketbase python redis

Last synced: 30 Oct 2024

https://github.com/phillipdupuis/dtale-desktop

Build a data visualization dashboard with simple snippets of python code

data-analysis data-science data-visualization fastapi pandas python react typescript visualization

Last synced: 26 Oct 2024

https://github.com/curiousily/machine-learning-from-scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

artificial-intelligence book classification data-science machine-learning machine-learning-algorithms neural-networks notebook recommender-systems regression reinforcement-learning sentiment-analysis

Last synced: 11 Nov 2024

https://github.com/jgoerner/beyond-jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset

Last synced: 27 Oct 2024

https://github.com/pyscaffold/pyscaffoldext-dsproject

💫 PyScaffold extension for data-science projects

data-science pyscaffold pyscaffold-extension python

Last synced: 08 Nov 2024

https://github.com/risenw/datasist

A Python library for easy data analysis, visualization, exploration and modeling

data-analysis data-science data-visualization feature-engineering machine-learning python-3

Last synced: 13 Oct 2024

https://github.com/thebabylonai/babylog

A lightweight logger for machine learning teams to log images and predictions in production.

computer-vision cvops data-science logger logging-library machine-learning ml mlops python python3

Last synced: 31 Oct 2024

https://github.com/heidelbergcement/hcrystalball

A library that unifies the API for the most commonly used libraries and modeling techniques for time-series forecasting in the Python ecosystem.

cross-validation data-science fbprophet model-selection pmdarima sarimax sklearn sklearn-api sklearn-compatible sklearn-library sktime statsmodels tbats time-series time-series-forecasting transformer wrapper

Last synced: 10 Oct 2024

https://github.com/oxinabox/datadeps.jl

reproducible data setup for reproducible science

data data-science open-science

Last synced: 18 Oct 2024

https://github.com/minusxai/minusx

MinusX is an AI Data Scientist for Analytics Apps you already use and love. Currently it supports Jupyter, Metabase, & Posthog.

artificial-intelligence data-analytics data-science jupyter metabase

Last synced: 11 Oct 2024

https://github.com/whitews/flowkit

A Python toolkit for flow cytometry analysis supporting GatingML and FlowJo workspaces

cytometry data-science fcs fcs-files flow-cytometry flow-cytometry-analysis flowjo gatingml immunology python

Last synced: 11 Nov 2024

https://github.com/emilhvitfeldt/r-text-data

List of textual data sources to be used for text mining in R

data-science nlp rstats text-analysis text-analytics-in-r text-mining tidytext

Last synced: 30 Oct 2024

https://github.com/mybridge/learn-python

Python Top 45 Articles of 2017

algorithm data-science machine-learning python python3

Last synced: 07 Nov 2024

https://github.com/voila-dashboards/voici

Voici turns any Jupyter Notebook into a static web application

dashboards data-science emscripten jupyter jupyterlite voila-dashboard wasm

Last synced: 04 Sep 2024