Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/dmbee/seglearn

Python module for machine learning time series

data-science machine-learning python time-series

Last synced: 30 Jul 2024

https://dmbee.github.io/seglearn/

Python module for machine learning time series

data-science machine-learning python time-series

Last synced: 01 Aug 2024

https://github.com/siznax/wptools

Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis

api-client commons data-science glam linked-open-data mediawiki mediawiki-api open-data python restbase wikidata wikimedia-commons wikipedia wikipedia-api

Last synced: 01 Aug 2024

https://github.com/GRAAL-Research/poutyne

A simplified framework and utilities for PyTorch

data-science deep-learning keras machine-learning neural-network python pytorch

Last synced: 31 Jul 2024

https://github.com/LearnDataSci/articles

A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci

data-analysis data-science data-visualization machine-learning machine-learning-algorithms machinelearning python

Last synced: 01 Aug 2024

https://github.com/starpig1129/ai-data-analysis-mulitagent

AI-Driven Research Assistant: An advanced multi-agent system for automating complex research processes. Leveraging LangChain, OpenAI GPT, and LangGraph, this tool streamlines hypothesis generation, data analysis, visualization, and report writing. Perfect for researchers and data scientists seeking to enhance their workflow and productivity.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 17 Sep 2024

https://github.com/rushter/heamy

A set of useful tools for competitive data science.

data-science machine-learning stacking

Last synced: 03 Aug 2024

https://github.com/Kotlin/kandy

Kotlin plotting library.

data-science graphics jupyter-notebooks kotlin plot

Last synced: 01 Aug 2024

https://github.com/firmai/pandapy

PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)

algorithmic-trading arrays data-science data-structures finance machine-learning numpy pandas structured-data

Last synced: 01 Aug 2024

https://github.com/Lackoftactics/facebook_data_analyzer

Analyze the Facebook copy of your data with Ruby. Download the zip file from Facebook and get info about friend rankings by message, vocabulary, contacts, friends-added statistics, and more.

conversation data-science data-visualization english-language facebook facebook-data facebook-data-analyzer ruby ruby-gem scraping script statistics

Last synced: 04 Aug 2024

https://github.com/pymc-labs/pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.

btyd buy-till-you-die clv customer-lifetime-value data-science marketing media-mix-modeling mmm python

Last synced: 31 Jul 2024

https://github.com/bradleyboehmke/data-science-learning-resources

A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)

data-science machine-learning

Last synced: 02 Aug 2024

https://hdi-project.github.io/ATM/

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).

automl data-science distributed-computing hyperparameter-optimization machine-learning

Last synced: 03 Aug 2024

https://github.com/HDI-Project/ATM

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).

automl data-science distributed-computing hyperparameter-optimization machine-learning

Last synced: 06 Aug 2024

https://github.com/justmarkham/pycon-2019-tutorial

Data Science Best Practices with pandas

data-science pandas python tutorial vizualisation

Last synced: 31 Jul 2024

https://github.com/aqueducthq/aqueduct

Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.

ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3

Last synced: 17 Aug 2024

https://github.com/RunLLM/aqueduct

Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.

ai data data-science kubernetes llm llms machine-learning ml ml-infrastructure ml-monitoring mlops orchestration python python3

Last synced: 01 Aug 2024

https://github.com/rpy2/rpy2

Interface to use R from Python

cffi data-science interoperability python r statistics

Last synced: 13 Aug 2024

https://github.com/HoloClean/holoclean

A Machine Learning System for Data Enrichment.

data-enrichment data-science inference-engine machine-learning pytorch

Last synced: 02 Aug 2024

https://github.com/youssefHosni/Efficient-Python-for-Data-Scientists

Writing clean and optimized Python code

data-science numpy pandas python

Last synced: 31 Jul 2024

https://github.com/ericlagergren/decimal

A high-performance, arbitrary-precision, floating-point decimal library.

arbitrary-precision big-decimal data-science decimal dogs-of-instagram financial general-decimal-arithmetic money multi-precision

Last synced: 04 Aug 2024

https://github.com/microsoft/Reactors

🌱 Join a community of developers at Microsoft Reactor and connect with people, skills, and technology to build your career or personal learning. We offer free livestreams, on-demand content, and hybrid/in-person events daily around the world. Access our projects and code here.

ai azure cloud data data-science devops dotnet events iot live-streaming low-code meetup mixed-reality ml no-code nodejs personal-de python web

Last synced: 02 Aug 2024

https://github.com/vi3k6i5/GuidedLDA

Semi-supervised guided topic model with custom GuidedLDA

data-science guided-topic-modeling guidedlda machine-learning seededlda topic-modeling

Last synced: 02 Aug 2024

https://github.com/openhackathons-org/gpubootcamp

This repository consists of GPU bootcamp material for HPC and AI

ai4hpc cuda data-science deep-learning deepstream gpu hpc machine-learning mpi openacc openmp rapidsai

Last synced: 31 Jul 2024

https://github.com/jmschrei/apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

data-science machine-learning python submodular-optimization submodularity

Last synced: 01 Aug 2024

https://github.com/akanz1/klib

Easy to use Python library of customized functions for cleaning and analyzing data.

data-analysis data-cleaning data-preprocessing data-science data-visualization feature-selection klib python

Last synced: 03 Aug 2024

https://github.com/ottogroup/palladium

Framework for setting up predictive analytics services

data-science machine-learning scikit-learn

Last synced: 31 Jul 2024

https://github.com/JuliaAcademy/DataScience

Data Science in Julia course for JuliaAcademy.com, taught by Huda Nassar

data-science julia juliaacademy learnjulia

Last synced: 31 Jul 2024

https://github.com/janpfeifer/gonb

GoNB, a Go Notebook Kernel for Jupyter

data-science go golang gonb jupyter jupyter-notebook jupyter-notebook-kernel

Last synced: 01 Aug 2024

https://github.com/rudeboybert/fivethirtyeight

R package of data and code behind the stories and interactives at FiveThirtyEight

cran data-science datajournalism fivethirtyeight r rpackage statistics

Last synced: 07 Aug 2024

https://github.com/FilippoBovo/production-data-science

Production Data Science: a workflow for collaborative data science aimed at production

collaborative data-science production workflow

Last synced: 02 Aug 2024

https://github.com/serengil/chefboost

A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting, Random Forest and Adaboost w/categorical features support for Python

adaboost c45-trees cart categorical-features data-mining data-science decision-trees gbdt gbm gbrt gradient-boosting gradient-boosting-machine gradient-boosting-machines id3 kaggle machine-learning python random-forest regression-tree

Last synced: 02 Aug 2024

https://github.com/ploomber/sklearn-evaluation

Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

data-science deep-learning jupyter-notebook machine-learning pytorch scikit-learn sklearn tensorflow

Last synced: 01 Aug 2024

https://github.com/akfamily/aktools

AKTools is an elegant and simple HTTP API library for AKShare, built for AKSharers!

akshare asyncio data data-science fastapi openapi pydanti

Last synced: 31 Jul 2024

https://github.com/pykale/pykale

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!

computer-vision data-science deep-learning domain-adaptation graph-analysis knowledge-aware-learning machine-learning medical-image-analysis meta-learning multimodal multimodal-learning python pytorch transfer-learning

Last synced: 01 Aug 2024

https://github.com/tlkh/ai-lab

All-in-one AI container for rapid prototyping

cuda data-science deep-learning docker jupyter nvidia pytorch tensorflow

Last synced: 02 Aug 2024

https://github.com/kevintpeng/Learn-Something-Every-Day

📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->

algorithm aws blog computer-science course-materials data-engineering data-science education educational engineering learning math mathematics research software-engineering university unix waterloo

Last synced: 31 Jul 2024

https://github.com/plotly/dash-table

OBSOLETE: now part of https://github.com/plotly/dash

dash data-science data-visualization plotly plotly-dash python react table

Last synced: 01 Aug 2024

https://github.com/breck7/scroll

Scroll is a language for scientists of all ages. Scroll includes a command line app that builds static blogs, websites, CSVs, text files, and more.

blog cms csv data-science knowledge-base knowledge-graph markdown markup markup-language note-taking scroll static-site-generator tree-notation

Last synced: 01 Aug 2024

https://github.com/pgalko/BambooAI

A lightweight library that leverages Language Models (LLMs) to enable natural language interactions, allowing you to source and converse with data.

ai ai-agents data-analysis data-science gemini groq llm mistral ollama openai-api pandas pinecone python vector-database

Last synced: 31 Jul 2024

https://github.com/girder/girder

A data management platform for the web, developed by Kitware

data-analytics data-management data-science javascript kitware python resonant

Last synced: 01 Aug 2024

https://github.com/jobream/List-of-Learning-Resources

This collection provides a list of educational resources for Software Engineers. Feel free to add your favorite resources as well and help others in their journey of learning.

competitive-programming computer-science data-science resources software-engineering web-development

Last synced: 02 Aug 2024

https://github.com/firmai/pandasvault

Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

data-science data-structures dataframe functions pandas python snippets table tips

Last synced: 03 Aug 2024

https://github.com/okfn-brasil/rosie

🤖 Python application responsible for Serenata de Amor's intelligence

artificial-intelligence data-science machine-learning

Last synced: 31 Jul 2024

https://github.com/ClimbsRocks/machineJS

[UNMAINTAINED] Automated machine learning: just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml

auto-ml automated-machine-learning automl data-science data-scientists javascript javascript-library kaggle machine-learning machine-learning-algorithms machine-learning-library ml numerai scikit-learn

Last synced: 07 Aug 2024

https://github.com/rebecca-vickery/data-science-learning-resources

A comprehensive list of free resources for learning data science

artificial-intelligence data data-science machine-learning python

Last synced: 02 Aug 2024

https://github.com/DataScienceUB/introduction-datascience-python-book

Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications

analytics data data-science datascience machine-learning python sentiment-analysis

Last synced: 07 Aug 2024

https://github.com/Chicago/food-inspections-evaluation

This repository contains the code to generate predictions of critical violations at food establishments in Chicago. It also contains the results of an evaluation of the effectiveness of those predictions.

cdph chicago data-science food-poisoning open-data open-science public-health

Last synced: 31 Jul 2024

https://github.com/dcai-course/dcai-lab

Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

course data-centric-ai data-science deep-learning homework lab machine-learning

Last synced: 31 Jul 2024

https://github.com/SforAiDl/genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL

algorithm-implementations benchmarking data-science deep-learning gym hacktoberfest machine-learning neural-network openai python pytorch reinforcement-learning reinforcement-learning-algorithms

Last synced: 02 Aug 2024

https://github.com/5agado/data-science-learning

Repository of code and resources related to different data science and machine learning topics. For learning, practice and teaching purposes.

data-science deep-learning jupyter-notebook learning-by-doing machine-learning statistics

Last synced: 01 Aug 2024

https://github.com/platonai/PulsarRPA

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

crawler data-mining data-science rpa scraper scraping web-automation web-crawler web-mining web-scraping web-sql

Last synced: 01 Aug 2024

https://github.com/rio-labs/rio

WebApps in pure Python. No JavaScript, HTML and CSS needed

data-analysis data-science data-visualization deep-learning machine-learning python ui webapp

Last synced: 01 Aug 2024

https://github.com/ledell/useR-machine-learning-tutorial

useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016.org/tutorials/10.html

data-science deep-learning ensemble-learning gradient-boosting-machine machine-learning r random-forest tutorial

Last synced: 07 Aug 2024

https://github.com/capitalone/datacompy

Pandas and Spark DataFrame comparison for humans and more!

compare dask data data-science dataframes fugue numpy pandas polars pyspark python spark

Last synced: 04 Aug 2024

https://github.com/airalcorn2/Michael-s-Data-Science-Curriculum

This is the companion curriculum to my guide to becoming a data scientist.

curriculum data-science machine-learning statistics

Last synced: 05 Aug 2024

https://github.com/EpistasisLab/scikit-rebate

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

data-science feature-selection python

Last synced: 31 Jul 2024

https://github.com/kunalj101/Data-Science-Hacks

Data Science Hacks is a collection of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 02 Aug 2024

https://github.com/basedosdados/mais

⚙️ Maintenance code for the data lake (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/mais/

bigquery dados-abertos data-science govtech hacktoberfest hacktoberfest2022 open-data python r sql transparencia

Last synced: 24 Aug 2024

https://github.com/aiguofer/gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets

Last synced: 04 Aug 2024

https://github.com/tobgu/qframe

Immutable data frame for Go

data-frame data-science dataframe go golang immutable

Last synced: 31 Jul 2024

https://github.com/mfarragher/obsidiantools

Obsidian tools - a Python package for analysing an Obsidian.md vault

data-science knowledge-management network-analysis note-taking obsidian-community obsidian-md python

Last synced: 04 Aug 2024

https://github.com/plotly/dashR

Create data science and AI web apps in R

dash data-science data-visualization plotly plotly-dash python r react web-application

Last synced: 31 Jul 2024

https://github.com/DagsHub/fds

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

data-science dvc git

Last synced: 03 Aug 2024

https://github.com/finos/jupyterlab_templates

Support for jupyter notebook templates in jupyterlab

data-science dataviz jupyter jupyterlab jupyterlab-extension machine-learning notebook

Last synced: 01 Aug 2024