Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/target/matrixprofile-ts

A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

data-science matrix-profile motif motif-discovery pip pip3 pypi pypi-packages python python3 time-series timeseries-analysis timeseries-segmentation

Last synced: 29 Oct 2024

https://github.com/ipython-books/cookbook-2nd-code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

computing data-analysis data-mining data-science data-visualization ipython jupyter jupyter-notebook machine-learning numerical-computation python visualization

Last synced: 17 Dec 2024

https://github.com/mrankitgupta/data-analyst-roadmap

I am sharing my Journey of 66DaysofData into Data Analytics by participating in Ken Jee's #66daysofdata challenge

ankit ankit-gupta ankitgupta data-analysis data-analytics data-science data-structures data-visualization excel mongodb mysql pandas powerbi python sql sql-server tableau

Last synced: 20 Dec 2024

https://github.com/arvkevi/kneed

Knee point detection in Python 📈

data-analysis data-science elbow-method knee-point python scientific-computing systems

Last synced: 28 Oct 2024

https://github.com/iterative/mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞

cli data-science deployment developer-tools git machine-learning mlem model-registry python

Last synced: 30 Oct 2024

https://github.com/erikaduan/r_tips

A repository of R usage tips for data cleaning, data mining, data visualisation, statistical inference and machine learning

data-science data-visualization machine-learning r rstats statistics

Last synced: 04 Dec 2024

https://github.com/pdpipe/pdpipe

Easy pipelines for pandas DataFrames.

data data-science dataframe dataframes pandas pandas-dataframe pipeline

Last synced: 08 Nov 2024

https://github.com/janpfeifer/gonb

GoNB, a Go Notebook Kernel for Jupyter

data-science go golang gonb jupyter jupyter-notebook jupyter-notebook-kernel

Last synced: 20 Dec 2024

https://github.com/pymc-labs/pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.

btyd buy-till-you-die clv customer-lifetime-value data-science marketing media-mix-modeling mmm python

Last synced: 30 Oct 2024

https://github.com/nicolaskruchten/jupyter_pivottablejs

Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

data-analysis data-science interactive jupyter-notebook pivot-chart pivot-tables

Last synced: 20 Dec 2024

https://github.com/litaotao/IPython-Dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data scientists and data analysts. It aims to provide interactive visualizations, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 07 Dec 2024

https://github.com/litaotao/ipython-dashboard

A standalone, lightweight web server for building and sharing graphs created in IPython. Built for data scientists and data analysts. It aims to provide interactive visualizations, collaborative dashboards, and real-time streaming graphs.

dashboard data-science ipython ipython-dashboard notebook visualization

Last synced: 22 Dec 2024

https://github.com/biomedsciai/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 17 Dec 2024

https://github.com/BiomedSciAI/causallib

A Python package for modular causal inference analysis and model evaluations

causal causal-inference causal-models causality data-science machine-learning ml

Last synced: 30 Oct 2024

https://github.com/trainingbypackt/data-science-projects-with-python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 21 Dec 2024

https://github.com/krish-adi/barfi

A flow-based programming environment for Python with a graphical programming interface.

ai-ml data-science dataflow-programming flow-based-programming framework graphical-programming jupyter jupyter-notebook ml python streamlit

Last synced: 19 Dec 2024

https://github.com/ashishpatel26/amazing-feature-engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

data-analysis data-mining data-science data-scientists data-visualization deep-learning feature-engineering feature-extraction feature-scaling feature-selection features machine-learning scikit-learn

Last synced: 20 Dec 2024

https://github.com/odpi/opends4all

OpenDS4All project, hosted by LF AI & Data

data-science jupyter-notebooks materials

Last synced: 09 Nov 2024

https://github.com/pm4py/pm4py-core

Public repository for the PM4Py (Process Mining for Python) project.

data-mining data-science machine-learning process-mining python

Last synced: 10 Nov 2024

https://github.com/faktionai/awesome-ai-usecases

A list of awesome and proven Artificial Intelligence use cases and applications

data-science machine-learning

Last synced: 13 Oct 2024

https://github.com/fastai/fastai2

Temporary home for fastai v2 while it's being developed

data-science deep-learning fastai jupyter machine-learning nbdev python pytorch

Last synced: 27 Nov 2024

https://github.com/TrainingByPackt/Data-Science-Projects-with-Python

A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn

data-science machine-learning numpy pandas pandas-dataframe python scikit-learn

Last synced: 08 Nov 2024

https://github.com/aeturrell/coding-for-economists

This repository hosts the code behind the online book, Coding for Economists.

book data-science econometrics economics economics-models jupyter-notebook learning python research vscode

Last synced: 07 Nov 2024

https://github.com/rstojnic/lazydata

Lazydata: Scalable data dependencies for Python projects

data-science datamanagement machine-learning python

Last synced: 29 Oct 2024

https://github.com/blue-yonder/turbodbc

Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with the Python Database API Specification 2.0.

data-science database exasol numpy odbc pep249 pyodbc python python-database-api speedup

Last synced: 19 Dec 2024

https://github.com/cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow

Last synced: 28 Sep 2024

https://github.com/squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 17 Dec 2024

https://github.com/Squarespace/datasheets

Read data from, write data to, and modify the formatting of Google Sheets

data data-analytics data-science dataframe google pandas python

Last synced: 26 Oct 2024

https://github.com/github/codespaces-jupyter

Explore machine learning and data science with Codespaces

codespaces data-science jupyter-notebook machine-learning

Last synced: 19 Dec 2024

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 06 Nov 2024

https://github.com/erezsh/preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 22 Dec 2024

https://github.com/sforaidl/kd_lib

A PyTorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.

algorithm-implementations benchmarking data-science deep-learning-library knowledge-distillation machine-learning model-compression pruning pytorch quantization

Last synced: 22 Dec 2024

https://github.com/rnorm/book_sample

another book on data science

book data-science python r

Last synced: 27 Nov 2024

https://github.com/gesistsa/rio

🐟 A Swiss-Army Knife for Data I/O

cran csv csvy data data-science excel io r rio sas spss stata

Last synced: 19 Dec 2024

https://github.com/tuangauss/DataScienceProjects

The code repository for projects and tutorials in R and Python covering a variety of topics in data visualization, statistics, sports analytics, and general applications of probability theory.

data-science data-visualization statistics

Last synced: 01 Nov 2024

https://github.com/erezsh/Preql

An interpreted relational query language that compiles to SQL.

data-science database python query sql

Last synced: 29 Oct 2024

https://github.com/juliastats/glm.jl

Generalized linear models in Julia

data-science glm julia regression statistical-models statistics

Last synced: 19 Dec 2024

https://github.com/jadianes/data-science-your-way

Ways of doing Data Science Engineering and Machine Learning in R and Python

data-frame data-science data-science-engineering exploratory-data-analysis jupyter machine-learning notebook python r tutorial

Last synced: 21 Dec 2024

https://github.com/Kotlin/kandy

Kotlin plotting library.

data-science graphics jupyter-notebooks kotlin plot

Last synced: 07 Nov 2024

https://github.com/JuliaStats/GLM.jl

Generalized linear models in Julia

data-science glm julia regression statistical-models statistics

Last synced: 12 Nov 2024

https://github.com/DiskFrame/disk.frame

Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data

data data-science large-dataset manipulation-data medium-data r

Last synced: 25 Oct 2024

https://github.com/alegonz/baikal

A graph-based functional API for building complex scikit-learn pipelines.

data-science graph-based machine-learning python scikit-learn

Last synced: 15 Nov 2024

https://github.com/jacksonwuxs/dapy

Easy-to-use data analysis / manipulation framework for humans

analysis data-analysis data-science efficiency pypi python statistical-reports

Last synced: 17 Dec 2024

https://github.com/JacksonWuxs/DaPy

Easy-to-use data analysis / manipulation framework for humans

analysis data-analysis data-science efficiency pypi python statistical-reports

Last synced: 31 Oct 2024

https://github.com/kkulma/climate-change-data

🌍 A curated list of APIs, open data and ML/AI projects on climate change

climate climate-analysis climate-change climate-data data data-science datascience hacktoberfest python r resources rstats

Last synced: 18 Dec 2024

https://github.com/siznax/wptools

Wikipedia tools (for Humans): easily extract data from Wikipedia, Wikidata, and other MediaWikis

api-client commons data-science glam linked-open-data mediawiki mediawiki-api open-data python restbase wikidata wikimedia-commons wikipedia wikipedia-api

Last synced: 21 Dec 2024

https://github.com/dmbee/seglearn

Python module for machine learning time series

data-science machine-learning python time-series

Last synced: 26 Oct 2024

https://dmbee.github.io/seglearn/

Python module for machine learning time series

data-science machine-learning python time-series

Last synced: 02 Nov 2024

https://github.com/rpy2/rpy2

Interface to use R from Python

cffi data-science interoperability python r statistics

Last synced: 22 Dec 2024

https://github.com/GRAAL-Research/poutyne

A simplified framework and utilities for PyTorch

data-science deep-learning keras machine-learning neural-network python pytorch

Last synced: 30 Oct 2024

https://github.com/akfamily/aktools

AKTools is an elegant and simple HTTP API library for AKShare, built for AKSharers!

akshare asyncio data data-science fastapi openapi pydanti

Last synced: 19 Dec 2024

https://github.com/LearnDataSci/articles

A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci

data-analysis data-science data-visualization machine-learning machine-learning-algorithms machinelearning python

Last synced: 07 Nov 2024

https://github.com/starpig1129/ai-data-analysis-mulitagent

AI-Driven Research Assistant: An advanced multi-agent system for automating complex research processes. Leveraging LangChain, OpenAI GPT, and LangGraph, this tool streamlines hypothesis generation, data analysis, visualization, and report writing. Perfect for researchers and data scientists seeking to enhance their workflow and productivity.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 17 Sep 2024

https://github.com/rushter/heamy

A set of useful tools for competitive data science.

data-science machine-learning stacking

Last synced: 22 Dec 2024

https://github.com/firmai/pandapy

PandaPy has the speed of NumPy and the usability of Pandas, and is described as 10x to 50x faster (by @firmai)

algorithmic-trading arrays data-science data-structures finance machine-learning numpy pandas structured-data

Last synced: 04 Nov 2024

https://github.com/Lackoftactics/facebook_data_analyzer

Analyze your copy of your Facebook data with Ruby. Download the zip file from Facebook and get info about friend rankings by message, vocabulary, contacts, friends-added statistics, and more

conversation data-science data-visualization english-language facebook facebook-data facebook-data-analyzer ruby ruby-gem scraping script statistics

Last synced: 20 Nov 2024

https://github.com/WecoAI/aideml

AIDE: the state-of-the-art machine learning engineer agent, generating machine learning solution code from natural language descriptions.

ai data-science llm machine-learning

Last synced: 12 Nov 2024

https://github.com/youssefhosni/efficient-python-for-data-scientists

Writing clean and optimized Python code

data-science numpy pandas python

Last synced: 21 Dec 2024

https://github.com/justmarkham/pycon-2019-tutorial

Data Science Best Practices with pandas

data-science pandas python tutorial vizualisation

Last synced: 21 Dec 2024

https://github.com/bradleyboehmke/data-science-learning-resources

A collection of data science and machine learning resources that I've found helpful (I only post what I've read!)

data-science machine-learning

Last synced: 02 Dec 2024

https://github.com/HDI-Project/ATM

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).

automl data-science distributed-computing hyperparameter-optimization machine-learning

Last synced: 25 Nov 2024

https://hdi-project.github.io/ATM/

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).

automl data-science distributed-computing hyperparameter-optimization machine-learning

Last synced: 18 Nov 2024

https://github.com/youssefHosni/Efficient-Python-for-Data-Scientists

Writing clean and optimized Python code

data-science numpy pandas python

Last synced: 27 Oct 2024

https://github.com/inseefrlab/onyxia

🔬 Data science environment for k8s

bluehats data-science datalab helm insee kubernetes onyxia

Last synced: 20 Dec 2024