Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/scikit-learn/scikit-learn

scikit-learn: machine learning in Python

data-analysis data-science machine-learning python statistics

Last synced: 30 Jul 2024

https://github.com/pydata/pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

alignment data-analysis data-science flexible pandas python

Last synced: 09 Aug 2024

https://github.com/pandas-dev/pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

alignment data-analysis data-science flexible pandas python

Last synced: 31 Jul 2024

https://github.com/ray-project/ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

automl data-science deep-learning deployment distributed hyperparameter-optimization hyperparameter-search java llm-serving machine-learning model-selection optimization parallel python pytorch ray reinforcement-learning rllib serving tensorflow

Last synced: 30 Jul 2024

https://github.com/gradio-app/gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

data-analysis data-science data-visualization deep-learning deploy gradio gradio-interface hacktoberfest interface machine-learning models python python-notebook ui ui-components

Last synced: 30 Jul 2024

https://github.com/lightning-AI/lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

ai artificial-intelligence data-science deep-learning machine-learning python pytorch

Last synced: 27 Aug 2024

https://github.com/Lightning-AI/pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

ai artificial-intelligence data-science deep-learning machine-learning python pytorch

Last synced: 31 Jul 2024

https://github.com/Lightning-AI/lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

ai artificial-intelligence data-science deep-learning machine-learning python pytorch

Last synced: 06 Aug 2024

https://github.com/PyTorchLightning/pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

ai artificial-intelligence data-science deep-learning machine-learning python pytorch

Last synced: 08 Aug 2024

https://github.com/camdavidsonpilon/probabilistic-programming-and-bayesian-methods-for-hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

bayesian-methods data-science jupyter-notebook mathematical-analysis pymc statistics

Last synced: 03 Aug 2024

https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

bayesian-methods data-science jupyter-notebook mathematical-analysis pymc statistics

Last synced: 30 Jul 2024

https://github.com/microsoft/Data-Science-For-Beginners

10 Weeks, 20 Lessons, Data Science for All!

data-analysis data-science data-visualization pandas python

Last synced: 31 Jul 2024

https://github.com/donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

aws big-data caffe data-science deep-learning hadoop kaggle keras machine-learning mapreduce matplotlib numpy pandas python scikit-learn scipy spark tensorflow theano

Last synced: 30 Jul 2024

https://github.com/okulbilisim/awesome-datascience

📝 An awesome Data Science repository to learn and apply for real world problems.

analytics awesome-list data-mining data-science data-scientists data-visualization deep-learning hacktoberfest machine-learning science

Last synced: 03 Aug 2024

https://github.com/eriklindernoren/ML-From-Scratch

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

data-mining data-science deep-learning deep-reinforcement-learning genetic-algorithm machine-learning machine-learning-from-scratch

Last synced: 30 Jul 2024

https://github.com/d2l-ai/d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

book computer-vision data-science deep-learning gaussian-processes hyperparameter-optimization jax kaggle keras machine-learning mxnet natural-language-processing notebook python pytorch recommender-system reinforcement-learning tensorflow

Last synced: 31 Jul 2024

https://github.com/fastai/fastbook

The fastai book, published as Jupyter Notebooks

book data-science deep-learning fastai machine-learning notebooks python

Last synced: 30 Jul 2024

https://github.com/Luxurioust/excelize

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

analytics chart data-science ecma-376 excel excelize formula go golang microsoft office ooxml openxml spreadsheet statistics table vba visualization xlsx xml

Last synced: 05 Aug 2024

https://github.com/360EntSecGroup-Skylar/excelize

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

analytics chart data-science ecma-376 excel excelize formula go golang microsoft office ooxml openxml spreadsheet statistics table vba visualization xlsx xml

Last synced: 10 Aug 2024

https://github.com/xuri/excelize

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

analytics chart data-science ecma-376 excel excelize formula go golang microsoft office ooxml openxml spreadsheet statistics table vba visualization xlsx xml

Last synced: 05 Aug 2024

https://github.com/qax-os/excelize

Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

analytics chart data-science ecma-376 excel excelize formula go golang microsoft office ooxml openxml spreadsheet statistics table vba visualization xlsx xml

Last synced: 30 Jul 2024

https://github.com/ipython/ipython

Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

closember data-science hacktoberfest ipython jupyter notebook python repl spec-0

Last synced: 31 Jul 2024

https://github.com/PrefectHQ/prefect

Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines

automation data data-engineering data-ops data-science infrastructure ml-ops observability orchestration pipeline prefect python workflow workflow-engine

Last synced: 31 Jul 2024

https://github.com/dair-ai/ML-YouTube-Courses

📺 Discover the latest machine learning / AI courses on YouTube.

ai data-science deep-learning machine-learning natural-language-processing nlp

Last synced: 30 Jul 2024

https://github.com/mwaskom/seaborn

Statistical data visualization in Python

data-science data-visualization matplotlib pandas python

Last synced: 30 Jul 2024

https://github.com/rasbt/python-machine-learning-book

The "Python Machine Learning (1st edition)" book code repository and info resource

data-mining data-science logistic-regression machine-learning machine-learning-algorithms neural-network python scikit-learn

Last synced: 30 Jul 2024

https://github.com/allenai/allennlp

An open-source NLP research library, built on PyTorch.

data-science deep-learning natural-language-processing nlp python pytorch

Last synced: 30 Jul 2024

https://github.com/sinaptik-ai/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 02 Aug 2024

https://github.com/Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 31 Jul 2024

https://github.com/gventuri/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 03 Aug 2024

https://github.com/OpenRefine/OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it

data-analysis data-science data-wrangling datacleaning datacleansing datajournalism datamining java journalism opendata reconciliation wikidata

Last synced: 31 Jul 2024

https://github.com/trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

analytics big-data data-science database databases datalake delta-lake distributed-database distributed-systems hadoop hive iceberg java jdbc presto prestodb query-engine sql trino

Last synced: 31 Jul 2024

https://github.com/fastai/numerical-linear-algebra

Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course

algorithms data-science deep-learning linear-algebra machine-learning numpy python

Last synced: 31 Jul 2024

https://github.com/aws/amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

aws data-science deep-learning examples inference jupyter-notebook machine-learning mlops reinforcement-learning sagemaker training

Last synced: 01 Aug 2024

https://github.com/modin-project/modin

Modin: Scale your Pandas workflows by changing a single line of code

analytics data-science dataframe datascience distributed modin pandas python sql

Last synced: 31 Jul 2024

https://github.com/tflearn/tflearn

Deep learning library featuring a higher-level API for TensorFlow.

data-science deep-learning machine-learning neural-network tensorflow tflearn

Last synced: 31 Jul 2024

https://github.com/qiniu/qlang

The Go+ programming language is designed for engineering, STEM education, and data science

data-science engineering golang gop goplus low-code programming-language scientific-computing stem stem-education

Last synced: 03 Aug 2024

https://github.com/goplus/gop

The Go+ programming language is designed for engineering, STEM education, and data science

data-science engineering golang gop goplus low-code programming-language scientific-computing stem stem-education

Last synced: 30 Jul 2024

https://github.com/dair-ai/ML-Papers-of-the-Week

🔥Highlighting the top ML papers every week.

ai data-science deeplearning machine-learning nlp

Last synced: 31 Jul 2024

https://github.com/akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 31 Jul 2024

https://github.com/jindaxiang/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 03 Aug 2024

https://github.com/vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

bigdata data-science dataframe hdf5 machine-learning machinelearning memory-mapped-file pyarrow python tabular-data visualization

Last synced: 30 Jul 2024

https://github.com/blue-yonder/tsfresh

Automatic extraction of relevant features from time series

data-science feature-extraction time-series

Last synced: 30 Jul 2024

https://github.com/jackzhenguo/python-small-examples

Say goodbye to dull tutorials: a collection of practical small Python examples. More quality Python tutorials at https://ai-jupyter.com

data-science machine-learning python python-gui python-web pytorch tensorflow

Last synced: 31 Jul 2024

https://github.com/HugoBlox/hugo-blox-builder

😍 EASILY BUILD THE WEBSITE YOU WANT - NO CODE, JUST MARKDOWN BLOCKS! Easily create any type of website using blocks, no code required. One app, no dependencies, no JS

academic blog blog-engine cms data-science documentation-tool github-pages hugo hugo-theme jupyter netlify open-science page-builder portfolio r rmarkdown rstudio static-site-generator theme website-builder

Last synced: 30 Jul 2024

https://github.com/catboost/catboost

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

big-data catboost categorical-features coreml cuda data-mining data-science decision-trees gbdt gbm gpu gpu-computing gradient-boosting kaggle machine-learning python r tutorial

Last synced: 30 Jul 2024

https://github.com/activeloopai/Hub

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search

Last synced: 10 Aug 2024