Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/mgalarnyk/datasciencecoursera

Data Science Repo and blog for Johns Hopkins Coursera Courses. Please let me know if you have any questions.

data-science jhu-coursera john-hopkins-coursera python r stanford

Last synced: 20 Dec 2024

https://github.com/shashankvemuri/finance

150+ quantitative finance Python programs to help you gather, manipulate, and analyze stock market data

algorithmic-trading data-science finance machine-learning pandas python quantitative-finance stock stock-market stocks technical-indicators trading-strategies

Last synced: 19 Dec 2024

https://github.com/chiphuyen/lazynlp

Library to scrape and clean web pages to create massive datasets.

artificial-intelligence data-science language-model natural-language-processing nlp open python text-mining

Last synced: 21 Dec 2024

https://github.com/justmarkham/pandas-videos

Jupyter notebook and datasets from the pandas video series

data-analysis data-cleaning data-science jupyter-notebook pandas python tutorial

Last synced: 20 Dec 2024

https://github.com/wooey/wooey

A Django app that creates automatic web UIs for Python scripts.

data-science django python python-scripts web wooey workflows

Last synced: 19 Dec 2024

https://github.com/Lightning-AI/torchmetrics

Machine learning metrics for distributed, scalable PyTorch applications.

analyses data-science deep-learning machine-learning metrics python pytorch

Last synced: 30 Oct 2024
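
torchmetrics is built around metric objects that accumulate state batch by batch (its `Metric` class exposes `update()` and `compute()`) and synchronize that state across processes. The sketch below illustrates the pattern in plain Python only; it is not torchmetrics code. `RunningAccuracy` is hypothetical, and its `merge` method is an assumed stand-in for the cross-process synchronization torchmetrics performs internally.

```python
from dataclasses import dataclass


@dataclass
class RunningAccuracy:
    """Hypothetical streaming metric: keep small sufficient statistics, compute at the end."""
    correct: int = 0
    total: int = 0

    def update(self, preds: list[int], target: list[int]) -> None:
        # Accumulate counts for one batch instead of storing every prediction.
        if len(preds) != len(target):
            raise ValueError("preds and target must have the same length")
        self.correct += sum(p == t for p, t in zip(preds, target))
        self.total += len(target)

    def merge(self, other: "RunningAccuracy") -> None:
        # Assumed stand-in for distributed sync: counts from different workers simply add.
        self.correct += other.correct
        self.total += other.total

    def compute(self) -> float:
        # Final metric from the accumulated state; 0.0 if nothing was seen.
        return self.correct / self.total if self.total else 0.0
```

Because the state is a pair of counts rather than the raw predictions, two workers that each saw part of the data can combine their results exactly, which is what makes this kind of metric suitable for distributed training.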

https://github.com/visualize-ml/book6_first-course-in-data-science

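Book_6_《数据有道》 | Iris Book series: from arithmetic to machine learning. Criticism and corrections are welcome! Readers who report the most errors will receive a free book as thanks!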

data data-science data-visualization feature-engineering machine-learning python

Last synced: 19 Dec 2024

https://github.com/metarank/metarank

A low-code Machine Learning personalized ranking service for articles, listings, search results, and recommendations that boosts user engagement. A friendly Learn-to-Rank engine.

automl data-engineering data-science deep-learning feature-engineering feature-extraction kubernetes machine-learning neural-networks personalization ranking scala search

Last synced: 19 Dec 2024

https://github.com/cerlymarco/medium_notebook

Repository containing notebooks of my posts on Medium

artificial-intelligence data-science deep-learning machine-learning notebooks

Last synced: 18 Dec 2024

https://github.com/chdb-io/chdb

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse

chdb clickhouse clickhouse-database clickhouse-server data-science database embedded-database olap python sql

Last synced: 24 Dec 2024

https://github.com/sfu-db/dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.

apis apiwrapper cleaning connector data-exploration data-science datacleaning dataconnector dataprep datapreparation eda exploratory-data-analysis webconnector

Last synced: 24 Dec 2024

https://github.com/alexhallam/tv

📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.

cli column command-line command-line-tool csv csv-cat csv-column csv-pretty-print csv-viewer csv-visualization data-science dataframe datatable pretty-print pretty-printer rust tabular-data terminal tibble

Last synced: 19 Dec 2024
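
Tidy Viewer itself is a Rust CLI; the Python sketch below only illustrates the basic idea of pretty printing a CSV as aligned columns. The function name `pretty_print_csv`, the 20-character truncation limit, and right-aligning numeric cells are assumptions for illustration, not tv's actual behavior or styling rules.

```python
import csv
import io


def _is_number(cell: str) -> bool:
    # Treat anything float() accepts as numeric.
    try:
        float(cell)
    except ValueError:
        return False
    return True


def pretty_print_csv(text: str, max_width: int = 20) -> str:
    """Render CSV text as an aligned plain-text table (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""

    def clip(cell: str) -> str:
        # Truncate long cells so one wide value cannot blow out the layout.
        return cell if len(cell) <= max_width else cell[: max_width - 1] + "…"

    rows = [[clip(c) for c in row] for row in rows]
    n_cols = max(len(r) for r in rows)
    rows = [r + [""] * (n_cols - len(r)) for r in rows]  # pad ragged rows
    widths = [max(len(r[i]) for r in rows) for i in range(n_cols)]

    lines = []
    for i, row in enumerate(rows):
        # Header and text cells are left-aligned; numeric data cells are right-aligned.
        cells = [
            c.rjust(w) if i > 0 and _is_number(c) else c.ljust(w)
            for c, w in zip(row, widths)
        ]
        lines.append("  ".join(cells).rstrip())
        if i == 0:
            lines.append("  ".join("-" * w for w in widths))  # header separator
    return "\n".join(lines)
```

For example, `print(pretty_print_csv("name,score\nalice,9.5\nbob,10\n"))` prints a header, a dashed separator, and two rows with the scores right-aligned under `score`.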

https://github.com/orico/www.mlcompendium.com

The Machine Learning & Deep Learning Compendium began as a list of references in a single private document that I curated to expand my knowledge; it is now an open knowledge-sharing project compiled using GitBook.

algorithms data-science deep-learning full-stack gitbook machine-learning marketing mlcompendium probability product-management statistics ux-design ux-experience ux-research

Last synced: 04 Dec 2024

https://github.com/lining808/cs-ebook

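A recommended list of high-quality, classic computer books. Its principle: collect only high-quality classics in each field; quality over quantity.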

ai computer-science data-science deep-learning ebooks programming-language

Last synced: 21 Dec 2024

https://github.com/danijar/handout

Turn Python scripts into handouts with Markdown and figures

data-science notebook productivity prototyping python research

Last synced: 20 Dec 2024

https://github.com/ujjwalkarn/datasciencer

A curated list of R tutorials for Data Science, NLP and Machine Learning

data-science datascience r text-mining

Last synced: 20 Dec 2024

https://github.com/ph055a/osint_collection

Maintained collection of OSINT related resources. (All Free & Actionable)

court-search data-science dataset infosec investigation journalism osint research search

Last synced: 03 Dec 2024

https://github.com/dagworks-inc/hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering hacktoberfest lineage llmops machine-learning mlops orchestration pandas python rag software-engineering

Last synced: 24 Dec 2024
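
Hamilton's central idea is that a dataflow is written as plain Python functions: a function's name is the value it produces, and its parameter names declare the inputs or other functions it depends on. The toy resolver below illustrates that idea with the standard library only; it is not Hamilton's API, and `run_dataflow`, `revenue`, and `profit` are hypothetical names chosen for the example.

```python
import inspect
from typing import Any, Callable


def run_dataflow(funcs: list[Callable[..., Any]], inputs: dict[str, Any], target: str) -> Any:
    """Compute `target`, treating each function's parameter names as its dependencies."""
    registry = {f.__name__: f for f in funcs}
    cache: dict[str, Any] = dict(inputs)  # provided inputs plus already-computed nodes

    def resolve(name: str, stack: tuple[str, ...] = ()) -> Any:
        if name in cache:
            return cache[name]
        if name in stack:
            raise ValueError(f"cycle detected at {name!r}")
        if name not in registry:
            raise KeyError(f"no input or function provides {name!r}")
        fn = registry[name]
        # Recursively resolve every parameter by name, then call the node once.
        kwargs = {p: resolve(p, stack + (name,)) for p in inspect.signature(fn).parameters}
        cache[name] = fn(**kwargs)
        return cache[name]

    return resolve(target)


# Hypothetical nodes: each function's name is the value it produces.
def revenue(price: float, quantity: int) -> float:
    return price * quantity


def profit(revenue: float, cost: float) -> float:
    return revenue - cost
```

Calling `run_dataflow([revenue, profit], {"price": 2.5, "quantity": 4, "cost": 3.0}, "profit")` resolves `revenue` first and then `profit`. Because dependencies are declared by parameter names, each node can be unit-tested as an ordinary function.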

https://github.com/yuankunzhang/charming

A visualization library for Rust

chart data-science rust visualization webassembly

Last synced: 24 Dec 2024

https://github.com/szilard/benchm-ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

data-science deep-learning gradient-boosting-machine h2o machine-learning python r random-forest spark xgboost

Last synced: 21 Dec 2024

https://github.com/diffgram/diffgram

The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

annotation annotation-tool annotations data data-analytics data-annotation data-science datasets datastore deep-learning image-annotation kubernetes labeling machine-learning training-data video-annotation

Last synced: 25 Oct 2024

https://github.com/featureform/featureform

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database

Last synced: 18 Dec 2024

https://github.com/the-turing-way/the-turing-way

Host repository for The Turing Way: a how-to guide for reproducible data science

closember community data-science education hacktoberfest hut23 hut23-270 hut23-396

Last synced: 29 Nov 2024

https://github.com/azure/azureml-examples

Official community-driven Azure Machine Learning examples, tested with GitHub Actions.

azure azure-machine-learning azureml data-science ml

Last synced: 23 Dec 2024

https://github.com/supabase/supabase-py

Python Client for Supabase. Query Postgres from Flask, Django, FastAPI. Python user authentication, security policies, edge functions, file storage, and realtime data streaming. Good first issue.

auth authentication authorization community data-science databases django fastapi flask good-first-issue machine-learning postgres postgresql python supabase

Last synced: 24 Dec 2024

https://github.com/scverse/scanpy

Single-cell analysis in Python. Scales to >1M cells.

anndata bioinformatics data-science machine-learning python scanpy scverse transcriptomics visualize-data

Last synced: 24 Dec 2024

https://github.com/variety/variety

Variety: a MongoDB Schema Analyzer

data-science javascript mongo mongodb nosql nosql-analytics

Last synced: 19 Dec 2024
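
A schema analyzer for a schemaless store like MongoDB works by scanning documents and tallying, for each key, how often it appears and which value types it holds. Variety itself is written in JavaScript and runs against a live collection; the sketch below is a hypothetical Python illustration that works on a list of plain dicts instead. The function names, dotted-path convention for nested keys, and type labels are assumptions, not Variety's output format.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable


def _type_name(value: Any) -> str:
    # Rough JSON-style type labels (assumed, not Variety's exact names).
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Object"
    return type(value).__name__


def analyze_schema(docs: Iterable[dict]) -> dict[str, dict[str, Any]]:
    """Count how often each dotted key path appears and which value types it holds."""
    occurrences: Counter = Counter()
    types: defaultdict = defaultdict(Counter)
    n_docs = 0

    def walk(doc: dict, prefix: str = "") -> None:
        for key, value in doc.items():
            path = f"{prefix}{key}"
            occurrences[path] += 1
            types[path][_type_name(value)] += 1
            if isinstance(value, dict):
                walk(value, path + ".")  # descend into embedded documents

    for doc in docs:
        n_docs += 1
        walk(doc)

    return {
        path: {
            "types": dict(types[path]),
            "occurrences": count,
            "percent": 100.0 * count / n_docs,
        }
        for path, count in occurrences.items()
    }
```

Reporting a percentage per key is what surfaces inconsistencies, for example a field present in only a third of documents, or one that is usually a string but sometimes `null`.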

https://github.com/neonwatty/machine-learning-refined

Master the fundamentals of machine learning, deep learning, and mathematical optimization by building key concepts and models from scratch using Python.

artificial-intelligence autograd collab data-science deep-learning docker genai jax jupyter-notebook lecture-notes machine-learning machine-learning-algorithms mathematical-optimization neural-network numpy python

Last synced: 18 Dec 2024

https://github.com/neonwatty/machine_learning_refined

Notes, Python demos / notebooks, and free chapters for the 2nd edition of the university textbook "Machine Learning Refined".

artificial-intelligence autograd collab data-science deep-learning genai jax jupyter-notebook lecture-notes machine-learning machine-learning-algorithms mathematical-optimization neural-network numpy python slides

Last synced: 12 Dec 2024

https://github.com/apachecn/python_data_analysis_and_mining_action

Code notes for the book 《python数据分析与挖掘实战》 (Python Data Analysis and Mining in Action)

data-analysis data-science python3 readingnotes

Last synced: 21 Dec 2024

https://github.com/alexioannides/pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

data-engineering data-science etl etl-job etl-pipeline pyspark python spark

Last synced: 21 Dec 2024

https://github.com/jermwatt/machine_learning_refined

Notes, examples, and Python demos for the 2nd edition of the textbook "Machine Learning Refined" (published by Cambridge University Press).

artificial-intelligence autograd collab data-science deep-learning jax jupyter-notebook lecture-notes machine-learning machine-learning-algorithms mathematical-optimization neural-network numpy python slides

Last synced: 08 Nov 2024

https://github.com/tidyverse/tidyverse

Easily install and load packages from the tidyverse

data-science r tidyverse

Last synced: 23 Dec 2024

https://github.com/iphysresearch/DataSciComp

A collection of popular Data Science Challenges/Competitions || Countdown timers to keep track of the entry deadlines.

challenge competition data-challenge data-science data-science-competitions project

Last synced: 03 Nov 2024

https://github.com/jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science ipython ipython-notebook machine-learning mllib notebook pyspark python spark

Last synced: 20 Dec 2024

https://github.com/hadley/stats337

Readings in applied data science

data-science teaching

Last synced: 22 Dec 2024

https://github.com/iamtodor/data-science-interview-questions-and-answers

Data science interview questions with answers. Not ideal (yet).

data-science interview-preparation interview-questions machine-learning

Last synced: 29 Nov 2024

https://github.com/enzoampil/fastquant

fastquant — Backtest and optimize your ML trading strategies with only 3 lines of code!

algotrading backtesting cryptocurrency data-science financial-data-science machine-learning quantitative-finance stocks trading-strategies

Last synced: 19 Dec 2024

https://github.com/nubank/fklearn

fklearn: Functional Machine Learning

data-analysis data-science machine-learning ml python

Last synced: 24 Dec 2024

https://github.com/man-group/arcticdb

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 17 Dec 2024

https://github.com/h2oai/h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform

data-science deep-learning h2o machine-learning python r tutorial

Last synced: 18 Dec 2024

https://github.com/google/uncertainty-baselines

High-quality implementations of standard and SOTA methods on a variety of tasks.

bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

Last synced: 18 Dec 2024