Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/launchflow/buildflow

BuildFlow is an open source framework for building large-scale systems using Python. All you need to do is describe where your input is coming from and where your output should be written, and BuildFlow handles the rest. No configuration outside of the code is required.

batch data-science pipeline python streaming

Last synced: 03 Jul 2024

https://github.com/pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 03 Jul 2024

https://github.com/MLMI2-CSSI/foundry

Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry

chemistry data-science datasets machine-learning materials-science

Last synced: 03 Jul 2024

https://github.com/run-house/runhouse

The fastest way to iterate and deploy AI workloads on your own infra. Unobtrusive, debuggable, PyTorch-like APIs.

api artificial-intelligence aws azure collaboration data-science deployment distributed fastapi gcp infrastructure machine-learning middleware observability python pytorch ray sagemaker serverless

Last synced: 03 Jul 2024

https://github.com/kdr-aus/ogma

Scripting language focused on processing tabular data.

data-science language rust scripting-language table-data

Last synced: 03 Jul 2024

https://github.com/hamelsmu/Seq2Seq_Tutorial

Code For Medium Article "How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"

data-science deep-learning deeplearning keras keras-tutorials machine-learning medium-article nlp-machine-learning rnn-encoder-decoder seq2seq-tutorial sequence-to-sequence

Last synced: 02 Jul 2024

https://github.com/mandiant/ThreatPursuit-VM

Threat Pursuit Virtual Machine (VM): A fully customizable, open-sourced Windows-based distribution focused on threat intelligence analysis and hunting, designed to get intel analysts, malware analysts, and threat hunters up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 02 Jul 2024

https://github.com/neonwatty/machine_learning_refined

Notes, examples, and Python demos for the 2nd edition of the textbook "Machine Learning Refined" (published by Cambridge University Press).

artificial-intelligence autograd collab data-science deep-learning jax jupyter-notebook lecture-notes machine-learning machine-learning-algorithms mathematical-optimization neural-network numpy python slides

Last synced: 02 Jul 2024

https://github.com/allenai/allennlp

An open-source NLP research library, built on PyTorch.

data-science deep-learning natural-language-processing nlp python pytorch

Last synced: 02 Jul 2024

https://github.com/PKU-DAIR/Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

artificial-intelligence autograd data-science deep-learning deep-neural-networks distributed-systems distributed-training embeddings gpu high-dimensional machine-learning python state-of-the-art

Last synced: 01 Jul 2024

https://github.com/pablofrommars/fsharp-notebook

Data Science Notebook for F# interactive

data-science data-visualization fsharp vscode-extension

Last synced: 01 Jul 2024

https://github.com/jgoerner/beyond-jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

airflow apache apistar data-science docker docker-compose jupyter jupyter-notebook minio postgres superset

Last synced: 01 Jul 2024

https://github.com/rhiever/datacleaner

A Python tool that automatically cleans data sets and readies them for analysis.

automation data-science machine-learning python

Last synced: 30 Jun 2024

https://github.com/makcedward/nlp

:memo: This repository recorded my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 30 Jun 2024

https://github.com/ak-coram/cl-duckdb

Common Lisp CFFI wrapper around the DuckDB C API

c-bindings common-lisp data-science duckdb lisp olap parquet sql

Last synced: 30 Jun 2024

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 29 Jun 2024

https://github.com/ploomber/soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow

Last synced: 29 Jun 2024

https://github.com/ploomber/soopervisor

☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.

airflow argo argo-workflows aws data-science kubeflow kubeflow-pipelines kubernetes machine-learning slurm workflow

Last synced: 29 Jun 2024

https://github.com/aporia-ai/mlnotify

🔔 No need to keep checking your training - just one import line and you'll know the second it's done.

data-science deep-learning deeplearning machine-learning machinelearning machinelearning-python ml notification notifications opensource python python3 tool tools

Last synced: 29 Jun 2024

https://github.com/datacarpentry/semester-biology

Forkable teaching materials for a course on working with data in R

biology data-carpentry data-science r spatial-data sql teaching-materials

Last synced: 29 Jun 2024

https://github.com/iterative/mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞

cli data-science deployment developer-tools git machine-learning mlem model-registry python

Last synced: 29 Jun 2024

https://github.com/carloocchiena/the_statistics_handbook

The Statistics Handbook open source repository

data-science latex mathematics statistics

Last synced: 29 Jun 2024

https://github.com/maximveksler/awesome-serialization

Data formats useful for API, Big Data, ML, Graph & co

awesome-list big-data data-science serialization-formats

Last synced: 29 Jun 2024

https://github.com/dsgiitr/d2l-pytorch

This project reproduces the book Dive Into Deep Learning (https://d2l.ai/), adapting the code from MXNet into PyTorch.

book computer-vision d2l data-science deep-learning dive-into-deep-learning mxnet nlp pytorch pytorch-implmention

Last synced: 29 Jun 2024

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 29 Jun 2024

https://github.com/datapane/examples

Datapane Examples

data-science datapane jupyter python

Last synced: 28 Jun 2024

https://github.com/SamEdwardes/pydatafaker

A Python package to create fake data with relationships between tables.

data data-science fake-data python

Last synced: 27 Jun 2024

https://github.com/SeroviICAI/Movie_Recommender

This program recommends a movie in any genre by scraping data from IMDb. The project is for educational purposes.

data-science educational entertainment movies pandas python regex webscraping

Last synced: 27 Jun 2024

https://github.com/zMoooooritz/stapy

An easy-to-use SensorThings API client written in Python

api cli data-science database ogc python sensor sensor-data sensorthings sensorthings-api

Last synced: 26 Jun 2024

https://github.com/dssg/hitchhikers-guide

The Hitchhiker's Guide to Data Science for Social Good

data-science dssg machine-learning training tutorial-exercises

Last synced: 26 Jun 2024

https://microsoft.github.io/dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects

Last synced: 26 Jun 2024

https://github.com/firmai/business-analytics-and-mathematics-python-book

Advanced Business Analytics and Mathematics with Python (by @firmai)

analytics business data-analysis data-science mathematics python

Last synced: 26 Jun 2024

https://github.com/alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 26 Jun 2024

https://github.com/justinzm/gopup

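Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; Qianlima (high-growth) and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…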

covid19-data data data-analysis data-science datasets economic-data gopup index-data python

Last synced: 26 Jun 2024

https://github.com/mfarragher/obsidiantools

Obsidian tools - a Python package for analysing an Obsidian.md vault

data-science knowledge-management network-analysis note-taking obsidian-community obsidian-md python

Last synced: 26 Jun 2024

https://github.com/aiguofer/gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets

Last synced: 26 Jun 2024

https://github.com/greenelab/scihub

Source code and data analyses for the Sci-Hub Coverage Study

crossref data-science doi journals libgen open-data sci-hub scimag scopus

Last synced: 25 Jun 2024

https://github.com/KiranGershenfeld/VisualizingTwitchCommunities

Graphing communities on Twitch.tv in a visually intuitive way

community data-science python twitch visualization

Last synced: 24 Jun 2024

https://github.com/EricCacciavillani/eFlow

Semi-Automated machine learning/data science workflow

automation data-science data-visualization machine-learning python3 workflow

Last synced: 24 Jun 2024

https://github.com/yash1994/auto-awesome-list

:zap: An automated list of Machine Learning and Data Science tools from research organizations

artificial-intelligence big-data data-science machine-learning

Last synced: 24 Jun 2024

https://github.com/mstaddon/GraniteAI

Automated machine learning and data mining software

automated-machine-learning data-science machine-learning user-interface

Last synced: 24 Jun 2024

https://github.com/magnusax/AutoML

The project aims to develop a customized ML framework on top of existing libraries

data-science machine-learning machine-learning-algorithms machine-learning-library python scikit-learn

Last synced: 24 Jun 2024

https://github.com/HDI-Project/ATM

Auto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).

automl data-science distributed-computing hyperparameter-optimization machine-learning

Last synced: 24 Jun 2024

https://github.com/AlexIoannides/ml-workflow-automation

Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.

classification data-science flask helm jupyter-notebook kaggle kubernetes machine-learning mlops numpy pandas python rest-api sklearn

Last synced: 24 Jun 2024

https://github.com/SaltWaterStudio/modgen

Project designed to automatically pick parameters (AutoML) for multiple models for rapid feature engineering.

automation data-science machine-learning statistical-models

Last synced: 24 Jun 2024

https://github.com/societe-generale/aikit

Automated machine learning package

automl data-science machine-learning python

Last synced: 24 Jun 2024

https://github.com/supabase-community/supabase-py

Python Client for Supabase. Query Postgres from Flask, Django, FastAPI. Python user authentication, security policies, edge functions, file storage, and realtime data streaming. Good first issue.

auth authentication authorization community data-science databases django fastapi flask good-first-issue machine-learning postgres postgresql python supabase

Last synced: 24 Jun 2024

https://github.com/caserec/Datasets-for-Recommender-Systems

A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)

data-science database datasets public-data recommender-systems

Last synced: 23 Jun 2024

https://github.com/PPshrimpGo/BDCI2018-ChinauUicom-1st-solution

The first-place solution to the China Unicom challenge at BDCI 2018

competition data-science

Last synced: 23 Jun 2024

https://github.com/aikho/awesome-feature-engineering

A curated list of resources dedicated to Feature Engineering Techniques for Machine Learning

ai data-science feature-engineering feature-extraction machine-learning

Last synced: 22 Jun 2024

https://github.com/Shujian2015/FreeML

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 22 Jun 2024

https://github.com/mukeshmithrakumar/Book_List

Python, Machine Learning, Deep Learning and Data Science Books

algorithms books data-science deep-learning free machine-learning python

Last synced: 22 Jun 2024

https://github.com/nickslevine/zebras

Data analysis library for JavaScript built with Ramda

data-analysis data-science functional-programming javascript pandas ramda

Last synced: 22 Jun 2024