Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/tidyverse/tidyverse

Easily install and load packages from the tidyverse

data-science r tidyverse

Last synced: 31 Jul 2024

https://github.com/hadley/stats337

Readings in applied data science

data-science teaching

Last synced: 30 Jul 2024

https://github.com/jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science ipython ipython-notebook machine-learning mllib notebook pyspark python spark

Last synced: 07 Aug 2024

https://github.com/supabase/supabase-py

Python Client for Supabase. Query Postgres from Flask, Django, FastAPI. Python user authentication, security policies, edge functions, file storage, and realtime data streaming. Good first issue.

auth authentication authorization community data-science databases django fastapi flask good-first-issue machine-learning postgres postgresql python supabase

Last synced: 30 Jul 2024

https://github.com/nubank/fklearn

fklearn: Functional Machine Learning

data-analysis data-science machine-learning ml python

Last synced: 01 Aug 2024

https://github.com/enzoampil/fastquant

fastquant — Backtest and optimize your ML trading strategies with only 3 lines of code!

algotrading backtesting cryptocurrency data-science financial-data-science machine-learning quantitative-finance stocks trading-strategies

Last synced: 31 Jul 2024

https://github.com/h2oai/h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform

data-science deep-learning h2o machine-learning python r tutorial

Last synced: 30 Jul 2024

https://github.com/CamDavidsonPilon/lifetimes

Lifetime value in Python

data-science python statistics

Last synced: 02 Aug 2024

https://github.com/demidovakatya/vvedenie-mashinnoe-obuchenie

📝 A collection of machine learning resources

collections data-mining data-science deep-learning machine-learning mooc neural-networks nlp russian university

Last synced: 07 Aug 2024

https://github.com/eBay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 01 Aug 2024

https://github.com/khuyentran1401/Efficient_Python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 02 Aug 2024

https://github.com/google/uncertainty-baselines

High-quality implementations of standard and SOTA methods on a variety of tasks.

bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

Last synced: 01 Aug 2024

https://github.com/ebhy/budgetml

Deploy an ML inference service on a budget in less than 10 lines of code.

api data-science deployment fastapi inference machine-learning mlops

Last synced: 01 Aug 2024

https://github.com/modelscope/data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

chinese data-analysis data-science data-visualization dataset gpt gpt-4 instruction-tuning large-language-models llama llava llm llms multi-modal nlp opendata pre-training pytorch sora streamlit

Last synced: 01 Aug 2024

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 01 Aug 2024

https://github.com/PatMartin/Dex

Dex: The Data Explorer. A data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and of publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 02 Aug 2024

https://github.com/dagworks-inc/hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hacktoberfest lineage llmops machine-learning mlops numpy orchestration pandas python software-engineering

Last synced: 01 Aug 2024

https://github.com/DAGWorks-Inc/hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hacktoberfest lineage llmops machine-learning mlops numpy orchestration pandas python software-engineering

Last synced: 31 Jul 2024

https://github.com/GoogleCloudPlatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 07 Aug 2024

https://github.com/nok/sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

data-science machine-learning scikit-learn sklearn

Last synced: 31 Jul 2024

https://github.com/reiinakano/xcessiv

A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

automated-machine-learning data-science ensemble-learning hyperparameter-optimization machine-learning scikit-learn stacked-ensembles

Last synced: 03 Aug 2024

https://github.com/kotartemiy/pygooglenews

If Google News had a Python library

data-science google news python rss

Last synced: 31 Jul 2024

https://github.com/microsoft/responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

data-analysis data-science data-visualization error-analysis explainability explainable-ai explainable-ml fairness fairness-ai fairness-ml interpretability jupyter machine-learning machinelearning ml responsible-ai ui visualization widget widgets

Last synced: 01 Aug 2024

https://github.com/alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 31 Jul 2024

https://github.com/mandiant/ThreatPursuit-VM

Threat Pursuit Virtual Machine (VM): A fully customizable, open-sourced Windows-based distribution focused on threat intelligence analysis and hunting designed for intel and malware analysts as well as threat hunters to get up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 04 Aug 2024

https://github.com/jrfiedler/causal_inference_python_code

Python code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science python

Last synced: 31 Jul 2024

https://github.com/business-science/free_r_tips

Free R-Tips is a FREE Newsletter provided by Business Science. It comes with bite-sized code tutorials every week.

data-science newsletter tips tips-and-tricks

Last synced: 31 Jul 2024

https://github.com/devAmoghS/Machine-Learning-with-Python

Small-scale machine learning projects for understanding the core concepts. Give a star 🌟 if it helps you. BONUS: Interview Bank coming up!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 07 Aug 2024

https://github.com/scikit-learn-contrib/MAPIE

A scikit-learn-compatible module for estimating prediction intervals.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 02 Aug 2024

https://github.com/rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics

Last synced: 30 Jul 2024

https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources

Unix, R, and Python tools for genomics and data science

bioinformatics cancer-genomics data-science

Last synced: 02 Aug 2024

https://github.com/man-group/ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 30 Jul 2024

https://github.com/man-group/arcticdb

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 31 Jul 2024

https://github.com/Shujian2015/FreeML

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 02 Aug 2024

https://github.com/JuliaStats/Distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 03 Aug 2024

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 31 Jul 2024

https://github.com/novak-99/MLPP

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 31 Jul 2024

https://github.com/makcedward/nlp

📝 This repository records my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 30 Jul 2024

https://github.com/xorbitsai/xorbits

Scalable Python DS & ML, in an API-compatible and lightning-fast way.

data-science distributed-systems lightgbm machine-learning ml numpy pandas python scalable xgboost

Last synced: 31 Jul 2024

https://github.com/rhiever/datacleaner

A Python tool that automatically cleans data sets and readies them for analysis.

automation data-science machine-learning python

Last synced: 30 Jul 2024

https://github.com/okfn-brasil/querido-diario

📰 Brazilian government gazettes, accessible to everyone.

artificial-intelligence civic-tech data-science governments-gazettes govtech hacktoberfest hacktoberfest2023 machine-learning open-data politics scraping spider

Last synced: 30 Jul 2024

https://github.com/mrkn/pycall.rb

Calling Python functions from the Ruby language

data-science pycall python ruby rubydatascience rubyml

Last synced: 31 Jul 2024

https://github.com/squaredtechnologies/vizly-notebook

AI-powered Jupyter Notebook — use local AI to generate and edit code cells, automatically fix errors, and chat with your data

ai analysis analytics data-science jupyter jupyter-notebook jupyter-notebooks jupyterhub jupyterlab ollama python react reactjs

Last synced: 12 Aug 2024

https://github.com/TeoMeWhy/teomerefs

A guide to technical references for a career in data

data data-science machine-learning python

Last synced: 31 Jul 2024

https://github.com/squaredtechnologies/thread

AI-powered Jupyter Notebook — use local AI to generate and edit code cells, automatically fix errors, and chat with your data

ai analysis analytics data-science jupyter jupyter-notebook jupyter-notebooks jupyterhub jupyterlab ollama python react reactjs

Last synced: 01 Aug 2024

https://github.com/dssg/hitchhikers-guide

The Hitchhiker's Guide to Data Science for Social Good

data-science dssg machine-learning training tutorial-exercises

Last synced: 02 Aug 2024

https://github.com/elixir-explorer/explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

data-science dataframes elixir rust

Last synced: 01 Aug 2024

https://github.com/Mybridge/machine-learning-open-source

Monthly Series - Machine Learning Top 10 Open Source Projects

ai algorithm artificial-intelligence data-science machine-learning neural-network

Last synced: 31 Jul 2024

https://github.com/sematic-ai/sematic

An open-source ML pipeline development platform

ai data-science machine-learning ml ml-ops ml-pipeline ml-pipelines mlops pipeline python python3

Last synced: 03 Aug 2024

https://github.com/maxpumperla/deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"

alphago alphago-zero data-science deep-learning game-of-go games machine-learning neural-networks python

Last synced: 01 Aug 2024

https://github.com/grailbio/reflow

A language and runtime for distributed, incremental data processing in the cloud

analysis-pipeline aws bioinformatics-pipeline cloud-computing data-science golang language runtime scientific-computing

Last synced: 31 Jul 2024

https://github.com/caserec/Datasets-for-Recommender-Systems

A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)

data-science database datasets public-data recommender-systems

Last synced: 08 Aug 2024

https://github.com/iamaziz/PyDataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 07 Aug 2024

https://github.com/davidadsp/generative_deep_learning_2nd_edition

The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.

chatgpt dalle2 data-science deep-learning diffusion-models generative-adversarial-network gpt-3 machine-learning python stable-diffusion tensorflow

Last synced: 02 Aug 2024