Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/h2oai/h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform

data-science deep-learning h2o machine-learning python r tutorial

Last synced: 10 Oct 2024

https://github.com/iamtodor/data-science-interview-questions-and-answers

Data science interview questions with answers. Not ideal (yet).

data-science interview-preparation interview-questions machine-learning

Last synced: 14 Oct 2024

https://github.com/CamDavidsonPilon/lifetimes

Lifetime value in Python

data-science python statistics

Last synced: 02 Aug 2024

https://github.com/google/uncertainty-baselines

High-quality implementations of standard and SOTA methods on a variety of tasks.

bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

Last synced: 15 Oct 2024

https://github.com/denizyuret/knet.jl

Koç University deep learning framework.

data-science deep-learning julia knet machine-learning neural-networks

Last synced: 15 Oct 2024

https://github.com/demidovakatya/vvedenie-mashinnoe-obuchenie

:memo: A collection of machine learning resources

collections data-mining data-science deep-learning machine-learning mooc neural-networks nlp russian university

Last synced: 14 Oct 2024

https://github.com/eBay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 08 Nov 2024

https://github.com/ebay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 15 Oct 2024

https://github.com/khuyentran1401/efficient_python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 15 Oct 2024

https://github.com/khuyentran1401/Efficient_Python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 02 Aug 2024

https://github.com/ropensci/drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

data-science drake high-performance-computing makefile peer-reviewed pipeline r r-package reproducibility reproducible-research ropensci rstats workflow

Last synced: 10 Oct 2024

https://github.com/ebhy/budgetml

Deploy an ML inference service on a budget in fewer than 10 lines of code.

api data-science deployment fastapi inference machine-learning mlops

Last synced: 11 Oct 2024

https://github.com/patmartin/dex

Dex: The Data Explorer. A data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and of publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 29 Oct 2024

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 09 Nov 2024

https://github.com/PatMartin/Dex

Dex: The Data Explorer. A data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and of publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 02 Aug 2024

https://github.com/dagworks-inc/hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering featurization hacktoberfest lineage llmops machine-learning mlops numpy orchestration pandas python software-engineering

Last synced: 11 Oct 2024

https://github.com/googlecloudplatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 07 Oct 2024

https://github.com/GoogleCloudPlatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 07 Aug 2024

https://github.com/nok/sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

data-science machine-learning scikit-learn sklearn

Last synced: 13 Oct 2024

https://github.com/kotartemiy/pygooglenews

If Google News had a Python library

data-science google news python rss

Last synced: 14 Oct 2024

https://github.com/reiinakano/xcessiv

A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

automated-machine-learning data-science ensemble-learning hyperparameter-optimization machine-learning scikit-learn stacked-ensembles

Last synced: 13 Oct 2024

https://github.com/alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 29 Oct 2024

https://github.com/ikatsov/tensor-house

A collection of reference Jupyter notebooks and demo AI/ML applications for enterprise use cases: marketing, pricing, supply chain, smart manufacturing, and more.

ai customer-analysis data-science deep-learning llm machine-learning marketing models personalization reinforcement-learning supply-chain

Last synced: 14 Oct 2024

https://github.com/microsoft/responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

data-analysis data-science data-visualization error-analysis explainability explainable-ai explainable-ml fairness fairness-ai fairness-ml interpretability jupyter machine-learning machinelearning ml responsible-ai ui visualization widget widgets

Last synced: 11 Oct 2024

https://github.com/jrfiedler/causal_inference_python_code

Python code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science python

Last synced: 29 Oct 2024

https://github.com/alan-turing-institute/clevercsv

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 22 Oct 2024

https://github.com/mandiant/ThreatPursuit-VM

Threat Pursuit Virtual Machine (VM): A fully customizable, open-source, Windows-based distribution focused on threat intelligence analysis and hunting, designed to get intel analysts, malware analysts, and threat hunters up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 04 Aug 2024

https://github.com/mandiant/threatpursuit-vm

Threat Pursuit Virtual Machine (VM): A fully customizable, open-source, Windows-based distribution focused on threat intelligence analysis and hunting, designed to get intel analysts, malware analysts, and threat hunters up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 14 Oct 2024

https://github.com/business-science/free_r_tips

Free R-Tips is a FREE Newsletter provided by Business Science. It comes with bite-sized code tutorials every week.

data-science newsletter tips tips-and-tricks

Last synced: 15 Oct 2024

https://github.com/devamoghs/machine-learning-with-python

Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. Bonus: interview bank coming up!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 10 Oct 2024

https://github.com/devAmoghS/Machine-Learning-with-Python

Small-scale machine learning projects to understand the core concepts. Give a star 🌟 if it helps you. Bonus: interview bank coming up!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 07 Aug 2024

https://github.com/rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics

Last synced: 14 Oct 2024

https://github.com/scikit-learn-contrib/mapie

A scikit-learn-compatible module for estimating prediction intervals.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 10 Oct 2024

https://github.com/scikit-learn-contrib/MAPIE

A scikit-learn-compatible module for estimating prediction intervals.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 02 Aug 2024

https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources

Unix, R, and Python tools for genomics and data science

bioinformatics cancer-genomics data-science

Last synced: 15 Oct 2024

https://github.com/xorbitsai/xorbits

Scalable Python data science and ML, in an API-compatible and lightning-fast way.

data-science distributed-systems lightgbm machine-learning ml numpy pandas python scalable xgboost

Last synced: 11 Oct 2024

https://github.com/man-group/arcticdb

ArcticDB is a high-performance, serverless DataFrame database built for the Python data science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 15 Oct 2024

https://github.com/elixir-explorer/explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

data-science dataframes elixir rust

Last synced: 31 Oct 2024

https://github.com/man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database built for the Python data science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 24 Oct 2024

https://github.com/shujian2015/freeml

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 15 Oct 2024

https://github.com/juliastats/distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 15 Oct 2024

https://github.com/Shujian2015/FreeML

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 02 Aug 2024

https://github.com/sajal2692/data-science-portfolio

Portfolio of data science projects I completed for academic, self-learning, and hobby purposes.

data-science keras machine-learning nlp pandas portfolio python scikit-learn

Last synced: 10 Oct 2024

https://github.com/datumbox/datumbox-framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

big-data data-science java machine-learning nlp statistics

Last synced: 15 Oct 2024

https://github.com/novak-99/mlpp

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 30 Oct 2024

https://github.com/novak-99/MLPP

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 27 Oct 2024

https://github.com/okfn-brasil/querido-diario

📰 Brazilian government gazettes, accessible to everyone.

civic-tech data-science governments-gazettes govtech hacktoberfest open-data politics scraping spider

Last synced: 14 Oct 2024

https://github.com/JuliaStats/Distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 03 Aug 2024

https://github.com/teomewhy/teomerefs

Technical reference guide for a career in data

data data-science machine-learning python

Last synced: 14 Oct 2024

https://github.com/TeoMeWhy/teomerefs

Technical reference guide for a career in data

data data-science machine-learning python

Last synced: 29 Oct 2024

https://github.com/makcedward/nlp

:memo: This repository records my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 29 Oct 2024

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 12 Oct 2024

https://github.com/squaredtechnologies/thread

AI-powered Jupyter Notebook — use local AI to generate and edit code cells, automatically fix errors, and chat with your data

ai analysis analytics data-science jupyter jupyter-notebook jupyter-notebooks jupyterhub jupyterlab ollama python react reactjs

Last synced: 09 Nov 2024

https://github.com/mrkn/pycall.rb

Calling Python functions from the Ruby language

data-science pycall python ruby rubydatascience rubyml

Last synced: 09 Oct 2024

https://github.com/rhiever/datacleaner

A Python tool that automatically cleans data sets and readies them for analysis.

automation data-science machine-learning python

Last synced: 15 Oct 2024