Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists analyze and prepare data, and their findings inform high-level decisions in many organizations.

https://github.com/CamDavidsonPilon/lifetimes

Lifetime value in Python

data-science python statistics

Last synced: 13 Nov 2024

https://github.com/denizyuret/knet.jl

Koç University deep learning framework.

data-science deep-learning julia knet machine-learning neural-networks

Last synced: 20 Dec 2024

https://github.com/khuyentran1401/efficient_python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists

data-science python python3

Last synced: 19 Dec 2024

https://github.com/demidovakatya/vvedenie-mashinnoe-obuchenie

📝 A curated collection of machine learning resources

collections data-mining data-science deep-learning machine-learning mooc neural-networks nlp russian university

Last synced: 30 Nov 2024

https://github.com/ebay/tsv-utils

eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.

cli command-line csv d data-mining data-science delimited-files dlang reservoir-sampling sampling shuffle statistics tabular-data tsv uniq

Last synced: 03 Dec 2024

https://github.com/microsoft/responsible-ai-toolbox

Responsible AI Toolbox is a suite of tools providing model and data exploration and assessment user interfaces and libraries that enable a better understanding of AI systems. These interfaces and libraries empower developers and stakeholders of AI systems to develop and monitor AI more responsibly, and take better data-driven actions.

data-analysis data-science data-visualization error-analysis explainability explainable-ai explainable-ml fairness fairness-ai fairness-ml interpretability jupyter machine-learning machinelearning ml responsible-ai ui visualization widget widgets

Last synced: 17 Dec 2024

https://github.com/ropensci/drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

data-science drake high-performance-computing makefile peer-reviewed pipeline r r-package reproducibility reproducible-research ropensci rstats workflow

Last synced: 17 Dec 2024

https://github.com/ebhy/budgetml

Deploy an ML inference service on a budget in fewer than 10 lines of code.

api data-science deployment fastapi inference machine-learning mlops

Last synced: 20 Dec 2024

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 17 Dec 2024

https://github.com/tomasonjo/blogs

Jupyter notebooks that support my graph data science blog posts at https://bratanic-tomaz.medium.com/

data-science graph graph-algorithms neo4j

Last synced: 19 Dec 2024

https://github.com/PatMartin/Dex

Dex: The Data Explorer. A data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and of publishing web visualizations.

d3 d3js data-analysis data-mining data-science data-visualization datavis datavisualization dataviz groovy java javafx visualization

Last synced: 13 Nov 2024

https://github.com/googlecloudplatform/data-science-on-gcp

Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)

cloud-computing data-analysis data-engineering data-pipeline data-processing data-science data-visualization machine-learning

Last synced: 21 Dec 2024

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 09 Nov 2024

https://github.com/kotartemiy/pygooglenews

If Google News had a Python library

data-science google news python rss

Last synced: 22 Dec 2024

https://github.com/nok/sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

data-science machine-learning scikit-learn sklearn

Last synced: 18 Dec 2024

https://github.com/scikit-learn-contrib/MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.

classification confidence-intervals conformal-prediction data-science python regression sklearn

Last synced: 12 Nov 2024

https://github.com/ikatsov/tensor-house

A collection of reference Jupyter notebooks and demo AI/ML applications for enterprise use cases: marketing, pricing, supply chain, smart manufacturing, and more.

ai customer-analysis data-science deep-learning llm machine-learning marketing models personalization reinforcement-learning supply-chain

Last synced: 21 Dec 2024

https://github.com/jrfiedler/causal_inference_python_code

Python code for part 2 of the book Causal Inference: What If, by Miguel Hernán and James Robins

causal-inference causality data-science python

Last synced: 22 Dec 2024

https://github.com/reiinakano/xcessiv

A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

automated-machine-learning data-science ensemble-learning hyperparameter-optimization machine-learning scikit-learn stacked-ensembles

Last synced: 21 Dec 2024

https://github.com/alan-turing-institute/CleverCSV

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the built-in csv module with improved dialect detection, and comes with a handy command-line application for working with CSV files.

csv csv-converter csv-export csv-files csv-format csv-import csv-parser csv-parsing csv-reader csv-reading data-analysis data-mining data-science datascience machine-learning python python-library python3

Last synced: 29 Oct 2024

https://github.com/microsoft/rd-agent

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.

agent ai automation data-mining data-science development llm research

Last synced: 18 Dec 2024

https://github.com/mandiant/ThreatPursuit-VM

Threat Pursuit Virtual Machine (VM): A fully customizable, open-source, Windows-based distribution focused on threat intelligence analysis and hunting, designed to help intel and malware analysts, as well as threat hunters, get up and running quickly.

analytics cyber data-science fireeye intelligence intelligence-analysis malware mandiant threat threathunting threatintelligence virtual-machine

Last synced: 21 Nov 2024

https://github.com/rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

data-science dataframe dataframes go golang machine-learning pandas pandas-dataframe python statistics

Last synced: 21 Dec 2024

https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources

Unix, R, and Python tools for genomics and data science

bioinformatics cancer-genomics data-science

Last synced: 19 Dec 2024

https://github.com/business-science/free_r_tips

Free R-Tips is a free newsletter from Business Science that delivers bite-sized code tutorials every week.

data-science newsletter tips tips-and-tricks

Last synced: 02 Dec 2024

https://github.com/devamoghs/machine-learning-with-python

Small-scale machine learning projects for understanding the core concepts. Give a star 🌟 if it helps you. BONUS: an interview question bank is coming soon!

beginner-friendly data-science deep-learning exercises machine-learning practice-project python python-3 scikit-learn

Last synced: 19 Dec 2024

https://github.com/davidadsp/generative_deep_learning_2nd_edition

The official code repository for the second edition of the O'Reilly book Generative Deep Learning: Teaching Machines to Paint, Write, Compose and Play.

chatgpt dalle2 data-science deep-learning diffusion-models generative-adversarial-network gpt-3 machine-learning python stable-diffusion tensorflow

Last synced: 20 Dec 2024

https://github.com/xorbitsai/xorbits

Scalable Python data science and machine learning, in an API-compatible and lightning-fast way.

data-science distributed-systems lightgbm machine-learning ml numpy pandas python scalable xgboost

Last synced: 18 Dec 2024

https://github.com/juliastats/distributions.jl

A Julia package for probability distributions and associated functions.

data-science julia probability-distributions statistics

Last synced: 17 Dec 2024

https://github.com/okfn-brasil/querido-diario

📰 Brazilian government gazettes, accessible to everyone.

civic-tech data-science digital-public-goods dpg governments-gazettes govtech hacktoberfest open-data politics scraping sdg-16 spider

Last synced: 19 Dec 2024

https://github.com/shujian2015/freeml

A List of Data Science/Machine Learning Resources (Mostly Free)

data-science deep-learning machine-learning natural-language-processing

Last synced: 02 Dec 2024

https://github.com/elixir-explorer/explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

data-science dataframes elixir rust

Last synced: 20 Dec 2024

https://github.com/man-group/ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

big-data data data-analysis data-science database dataframe pandas quantitative-analysis quantitative-finance quantitative-trading

Last synced: 24 Oct 2024

https://github.com/sajal2692/data-science-portfolio

A portfolio of data science projects I completed for academic, self-learning, and hobby purposes.

data-science keras machine-learning nlp pandas portfolio python scikit-learn

Last synced: 22 Dec 2024

https://github.com/teomewhy/teomerefs

A guide to technical references for a career in data

data data-science machine-learning python

Last synced: 20 Dec 2024

https://github.com/novak-99/mlpp

A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

cpp data-science deep-learning machine-learning

Last synced: 18 Dec 2024

https://github.com/datumbox/datumbox-framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

big-data data-science java machine-learning nlp statistics

Last synced: 21 Dec 2024

https://github.com/makcedward/nlp

📝 This repository records my NLP journey.

ai data-science deep-learning machine-learning nlp

Last synced: 23 Dec 2024

https://github.com/alishobeiri/thread

AI-powered Jupyter Notebook — use local AI to generate and edit code cells, automatically fix errors, and chat with your data

ai analysis analytics data-science jupyter jupyter-notebook jupyter-notebooks jupyterhub jupyterlab ollama python react reactjs

Last synced: 20 Dec 2024

https://github.com/mrkn/pycall.rb

Calling Python functions from the Ruby language

data-science pycall python ruby rubydatascience rubyml

Last synced: 20 Dec 2024

https://github.com/rhiever/datacleaner

A Python tool that automatically cleans data sets and readies them for analysis.

automation data-science machine-learning python

Last synced: 21 Dec 2024

https://github.com/starpig1129/ai-data-analysis-multiagent

AI-Driven Research Assistant: An advanced multi-agent system for automating complex research processes. Leveraging LangChain, OpenAI GPT, and LangGraph, this tool streamlines hypothesis generation, data analysis, visualization, and report writing. Perfect for researchers and data scientists seeking to enhance their workflow and productivity.

agent ai ai-data-analysis artificial-intelligence code-generation data-analysis data-analytics data-science langchain langgraph large-language-model large-language-models llm multiagent-systems python

Last synced: 22 Dec 2024